Robust regression can be used in any situation in which you would use OLS regression. When doing the regression diagnostics, you might discover that one or more data points are moderately outlying. These are points that you have determined are not data entry errors, are not from a different population than the rest of your data, and that you have no compelling reason to exclude from the analysis. Robust regression is a compromise between deleting these points and allowing them to violate the assumptions of OLS regression.
Before we continue with our discussion of robust regression, we need to define some terms.
Residual: The difference between the predicted value (based on the regression equation) and the actual, observed value.
Outlier: In linear regression, an outlier is an observation with large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.
Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an effect on the estimate of regression coefficients.
Influence: An observation is said to be influential if removing the observation substantially changes the estimate of coefficients. Influence can be thought of as the product of leverage and outlierness.
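One common way to make the "product of leverage and outlierness" idea concrete is the textbook formula for Cook's D, sketched here. The symbols are introduced only for illustration and are not used elsewhere on this page: r_i is the standardized residual of observation i, h_i is its leverage (the i-th diagonal element of the hat matrix), and p is the number of estimated coefficients, including the constant.

```latex
D_i = \frac{r_i^2}{p} \cdot \frac{h_i}{1 - h_i}
```

The first factor grows with the size of the residual and the second with leverage, so a point generally needs some of both to be influential.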
Robust regression deals with cases that have very high leverage and with cases that are outliers. It is essentially a compromise between dropping the cases that are moderate outliers and seriously violating the assumptions of OLS regression, and it is a form of weighted least squares regression. According to the Stata 9 Reference Manual (page 162), the robust regression procedure runs the OLS regression, gets the Cook's D values, and drops any observation whose Cook's D value is greater than 1. Then an iteration process begins in which weights are calculated based on absolute residuals. The iterating stops when the maximum change in the weights from one iteration to the next is below tolerance.
Two types of weights are used. In Huber weighting, observations with small residuals get a weight of 1, and the larger the residual, the smaller the weight. With biweighting, all cases with a non-zero residual get down-weighted at least a little. Both kinds of weight are used because Huber weights can have difficulties with severe outliers, and biweights can have difficulties converging or may yield multiple solutions. Using the Huber weights first helps to minimize problems with the biweights. You can see the iteration history of both types of weights at the top of the robust regression output. Using the Stata defaults, robust regression is about 95% as efficient as OLS (Hamilton, 1991). In short, the most influential points are dropped, and then cases with large absolute residuals are down-weighted.
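The two weighting schemes described above are usually written as functions of a scaled residual. The formulas below are a sketch of the standard Huber and biweight (bisquare) weight functions, not a transcription of Stata's implementation. Here u_i denotes the residual of observation i divided by a robust estimate of scale, and c_h and c_b are tuning constants. Values near 1.345 and 4.685 are the usual textbook choices; whether rreg uses exactly these values is an assumption.

```latex
w_i^{\text{Huber}} =
\begin{cases}
1 & \text{if } |u_i| \le c_h \\
c_h / |u_i| & \text{if } |u_i| > c_h
\end{cases}
\qquad
w_i^{\text{biweight}} =
\begin{cases}
\left[ 1 - (u_i / c_b)^2 \right]^2 & \text{if } |u_i| \le c_b \\
0 & \text{if } |u_i| > c_b
\end{cases}
```

Written this way, the contrast in the text is visible: a Huber weight shrinks but never reaches zero, so a severe outlier keeps some pull, while a biweight is below 1 for any non-zero residual and is exactly zero beyond c_b.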
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/crime, clear
drop if sid == 51
In most cases, you begin by running an OLS regression and doing some diagnostics, so we start there. The lvr2plot command creates a graph of leverage versus the (normalized) squared residuals, and the mlabel option labels the points on the graph with the two-letter abbreviation for each state.
regress crime poverty single
lvr2plot, mlabel(state)
In the resulting graph, Florida and Mississippi have the largest residuals.
We use the predict command to create a new variable called d1, which contains the values of Cook's D. Stata's rreg drops only cases with a Cook's D greater than 1, and none of the cases in this example are dropped (that is, given a weight of zero). We still want to see which cases have a relatively large Cook's D. The lowest value that Cook's D can assume is zero, and the higher the Cook's D, the more influential the point. The conventional cut-off point is 4/n, where n is the number of observations in the data set; here n = 50, so the cut-off is 4/50 = 0.08. (Note that both Cook's D and DFITS combine information about the residual and leverage. Cook's D and DFITS are very similar, except that they are scaled differently.)
predict d1, cooksd
list if d1 > 4/50, clean
       sid   state   crime   murder   pctmetro   pctwhite   pcths   poverty   single         d1
  9.     9      fl    1206      8.9         93       83.5    74.4      17.8     10.6   .1573168
 25.    25      ms     434     13.5       30.7       63.3    64.3      24.7     14.7   .7963277
 49.    49      wv     208      6.9       41.8       96.3      66      22.2      9.4   .0872589
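The cut-off above hard-codes n = 50. As a small sketch, the same listing can take n from the stored results of the regression instead. This assumes regress is the most recent estimation command, and d_chk is a hypothetical variable name chosen so that it does not collide with d1. The !missing() condition is included because Stata treats missing values as larger than any number, so a bare greater-than comparison would also list any observation whose Cook's D could not be computed.

```stata
* Refit the OLS model so that e(N) holds the estimation sample size
regress crime poverty single

* Cook's D under a hypothetical name, to avoid overwriting d1
predict d_chk, cooksd

* Conventional 4/n cut-off, with n read from e(N) rather than typed in
list sid state crime poverty single d_chk if d_chk > 4/e(N) & !missing(d_chk), clean
```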
Now we will look at the residuals. We will again use the predict command, this time with the rstandard option. We will generate a new variable called absr1, which is the absolute value of the standardized residuals (because the sign of the residual doesn't matter). The gsort command is used to sort the data. The minus sign before the variable name is used to sort the data in descending order.
predict r1, rstandard
gen absr1 = abs(r1)
gsort -absr1
li state absr1 in 1/10
     +------------------+
     | state      absr1 |
     |------------------|
  1. |    ms   3.158753 |
  2. |    fl   3.023632 |
  3. |    vt   1.831356 |
  4. |    md    1.62075 |
  5. |    mt   1.588843 |
     |------------------|
  6. |    me   1.578434 |
  7. |    il   1.550569 |
  8. |    ca   1.401128 |
  9. |    ny   1.334536 |
 10. |    ma   1.288611 |
     +------------------+
Now let's run our first robust regression. The gen option has Stata save the final weights in a new variable in the data set, which makes it easy to see which observations were down-weighted.
rreg crime poverty single, gen(weight)
Huber iteration 1: maximum difference in weights = .66846346
Huber iteration 2: maximum difference in weights = .11288069
Huber iteration 3: maximum difference in weights = .01810715
Biweight iteration 4: maximum difference in weights = .29167992
Biweight iteration 5: maximum difference in weights = .10354281
Biweight iteration 6: maximum difference in weights = .01421094
Biweight iteration 7: maximum difference in weights = .0033545
Robust regression                                      Number of obs =      50
                                                       F(  2,    47) =   31.15
                                                       Prob > F      =  0.0000
------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   10.36971   7.629288     1.36   0.181    -4.978432    25.71786
      single |   142.6339   22.17042     6.43   0.000     98.03276     187.235
       _cons |  -1160.931   224.2564    -5.18   0.000    -1612.076   -709.7849
------------------------------------------------------------------------------
sort weight
li sid state d1 absr1 weight in 1/10
     +-----------------------------------------------+
     | sid   state         d1      absr1      weight |
     |-----------------------------------------------|
  1. |  25      ms   .7963277   3.158753   .02638862 |
  2. |   9      fl   .1573168   3.023632   .11772218 |
  3. |  46      vt   .0473293   1.831356   .59144513 |
  4. |  26      mt   .0199024   1.588843   .66441582 |
  5. |  20      md   .0634833    1.62075   .67960728 |
     |-----------------------------------------------|
  6. |  14      il   .0184802   1.550569   .69124917 |
  7. |  21      me   .0276457   1.578434   .69766511 |
  8. |  31      nj   .0215676   1.193654   .74574796 |
  9. |  19      ma   .0189608   1.288611   .75392127 |
 10. |   5      ca   .0317504   1.401128   .80179038 |
     +-----------------------------------------------+
Roughly, as the absolute residual goes down, the weight goes up. In other words, cases with a large residual tend to be down-weighted, while the values of Cook's D do not correspond closely to the weights. This output shows that the observation for Mississippi is down-weighted the most, and Florida is also substantially down-weighted. In OLS regression, all cases have a weight of 1. Hence, the more cases in the robust regression that have a weight close to 1, the closer the results of the OLS and robust regressions will be.
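Because robust regression is a form of weighted least squares, one way to see what the saved weights do is to feed them back into regress as analytic weights. This is a sketch under the assumption that the final rreg fit is a weighted regression using the weights it saves; w_chk is a hypothetical variable name. If that assumption holds, the coefficients should come out close to the rreg coefficients. The standard errors are not expected to match, since rreg computes them differently.

```stata
* Robust regression, saving the final weights under a hypothetical name
rreg crime poverty single, gen(w_chk)

* Weighted least squares with those weights; compare the coefficients
* (not the standard errors) with the rreg output
regress crime poverty single [aweight = w_chk]
```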
Now that we have seen the values of Cook's D and the residuals, let's compare the results of a regular OLS regression and a robust regression. If the results are very different, you will most likely want to use the results from the robust regression.
regress crime poverty single
      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  2,    47) =   17.77
       Model |  1847292.58     2  923646.291           Prob > F      =  0.0000
    Residual |  2442332.64    47  51964.5242           R-squared     =  0.4306
-------------+------------------------------           Adj R-squared =  0.4064
       Total |  4289625.22    49  87543.3718           Root MSE      =  227.96
------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |    7.59141   8.415899     0.90   0.372    -9.339194    24.52201
      single |    120.617   24.45628     4.93   0.000     71.41735    169.8167
       _cons |  -879.7966   247.3782    -3.56   0.001    -1377.457   -382.1359
------------------------------------------------------------------------------
rreg crime poverty single
Huber iteration 1: maximum difference in weights = .66846346
Huber iteration 2: maximum difference in weights = .11288069
Huber iteration 3: maximum difference in weights = .01810715
Biweight iteration 4: maximum difference in weights = .29167992
Biweight iteration 5: maximum difference in weights = .10354281
Biweight iteration 6: maximum difference in weights = .01421094
Biweight iteration 7: maximum difference in weights = .0033545
Robust regression                                      Number of obs =      50
                                                       F(  2,    47) =   31.15
                                                       Prob > F      =  0.0000
------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   10.36971   7.629288     1.36   0.181    -4.978432    25.71786
      single |   142.6339   22.17042     6.43   0.000     98.03276     187.235
       _cons |  -1160.931   224.2564    -5.18   0.000    -1612.076   -709.7849
------------------------------------------------------------------------------
As you can see, the results from the two analyses are fairly different, especially with respect to the coefficient of single and the constant (_cons). While normally we are not interested in the constant, it would be useful if you had centered one or both of the predictor variables. On the other hand, poverty is not statistically significant in either analysis, while single is significant in both. You will also notice that the rreg output reports no R-squared, adjusted R-squared, or root MSE.
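To compare the two sets of estimates without scrolling between outputs, the fits can be stored and tabulated side by side. This is a minimal sketch; ols_fit and rreg_fit are hypothetical names for the stored results.

```stata
* Fit and store the OLS results
quietly regress crime poverty single
estimates store ols_fit

* Fit and store the robust regression results
quietly rreg crime poverty single
estimates store rreg_fit

* Coefficients and standard errors in adjacent columns
estimates table ols_fit rreg_fit, b(%9.3f) se(%9.3f)
```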
The results of the robust regression would be written up in the same way as OLS results; the coefficients and the overall significance of the model are interpreted exactly as in OLS regression.
These pages are Copyrighted (c) by UCLA Academic Technology Services