|
|
|
||||
|
|
|||||
Note: the code on this page works with SAS 9.1.3.
Robust regression can be used in any situation in which you would use OLS regression. When doing the regression diagnostics, you might discover that one or more data points are moderately outlying. These are points that you have determined are not data entry errors, from a different population than the rest of your data, and for which you have no compelling reason to exclude them from the analysis. Robust regression is a compromise between deleting these points, and allowing them to violate the assumptions of OLS regression.
Before we continue with our discussion of robust regression, we need to define some terms.
Residual: The difference between the predicted value (based on the regression equation) and the actual, observed value.
Outlier: In linear regression, an outlier is an observation with large residual. In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.
Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. These leverage points can have an effect on the estimate of regression coefficients.
Influence: An observation is said to be influential if removing the observation substantially changes the estimate of coefficients. Influence can be thought of as the product of leverage and outlierness.
Robust regression deals with cases that have very high leverage, and cases that are outliers. Robust regression is essentially a compromise between dropping the case(s) that are moderate outliers and seriously violating the assumptions of OLS regression. It is a form of weighted least squares regression and is done iteratively. At each step a new set of weights are determined based on the residuals. In general, the larger the residuals, the smaller the weights. So the weights depend on residuals. At the same time, the residuals depend on the model and the model depends on the weights. This generates an iteration process and it goes on until the change in the parameter estimates are small enough. A few types of weighting schemes are implemented. In Huber weighting, observations with small residuals get a weight of 1, the larger the residual, the smaller the weight. With bisquare, all cases with a non-zero residual get down-weighted at least a little.
data crime; infile "D:\work\data\raw\crime.csv" delimiter="," firstobs=2; input sid state $ crime murder pctmetro pctwhite pcths poverty single; run; data crime2; set crime; if sid ~=51; run;
In most cases, you begin by running an OLS regression and doing some diagnostics. We will begin by running an OLS regression.
proc reg data = crime2; model crime = poverty single ; output out = t student=res cookd = cookd; run; quit; goptions reset = all; symbol pointlabel = ("#state") value=none; axis1 label=(a=90 'Standardized Residuals'); proc gplot data = t; plot res*cookd =1 /vaxis=axis1 ; run; quit;
As we can see, observation for state = fl and state = ms have high residual and high leverage values. Let us display these observations.
proc print data = crime2; where state = "fl" or state = "ms"; run;Obs sid state crime murder pctmetro pctwhite pcths poverty single 9 9 fl 1206 8.9 93.0 83.5 74.4 17.8 10.6 25 25 ms 434 13.5 30.7 63.3 64.3 24.7 14.7
Let's also take a look at the Cook's D and standardized residuals using the data set generated for the plot above. The lowest value that Cook's D can assume is zero, and the higher the Cook's D is, the more influential the point. The conventional cut-off point is 4/n, where n is the number of observations in the data set.
proc print data = t noobs; where cookd >4/50; run;sid state crime murder pctmetro pctwhite pcths poverty single res cookd 9 fl 1206 8.9 93.0 83.5 74.4 17.8 10.6 3.02363 0.15732 25 ms 434 13.5 30.7 63.3 64.3 24.7 14.7 -3.15875 0.79633 49 wv 208 6.9 41.8 96.3 66.0 22.2 9.4 -1.04739 0.08726
Now we will look at the residuals. We will first generate a new variable rabs containing the absolute value of standardized residuals. Then we sort the data on rabs in descending order. We then list the first ten observations.
data t2; set t; rabs = abs(res); run; proc sort data = t2; by descending rabs; run; proc print data = t2 (obs=10); run;Obs sid state crime murder pctmetro pctwhite pcths poverty single res cookd rabs 1 25 ms 434 13.5 30.7 63.3 64.3 24.7 14.7 -3.15875 0.79633 3.15875 2 9 fl 1206 8.9 93.0 83.5 74.4 17.8 10.6 3.02363 0.15732 3.02363 3 46 vt 114 3.6 27.0 98.4 80.8 10.0 11.0 -1.83136 0.04733 1.83136 4 20 md 998 12.7 92.8 68.9 78.4 9.7 12.0 1.62075 0.06348 1.62075 5 26 mt 178 3.0 24.0 92.6 81.0 14.9 10.8 -1.58884 0.01990 1.58884 6 21 me 126 1.6 35.7 98.5 78.8 10.7 10.6 -1.57843 0.02765 1.57843 7 14 il 960 11.4 84.0 81.0 76.2 13.6 11.5 1.55057 0.01848 1.55057 8 5 ca 1078 13.1 96.7 79.3 76.2 18.2 12.5 1.40113 0.03175 1.40113 9 34 ny 1074 13.3 91.7 77.2 74.8 16.4 12.7 1.33454 0.02727 1.33454 10 19 ma 805 3.9 96.2 91.1 80.0 10.7 10.9 1.28861 0.01896 1.28861
Now let's run our first robust regression. Robust regression is done by iterated re-weighted least squares. The procedure for running robust regression is proc robustreg. There are a couple of estimators for IWLS. We are going to use the Huber estimator in this example. We can save the final weights created by the IWLS process. This can be very useful. We will use the data set t2 generated above. It includes the original data set crime2 and the diagnostic variables generated based on the OLS regression model.
proc robustreg data=t2 method=m (wf=huber) ; model crime = poverty single; output out = test1 weight=wgt; run; Model Information Data Set WORK.CRIME2 Dependent Variable crime Number of Independent Variables 2 Number of Observations 50 Method M Estimation Number of Observations Read 50 Number of Observations Used 50 Summary Statistics Standard Variable Q1 Median Q3 Mean Deviation MAD poverty 10.7000 13.1000 17.4000 14.0160 4.2867 4.2254 single 10.0000 10.9000 12.0000 11.1100 1.4751 1.4085 crime 326.0 509.5 766.0 566.7 295.9 342.5 Parameter Estimates Standard 95% Confidence Chi- Parameter DF Estimate Error Limits Square Pr > ChiSq Intercept 1 -1046.90 224.8372 -1487.57 -606.223 21.68 <.0001 poverty 1 8.9437 7.6490 -6.0482 23.9355 1.37 0.2423 single 1 133.8009 22.2278 90.2352 177.3667 36.23 <.0001 Scale 1 165.8365 Diagnostics Summary Observation Type Proportion Cutoff Outlier 0.0400 3.0000 Goodness-of-Fit Statistic Value R-Square 0.4402 AICR 75.9033 BICR 81.3912 Deviance 1915638proc sort data = test1; by wgt ; run; proc print data = test1 (obs=10); run;Obs sid state crime murder pctmetro pctwhite pcths poverty single res cookd rabs wgt 1 25 ms 434 13.5 30.7 63.3 64.3 24.7 14.7 -3.15875 0.79633 3.15875 0.31554 2 9 fl 1206 8.9 93.0 83.5 74.4 17.8 10.6 3.02363 0.15732 3.02363 0.33024 3 46 vt 114 3.6 27.0 98.4 80.8 10.0 11.0 -1.83136 0.04733 1.83136 0.55714 4 26 mt 178 3.0 24.0 92.6 81.0 14.9 10.8 -1.58884 0.01990 1.58884 0.63113 5 20 md 998 12.7 92.8 68.9 78.4 9.7 12.0 1.62075 0.06348 1.62075 0.63271 6 14 il 960 11.4 84.0 81.0 76.2 13.6 11.5 1.55057 0.01848 1.55057 0.64363 7 21 me 126 1.6 35.7 98.5 78.8 10.7 10.6 -1.57843 0.02765 1.57843 0.65393 8 19 ma 805 3.9 96.2 91.1 80.0 10.7 10.9 1.28861 0.01896 1.28861 0.74907 9 31 nj 627 5.3 100.0 80.8 76.7 10.9 9.6 1.19365 0.02157 1.19365 0.76408 10 5 ca 1078 13.1 96.7 79.3 76.2 18.2 12.5 1.40113 0.03175 1.40113 0.77018
Roughly, as the residual goes down, the weight goes up. In other words, cases with a large residual tend to be down-weighted, and the values of Cook's D don't really correspond to the weights. This output shows us that the observation for Mississippi will be down-weighted the most. Florida will also be substantially down-weighted. In OLS regression, all cases have a weight of 1. Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions.
Now that we have seen the values of Cook's D and the residuals from the OLS regression model, let's look compare the results of a regular OLS regression and a robust regression. If the results are very different, you will most likely want to use the results from the robust regression.
ods listing close; proc reg data = crime2; model crime = poverty single; ods output parameterestimates = a; run; quit; proc robustreg data = crime2; model crime = poverty single; ods output parameterestimates = b; run; quit; ods listing; title "OLS regression"; proc print data = a;OLS regression Obs Model Dependent Variable DF Estimate StdErr tValue Probt 1 MODEL1 crime Intercept 1 -879.79654 247.37818 -3.56 0.0009 2 MODEL1 crime poverty 1 7.59141 8.41590 0.90 0.3716 3 MODEL1 crime single 1 120.61704 24.45628 4.93 <.0001title "Robust regression"; proc print data = b; run;Robust regression Prob Obs Parameter DF Estimate StdErr LowerCL UpperCL ChiSq ChiSq 1 Intercept 1 -1169.27 221.8967 -1604.18 -734.365 27.77 <.0001 2 poverty 1 10.3477 7.5490 -4.4481 25.1435 1.88 0.1705 3 single 1 143.3800 21.9371 100.3840 186.3760 42.72 <.0001 4 Scale 1 167.2543 _ _ _ _ _
As you can see, the results from the two analyses are fairly different,
especially with respect to the coefficients of single and the constant (_cons).
While normally we are not interested in the constant, if you had centered one or
both of the predictor variables, the constant would be useful. On the
other hand, you will notice that poverty is not statistically significant
in either analysis, while single is significant in both analyses.
The results of the robust regression would be written up in the same way that OLS results would be; the interpretation of the coefficients and overall significance of the model mean exactly the same as in OLS regression.
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services