### SAS Data Analysis Examples Robust Regression

Robust regression is an alternative to least squares regression when  data is contaminated with outliers or influential observations and it can also be used for the purpose of detecting influential observations.

Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do.  In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics or potential follow-up analyses.

#### Introduction

Let's begin our discussion on robust regression with some terms in linear regression.

Residual:  The difference between the predicted value (based on the regression equation) and the actual, observed value.

Outlier:  In linear regression, an outlier is an observation with large residual.  In other words, it is an observation whose dependent-variable value is unusual given its value on the predictor variables.  An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.

Leverage:  An observation with an extreme value on a predictor variable is a point with high leverage.  Leverage is a measure of how far an independent variable deviates from its mean.  High leverage points can have a great amount of effect on the estimate of regression coefficients.

Influence:  An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients.  Influence can be thought of as the product of leverage and outlierness.

Cook's distance (or Cook's D): A measure that combines the information of leverage and residual of the observation.

Robust regression can be used in any situation in which you would use least squares regression.  When fitting a least squares regression, we might find some outliers or high leverage data points.  We have decided that these data points are not data entry errors, neither they are from a different population than most of our data. So we have no compelling reason to exclude them from the analysis.  Robust regression might be a good strategy since it is a compromise between excluding these points entirely from the analysis and including all the data points and treating all them equally in OLS regression. The idea of robust regression is to weigh the observations differently based on how well behaved these observations are. Roughly speaking, it is a form of weighted and reweighted least squares regression.

Proc robustreg in SAS command implements several versions of robust regression. In this page, we will show M-estimation with Huber and bisquare weighting. These two are very standard and are combined as the default weighting function in Stata's robust regression command.  In Huber weighting, observations with small residuals get a weight of 1 and the larger the residual, the smaller the weight.  With bisquare weighting, all cases with a non-zero residual get down-weighted at least a little.

#### Description of the example data

For our data analysis below, we will use the data set crime. This dataset  appears in Statistical Methods for Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, 1997).  The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder),  the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), percent of population with a high school education or above (pcths), percent of population living under poverty line (poverty), and percent of population that are single parents (single).  It has 51 observations. We are going to use poverty and single to predict crime.
data crime;
infile "crime.csv" delimiter="," firstobs=2;
input sid state \$ crime murder pctmetro pctwhite pcths poverty single;
run;

proc means data = crime;
var crime poverty single;
run;

The MEANS Procedure

Variable     N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------
crime       51     612.8431373     441.1003229      82.0000000         2922.00
poverty     51      14.2588235       4.5842415       8.0000000      26.4000000
single      51      11.3254902       2.1214941       8.4000000      22.1000000
------------------------------------------------------------------------------

#### Using robust regression analysis

In most cases, we begin by running an OLS regression and doing some diagnostics. We will begin by running an OLS regression.  We create a graph showing the leverage versus the squared residuals, labeling the points with the state abbreviations. To do so, we output the residuals and leverage in proc reg (along with Cook's-D, which we will use later).

proc reg data = crime;
model crime = poverty single ;
output out = t student=res cookd = cookd h = lev;
run;
quit;

data t; set t;
resid_sq = res*res;
run;

proc sgplot data = t;
scatter y = lev x = resid_sq / datalabel = state;
run;
quit;


As we can see, DC, Florida and Mississippi have either high leverage or large residuals. We can display the observations that have relatively large values of Cook's D. A conventional cut-off point is 4/n, where n is the number of observations in the data set. We will use this criterion to select the values to display.
proc print data = t;
where cookd > 4/51;
var state crime poverty single cookd;
run;

Obs    state    crime    poverty    single     cookd

1     ak        761       9.1      14.3     0.12547
9     fl       1206      17.8      10.6     0.14259
25     ms        434      24.7      14.7     0.61387
51     dc       2922      26.4      22.1     2.63625


We probably should drop DC to begin with since it is not even a state. We include it in the analysis just to show that it has large Cook's D and demonstrate how it will be handled by proc robustreg. Now we will look at the residuals. We will generate a new variable called absr1, which is the absolute value of the residuals (because the sign of the residual doesn't matter). We then print the ten observations with the highest absolute residual values.

data t2; set t;
rabs = abs(res);
run;

proc sort data  = t2;
by descending rabs;
run;

proc print data = t2 (obs=10);
run;
                          p      p                                                    r
c      c           p                                        e
m     t      t           o     s                                  s
s     c     u     m      w     p     v     i                c                 i
t     r     r     e      h     c     e     n                o                 d        r
O   s  a     i     d     t      i     t     r     g       r        o        l        _        a
b   i  t     m     e     r      t     h     t     l       e        k        e        s        b
s   d  e     e     r     o      e     s     y     e       s        d        v        q        s

1  25  ms   434  13.5   30.7  63.3  64.3  24.7  14.7  -3.56299  0.61387  0.12669  12.6949  3.56299
2   9  fl  1206   8.9   93.0  83.5  74.4  17.8  10.6   2.90266  0.14259  0.04832   8.4255  2.90266
3  51  dc  2922  78.5  100.0  31.8  73.1  26.4  22.1   2.61645  2.63625  0.53602   6.8458  2.61645
4  46  vt   114   3.6   27.0  98.4  80.8  10.0  11.0  -1.74241  0.04272  0.04050   3.0360  1.74241
5  26  mt   178   3.0   24.0  92.6  81.0  14.9  10.8  -1.46088  0.01676  0.02301   2.1342  1.46088
6  21  me   126   1.6   35.7  98.5  78.8  10.7  10.6  -1.42674  0.02233  0.03186   2.0356  1.42674
7   1  ak   761   9.0   41.8  75.2  86.6   9.1  14.3  -1.39742  0.12547  0.16161   1.9528  1.39742
8  31  nj   627   5.3  100.0  80.8  76.7  10.9   9.6   1.35415  0.02229  0.03519   1.8337  1.35415
9  14  il   960  11.4   84.0  81.0  76.2  13.6  11.5   1.33819  0.01266  0.02076   1.7908  1.33819
10  20  md   998  12.7   92.8  68.9  78.4   9.7  12.0   1.28709  0.03570  0.06072   1.6566  1.28709


Now let's run our first robust regression. Robust regression is done by iterated re-weighted least squares. The procedure for running robust regression is proc robustreg. There are a couple of estimators for IWLS. We are going to first use the Huber weights in this example. We can save the final weights created by the IWLS process. This can be very useful. We  will use the data set t2 generated above.

proc robustreg data=t2 method=m (wf=huber) ;
model crime = poverty single;
output out = t3 weight=wgt;
run;

Model Information
Data Set                                  WORK.T2
Dependent Variable                          crime
Number of Independent Variables                 2
Number of Observations                         51
Method                               M Estimation

Number of Observations Used          51

Summary Statistics
Standard
Variable           Q1      Median          Q3        Mean    Deviation         MAD

poverty       10.7000     13.1000     17.4000     14.2588       4.5842      4.2995
single        10.0000     10.9000     12.1000     11.3255       2.1215      1.4826
crime           326.0       515.0       780.0       612.8        441.1       345.4

Parameter Estimates

Standard   95% Confidence     Chi-
Parameter DF Estimate    Error       Limits       Square Pr > ChiSq

Intercept  1 -1423.23 167.5099 -1751.54 -1094.91   72.19     <.0001
poverty    1   8.8694   8.0429  -6.8944  24.6331    1.22     0.2701
single     1 169.0012  17.3795 134.9381 203.0644   94.56     <.0001
Scale      1 181.7251

Diagnostics Summary

Observation
Type           Proportion      Cutoff
Outlier            0.0392      3.0000

Goodness-of-Fit

Statistic       Value

R-Square       0.5257
AICR          73.1089
BICR          78.9100
Deviance      2216391

proc sort data = t3;
by wgt ;
run;
proc print data = t3 (obs=15);
var state crime poverty single res cookd lev wgt;
run;
Obs    state    crime    poverty    single       res       cookd       lev        wgt
1     ms        434      24.7      14.7     -3.56299    0.61387    0.12669    0.28886
2     fl       1206      17.8      10.6      2.90266    0.14259    0.04832    0.35947
3     vt        114      10.0      11.0     -1.74241    0.04272    0.04050    0.59545
4     dc       2922      26.4      22.1      2.61645    2.63625    0.53602    0.64980
5     mt        178      14.9      10.8     -1.46088    0.01676    0.02301    0.68630
6     me        126      10.7      10.6     -1.42674    0.02233    0.03186    0.72509
7     nj        627      10.9       9.6      1.35415    0.02229    0.03519    0.73812
8     il        960      13.6      11.5      1.33819    0.01266    0.02076    0.76600
9     ak        761       9.1      14.3     -1.39742    0.12547    0.16161    0.78039
10     md        998       9.7      12.0      1.28709    0.03570    0.06072    0.79570
11     ma        805      10.7      10.9      1.19854    0.01640    0.03311    0.83933
12     la       1062      26.4      14.9     -1.02183    0.06700    0.16143    0.91528
13     ca       1078      18.2      12.5      1.01521    0.01231    0.03458    1.00000
14     wy        286      13.3      10.8     -0.96626    0.00667    0.02099    1.00000
15     sc       1023      18.7      12.3      0.91213    0.01111    0.03853    1.00000



We can see that roughly, as the absolute residual goes down, the weight goes up. In other words, cases with a large residuals tend to be down-weighted. We can also see that the values of Cook's D don't really correspond to the weights.  This output shows us that the observation for Mississippi will be down-weighted the most.  Florida will also be substantially down-weighted.  All observations not shown above have a weight of 1. In OLS regression, all cases have a weight of 1.  Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions.

Next, let's run the same model, but using the default weighting function. Again, we can look at the weights.

proc robustreg data=t2 method=m (wf=bisquare);
model crime = poverty single;
output out = t4 weight=wgt;
run;

Model Information

Data Set                                  WORK.T2
Dependent Variable                          crime
Number of Independent Variables                 2
Number of Observations                         51
Method                               M Estimation

Number of Observations Used          51

Summary Statistics

Standard
Variable           Q1      Median          Q3        Mean    Deviation         MAD

poverty       10.7000     13.1000     17.4000     14.2588       4.5842      4.2995
single        10.0000     10.9000     12.1000     11.3255       2.1215      1.4826
crime           326.0       515.0       780.0       612.8        441.1       345.4

Parameter Estimates

Standard   95% Confidence     Chi-
Parameter DF Estimate    Error       Limits       Square Pr > ChiSq

Intercept  1 -1535.36 164.4627 -1857.70 -1213.02   87.15     <.0001
poverty    1  11.6911   7.8966  -3.7859  27.1681    2.19     0.1387
single     1 175.9320  17.0633 142.4884 209.3755  106.31     <.0001
Scale      1 202.3464

Diagnostics Summary
Observation
Type           Proportion      Cutoff
Outlier            0.0392      3.0000

Goodness-of-Fit

Statistic       Value

R-Square       0.4545
AICR          55.1327
BICR          62.3771
Deviance      2071016

proc print data = t4 (obs=15);
var state crime poverty single res cookd lev wgt;
run;

Obs    state    crime    poverty    single       res       cookd       lev        wgt
1     ms        434      24.7      14.7     -3.56299    0.61387    0.12669    0.00764
2     fl       1206      17.8      10.6      2.90266    0.14259    0.04832    0.25292
3     vt        114      10.0      11.0     -1.74241    0.04272    0.04050    0.67151
4     mt        178      14.9      10.8     -1.46088    0.01676    0.02301    0.73114
5     nj        627      10.9       9.6      1.35415    0.02229    0.03519    0.75135
6     la       1062      26.4      14.9     -1.02183    0.06700    0.16143    0.76887
7     me        126      10.7      10.6     -1.42674    0.02233    0.03186    0.77412
8     ak        761       9.1      14.3     -1.39742    0.12547    0.16161    0.77766
9     il        960      13.6      11.5      1.33819    0.01266    0.02076    0.79368
10     md        998       9.7      12.0      1.28709    0.03570    0.06072    0.79908
11     ma        805      10.7      10.9      1.19854    0.01640    0.03311    0.81260
12     dc       2922      26.4      22.1      2.61645    2.63625    0.53602    0.85455
13     wy        286      13.3      10.8     -0.96626    0.00667    0.02099    0.88166
14     ca       1078      18.2      12.5      1.01521    0.01231    0.03458    0.91174
15     ga        723      13.5      13.0     -0.68492    0.00691    0.04232    0.92402


We can see that the weight given to Mississippi is dramatically lower using the bisquare weighting function than the Huber weighting function and the parameter estimates from these two different weighting methods differ. When comparing the results of a regular OLS regression and a robust regression, if the results are very different, you will most likely want to use the results from the robust regression. Large differences suggest that the model parameters are being highly influenced by outliers. When using robust regression, SAS documentation notes: "estimates are more sensitive to the parameters of these weight functions than to the type of the weight function". However, different functions have advantages and drawbacks. Huber weights can have difficulties with severe outliers, and bisquare weights can have difficulties converging or may yield multiple solutions.

As you can see, the results from the two analyses are fairly different, especially with respect to the coefficients of single and the constant (_cons).  While normally we are not interested in the constant, if you had centered one or both of the predictor variables, the constant would be useful.  On the other hand, you will notice that poverty is not statistically significant in either analysis, while single is significant in both analyses.

#### Things to consider

• Proc robustreg does not address issues of heterogeneity of variance. This problem can be addressed by using proc surveyreg. Details can be found in this SAS Code Fragment.
• The examples shown here have presented SAS code for M estimation. There are other estimation options available in proc robustreg: Least trimmed squares, S estimation, and MM estimation. See the SAS documentation for when and how these alternatives are preferable.

#### References

• Li, G. 1985. Robust regression. In Exploring Data Tables, Trends, and Shapes, ed. D. C. Hoaglin, F. Mosteller, and J. W. Tukey, Wiley.
• Chen, C. 2002. Robust Regression and Outlier Detection with the ROBUSTREG Procedure. Proceedings of the 27th SAS Users Group International Conference, Cary NC: SAS Institute, Inc.
• John Fox, Applied regression analysis, linear models, and related models, Sage publications, Inc, 1997