UCLA Academic Technology Services HomeServicesClassesContactJobs

SAS Data Analysis Examples
Robust Regression

Note: the code on this page works with SAS 9.1.3.


Examples

Robust regression can be used in any situation in which you would use OLS regression.  When doing the regression diagnostics, you might discover that one or more data points are moderately outlying.  These are points that you have determined are not data entry errors, from a different population than the rest of your data, and for which you have no compelling reason to exclude them from the analysis.  Robust regression is a compromise between deleting these points, and allowing them to violate the assumptions of OLS regression.

Some Definitions

Before we continue with our discussion of robust regression, we need to define some terms.

Residual:  The difference between the predicted value (based on the regression equation) and the actual, observed value.

Outlier:  In linear regression, an outlier is an observation with large residual.  In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables.  An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.

Leverage:  An observation with an extreme value on a predictor variable is a point with high leverage.  Leverage is a measure of how far an independent variable deviates from its mean.  These leverage points can have an effect on the estimate of regression coefficients.

Influence:  An observation is said to be influential if removing the observation substantially changes the estimate of coefficients.  Influence can be thought of as the product of leverage and outlierness. 

Robust regression deals with cases that have very high leverage, and cases that are outliers.  Robust regression is essentially a compromise between dropping the case(s) that are moderate outliers and seriously violating the assumptions of OLS regression.  It is a form of weighted least squares regression and is done iteratively. At each step a new set of weights are determined based on the residuals. In general, the larger the residuals, the smaller the weights. So the weights depend on residuals. At the same time, the residuals depend on the model and the model depends on the weights. This generates an iteration process and it goes on until the change in the parameter estimates are small enough. A few types of weighting schemes are implemented.  In Huber weighting, observations with small residuals get a weight of 1, the larger the residual, the smaller the weight.  With bisquare, all cases with a non-zero residual get down-weighted at least a little.   

Description of the Data

For our data analysis below, we will use the crime.csv data set.  This dataset  appears in Statistical Methods for Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, 1997).  The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder),  the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), percent of population with a high school education or above (pcths), percent of population living under poverty line (poverty), and percent of population that are single parents (single).  We will drop the observation for Washington, D.C. (sid=51) because it is not a state.
data crime;
  infile "D:\work\data\raw\crime.csv" delimiter="," firstobs=2;
  input sid state $ crime murder pctmetro pctwhite pcths poverty single;
run;
data crime2;
  set crime;
  if sid ~=51;
run;

Using Robust Regression Analysis

In most cases, you begin by running an OLS regression and doing some diagnostics.  We will begin by running an OLS regression. 

proc reg data = crime2;
  model crime = poverty single ;
  output out = t student=res cookd = cookd;
run;
quit;

goptions reset = all;
symbol pointlabel = ("#state") value=none;
axis1 label=(a=90 'Standardized Residuals');
proc gplot data = t;
  plot res*cookd =1 /vaxis=axis1 ;
run;
quit;

As we can see, observation for state = fl and state = ms have high residual and high leverage values. Let us display these observations.

proc print data = crime2;
  where state = "fl" or state = "ms";
run;
Obs    sid    state    crime    murder    pctmetro    pctwhite    pcths    poverty    single

  9      9     fl       1206      8.9       93.0        83.5       74.4      17.8      10.6
 25     25     ms        434     13.5       30.7        63.3       64.3      24.7      14.7

Let's also take a look at the Cook's D and standardized residuals using the data set generated for the plot above. The lowest value that Cook's D can assume is zero, and the higher the Cook's D is, the more influential the point. The conventional cut-off point is 4/n, where n is the number of observations in the data set.

proc print data = t noobs;
  where cookd >4/50;
run;
sid   state   crime   murder   pctmetro   pctwhite   pcths   poverty   single      res      cookd

  9    fl      1206     8.9      93.0       83.5      74.4     17.8     10.6     3.02363   0.15732
 25    ms       434    13.5      30.7       63.3      64.3     24.7     14.7    -3.15875   0.79633
 49    wv       208     6.9      41.8       96.3      66.0     22.2      9.4    -1.04739   0.08726

Now we will look at the residuals.  We will first generate a new variable rabs containing the absolute value of standardized residuals. Then we sort the data on rabs in descending order. We then list the first ten observations.

data t2;
  set t;
  rabs = abs(res);
run;
proc sort data  = t2;
  by descending rabs;
run;
proc print data = t2 (obs=10);
run;
Obs sid state crime murder pctmetro pctwhite pcths poverty single    res    cookd    rabs

  1  25  ms     434  13.5    30.7     63.3    64.3   24.7   14.7  -3.15875 0.79633 3.15875
  2   9  fl    1206   8.9    93.0     83.5    74.4   17.8   10.6   3.02363 0.15732 3.02363
  3  46  vt     114   3.6    27.0     98.4    80.8   10.0   11.0  -1.83136 0.04733 1.83136
  4  20  md     998  12.7    92.8     68.9    78.4    9.7   12.0   1.62075 0.06348 1.62075
  5  26  mt     178   3.0    24.0     92.6    81.0   14.9   10.8  -1.58884 0.01990 1.58884
  6  21  me     126   1.6    35.7     98.5    78.8   10.7   10.6  -1.57843 0.02765 1.57843
  7  14  il     960  11.4    84.0     81.0    76.2   13.6   11.5   1.55057 0.01848 1.55057
  8   5  ca    1078  13.1    96.7     79.3    76.2   18.2   12.5   1.40113 0.03175 1.40113
  9  34  ny    1074  13.3    91.7     77.2    74.8   16.4   12.7   1.33454 0.02727 1.33454
 10  19  ma     805   3.9    96.2     91.1    80.0   10.7   10.9   1.28861 0.01896 1.28861

Now let's run our first robust regression. Robust regression is done by iterated re-weighted least squares. The procedure for running robust regression is proc robustreg. There are a couple of estimators for IWLS. We are going to use the Huber estimator in this example. We can save the final weights created by the IWLS process. This can be very useful. We  will use the data set t2 generated above. It includes the original data set crime2 and the diagnostic variables generated based on the OLS regression model.

proc robustreg data=t2 method=m (wf=huber) ;
   model crime = poverty single; 
   output out = test1 weight=wgt;
run;
 
                Model Information

Data Set                              WORK.CRIME2
Dependent Variable                          crime
Number of Independent Variables                 2
Number of Observations                         50
Method                               M Estimation


Number of Observations Read          50
Number of Observations Used          50


                                Summary Statistics

                                                              Standard
Variable           Q1      Median          Q3        Mean    Deviation         MAD
poverty       10.7000     13.1000     17.4000     14.0160       4.2867      4.2254
single        10.0000     10.9000     12.0000     11.1100       1.4751      1.4085
crime           326.0       509.5       766.0       566.7        295.9       342.5


                        Parameter Estimates

                      Standard   95% Confidence     Chi-
Parameter DF Estimate    Error       Limits       Square Pr > ChiSq

Intercept  1 -1046.90 224.8372 -1487.57 -606.223   21.68     <.0001
poverty    1   8.9437   7.6490  -6.0482  23.9355    1.37     0.2423
single     1 133.8009  22.2278  90.2352 177.3667   36.23     <.0001
Scale      1 165.8365

         Diagnostics Summary

Observation
Type           Proportion      Cutoff
Outlier            0.0400      3.0000

   Goodness-of-Fit

Statistic       Value
R-Square       0.4402
AICR          75.9033
BICR          81.3912
Deviance      1915638
proc sort data = test1;
 by  wgt ;
run;
proc print data = test1 (obs=10);
run;
Obs sid state crime murder pctmetro pctwhite pcths poverty single    res    cookd    rabs    wgt

  1  25  ms     434  13.5     30.7    63.3    64.3   24.7   14.7  -3.15875 0.79633 3.15875 0.31554
  2   9  fl    1206   8.9     93.0    83.5    74.4   17.8   10.6   3.02363 0.15732 3.02363 0.33024
  3  46  vt     114   3.6     27.0    98.4    80.8   10.0   11.0  -1.83136 0.04733 1.83136 0.55714
  4  26  mt     178   3.0     24.0    92.6    81.0   14.9   10.8  -1.58884 0.01990 1.58884 0.63113
  5  20  md     998  12.7     92.8    68.9    78.4    9.7   12.0   1.62075 0.06348 1.62075 0.63271
  6  14  il     960  11.4     84.0    81.0    76.2   13.6   11.5   1.55057 0.01848 1.55057 0.64363
  7  21  me     126   1.6     35.7    98.5    78.8   10.7   10.6  -1.57843 0.02765 1.57843 0.65393
  8  19  ma     805   3.9     96.2    91.1    80.0   10.7   10.9   1.28861 0.01896 1.28861 0.74907
  9  31  nj     627   5.3    100.0    80.8    76.7   10.9    9.6   1.19365 0.02157 1.19365 0.76408
 10   5  ca    1078  13.1     96.7    79.3    76.2   18.2   12.5   1.40113 0.03175 1.40113 0.77018

Roughly, as the residual goes down, the weight goes up.  In other words, cases with a large residual tend to be down-weighted, and the values of Cook's D don't really correspond to the weights.  This output shows us that the observation for Mississippi will be down-weighted the most.  Florida will also be substantially down-weighted.  In OLS regression, all cases have a weight of 1.  Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions.

Now that we have seen the values of Cook's D and the residuals from the OLS regression model, let's look compare the results of a regular OLS regression and a robust regression.  If the results are very different, you will most likely want to use the results from the robust regression.

ods listing close;
proc reg data = crime2;
  model crime = poverty single;
ods output parameterestimates = a;
run;
quit;
proc robustreg data = crime2;
 model crime = poverty single;
ods output parameterestimates = b;
run;
quit;
ods listing;
title "OLS regression";
proc print data = a;
OLS regression                                                                                  

Obs    Model     Dependent    Variable     DF       Estimate         StdErr     tValue     Probt

 1     MODEL1      crime      Intercept     1     -879.79654      247.37818      -3.56    0.0009
 2     MODEL1      crime      poverty       1        7.59141        8.41590       0.90    0.3716
 3     MODEL1      crime      single        1      120.61704       24.45628       4.93    <.0001
title "Robust regression";
proc print data = b;
run;
Robust regression                                                                               
                                                                                       Prob
Obs    Parameter    DF    Estimate      StdErr     LowerCL     UpperCL      ChiSq     ChiSq

 1     Intercept     1    -1169.27    221.8967    -1604.18    -734.365      27.77    <.0001
 2     poverty       1     10.3477      7.5490     -4.4481     25.1435       1.88    0.1705
 3     single        1    143.3800     21.9371    100.3840    186.3760      42.72    <.0001
 4     Scale         1    167.2543       _           _           _            _       _

As you can see, the results from the two analyses are fairly different, especially with respect to the coefficients of single and the constant (_cons).  While normally we are not interested in the constant, if you had centered one or both of the predictor variables, the constant would be useful.  On the other hand, you will notice that poverty is not statistically significant in either analysis, while single is significant in both analyses. 

Sample Write-up of the Analysis

The results of the robust regression would be written up in the same way that OLS results would be; the interpretation of the coefficients and overall significance of the model mean exactly the same as in OLS regression.

Cautions, Flies in the Ointment

See Also


How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.