Stata Data Analysis Examples
Robust Regression

Examples

Robust regression can be used in any situation in which you would use OLS regression.  When doing the regression diagnostics, you might discover that one or more data points are moderately outlying.  These are points that you have determined are neither data entry errors nor from a different population than the rest of your data, and that you have no compelling reason to exclude from the analysis.  Robust regression is a compromise between deleting these points and allowing them to violate the assumptions of OLS regression.

Some Definitions

Before we continue with our discussion of robust regression, we need to define some terms.

Residual:  The difference between the actual, observed value and the predicted value (based on the regression equation).

Outlier:  In linear regression, an outlier is an observation with a large residual.  In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables.  An outlier may indicate a sample peculiarity, a data entry error, or some other problem.

Leverage:  An observation with an extreme value on a predictor variable is a point with high leverage.  Leverage is a measure of how far an observation's value on an independent variable deviates from the mean of that variable.  High-leverage points can have a large effect on the estimates of the regression coefficients.

Influence:  An observation is said to be influential if removing the observation substantially changes the estimate of coefficients.  Influence can be thought of as the product of leverage and outlierness. 

Robust regression deals with cases that have very high leverage and with cases that are outliers.  It is essentially a compromise between dropping the cases that are moderate outliers and seriously violating the assumptions of OLS regression.  It is a form of weighted least squares regression.

According to the Stata 9 Reference Manual (page 162), the robust regression procedure runs the OLS regression, gets the Cook's D values, and then drops any observation whose Cook's D value is greater than 1.  Then an iterative process begins in which weights are calculated based on absolute residuals.  The iterating stops when the maximum change in the weights from one iteration to the next falls below the tolerance.

Two types of weights are used.  In Huber weighting, observations with small residuals get a weight of 1, and the larger the residual, the smaller the weight.  With biweighting, all cases with a non-zero residual get down-weighted at least a little.  Both kinds of weights are used because Huber weights can have difficulties with severe outliers, and biweights can have difficulties converging or may yield multiple solutions.  Using the Huber weights first helps to minimize problems with the biweights.  You can see the iteration history of both types of weights at the top of the robust regression output.

Using the Stata defaults, robust regression is about 95% as efficient as OLS (Hamilton, 1991).  In short, the most influential points are dropped, and then cases with large absolute residuals are down-weighted.
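The two weighting schemes described above can be written out as formulas.  This is a sketch of the standard Huber and biweight functions, not a quotation from the manual: the scaled residual u_i, the scale estimate s, and the tuning constants c_h and c_b are notation introduced here.  The values c_h = 1.345 and c_b = 4.685 (at the default tuning setting) are recalled from the rreg documentation and should be checked against the manual for your version of Stata.

```latex
% r_i is the residual of case i from the current fit
u_i = \frac{r_i}{s}, \qquad
s = \frac{M}{0.6745}, \qquad
M = \operatorname{med}_i \left| r_i - \operatorname{med}_j (r_j) \right|

% Huber weights: full weight up to c_h, then declining
w_i^{\mathrm{Huber}} =
\begin{cases}
1 & \text{if } |u_i| \le c_h \\
c_h / |u_i| & \text{otherwise}
\end{cases}

% Biweights: any non-zero residual is down-weighted; zero weight beyond c_b
w_i^{\mathrm{biweight}} =
\begin{cases}
\left[ 1 - (u_i / c_b)^2 \right]^2 & \text{if } |u_i| \le c_b \\
0 & \text{otherwise}
\end{cases}
```

Under these definitions a Huber weight never reaches zero, which fits the remark that Huber weights struggle with severe outliers, while a biweight is below 1 for every non-zero residual.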

Description of the Data

For our data analysis below, we will use the crime data set.  This dataset appears in Statistical Methods for Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, 1997).  The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder), the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), percent of population with a high school education or above (pcths), percent of population living under poverty line (poverty), and percent of population that are single parents (single).  We will drop the observation for Washington, D.C. (sid=51) because it is not a state.

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/crime, clear
drop if sid == 51
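
A quick check after loading can confirm that the drop worked as intended.  This is a minimal sketch that assumes the data set loaded from the URL above; the expected count of 50 is taken from the "Number of obs" reported in the regression output later on this page.

```stata
* Sanity checks after loading the data and dropping Washington, D.C.
* Show the number of remaining observations
count
* Stop with an error unless exactly 50 states remain
assert _N == 50
* D.C. (sid 51) should no longer be present
assert sid != 51
* Summarize the variables used in the models below
summarize crime poverty single
```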

Using Robust Regression Analysis

In most cases, you begin by running an OLS regression and doing some diagnostics, and we will do the same here.  The lvr2plot command creates a graph of leverage versus the squared residuals, and the mlabel option labels the points on the graph with the two-letter abbreviation for each state.

regress crime poverty single
lvr2plot, mlabel(state)

As we can see, Florida and Mississippi have the largest residuals. 
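
The leverage values shown on the plot can also be listed directly.  The sketch below is illustrative: the variable name lev is hypothetical, and the cut-off (2k+2)/n, with k predictors and n observations, is a common rule of thumb rather than anything lvr2plot itself applies.

```stata
* Leverage (hat values) from the OLS fit; "lev" is a hypothetical name
quietly regress crime poverty single
predict lev, leverage
* Rule-of-thumb cut-off (an assumption): (2k+2)/n with k = 2 and n = 50
list state poverty single lev if lev > (2*2 + 2)/50, clean
* Remove the helper variable so that later listings are unchanged
drop lev
```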

We use the predict command to create a new variable called d1, which contains the values of Cook's D.  Stata's robust regression drops only cases with a Cook's D greater than 1, and no case in this example is dropped (that is, given a weight of zero).  Even so, we want to see which cases have a relatively large Cook's D.  The lowest value that Cook's D can assume is zero, and the higher the Cook's D, the more influential the point.  A conventional cut-off point is 4/n, where n is the number of observations in the data set.  (Note that both Cook's D and DFITS combine information about the residual and leverage.  They are very similar, except that they are scaled differently.)

predict d1, cooksd
list if d1 > 4/50, clean

       sid   state   crime   murder   pctmetro   pctwhite   pcths   poverty   single         d1  
  9.     9      fl    1206      8.9         93       83.5    74.4      17.8     10.6   .1573168  
 25.    25      ms     434     13.5       30.7       63.3    64.3      24.7     14.7   .7963277  
 49.    49      wv     208      6.9       41.8       96.3      66      22.2      9.4   .0872589  
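
To see how closely Cook's D and DFITS agree for these data, the two can be listed together.  This is a small sketch that assumes the OLS regression above is still the most recent estimation command; the variable name dfit is hypothetical.

```stata
* DFITS from the same OLS fit; "dfit" is a hypothetical variable name
predict dfit, dfits
* Sort so that the cases with the largest Cook's D come first
gsort -d1
list state d1 dfit in 1/5, clean
```

DFITS is signed while Cook's D is not, so compare the absolute size of dfit with d1 when reading the listing.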

Now we will look at the residuals.  We will again use the predict command, this time with the rstandard option.  We will generate a new variable called absr1, which is the absolute value of the standardized residuals (because the sign of the residual doesn't matter).  The gsort command is used to sort the data.  The minus sign before the variable name is used to sort the data in descending order.

predict r1, rstandard
gen absr1 = abs(r1)
gsort -absr1
li state absr1 in 1/10

     +------------------+
     | state      absr1 |
     |------------------|
  1. |    ms   3.158753 |
  2. |    fl   3.023632 |
  3. |    vt   1.831356 |
  4. |    md    1.62075 |
  5. |    mt   1.588843 |
     |------------------|
  6. |    me   1.578434 |
  7. |    il   1.550569 |
  8. |    ca   1.401128 |
  9. |    ny   1.334536 |
 10. |    ma   1.288611 |
     +------------------+

Now let's run our first robust regression.  The generate option (abbreviated gen below) tells Stata to save the final weights in a new variable in the data set, which lets us see afterwards which cases were down-weighted and by how much.

rreg crime poverty single, gen(weight)

   Huber iteration 1:  maximum difference in weights = .66846346
   Huber iteration 2:  maximum difference in weights = .11288069
   Huber iteration 3:  maximum difference in weights = .01810715
Biweight iteration 4:  maximum difference in weights = .29167992
Biweight iteration 5:  maximum difference in weights = .10354281
Biweight iteration 6:  maximum difference in weights = .01421094
Biweight iteration 7:  maximum difference in weights = .0033545

Robust regression                                      Number of obs =      50
                                                       F(  2,    47) =   31.15
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   10.36971   7.629288     1.36   0.181    -4.978432    25.71786
      single |   142.6339   22.17042     6.43   0.000     98.03276     187.235
       _cons |  -1160.931   224.2564    -5.18   0.000    -1612.076   -709.7849
------------------------------------------------------------------------------

sort weight
li sid state d1 absr1 weight in 1/10

     +-----------------------------------------------+
     | sid   state         d1      absr1      weight |
     |-----------------------------------------------|
  1. |  25      ms   .7963277   3.158753   .02638862 |
  2. |   9      fl   .1573168   3.023632   .11772218 |
  3. |  46      vt   .0473293   1.831356   .59144513 |
  4. |  26      mt   .0199024   1.588843   .66441582 |
  5. |  20      md   .0634833    1.62075   .67960728 |
     |-----------------------------------------------|
  6. |  14      il   .0184802   1.550569   .69124917 |
  7. |  21      me   .0276457   1.578434   .69766511 |
  8. |  31      nj   .0215676   1.193654   .74574796 |
  9. |  19      ma   .0189608   1.288611   .75392127 |
 10. |   5      ca   .0317504   1.401128   .80179038 |
     +-----------------------------------------------+

Roughly, as the residual goes down, the weight goes up.  In other words, cases with a large residual tend to be down-weighted, whereas the values of Cook's D do not correspond closely to the weights.  This output shows us that the observation for Mississippi is down-weighted the most.  Florida is also substantially down-weighted.  In OLS regression, all cases have a weight of 1.  Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions will be.
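
Because robust regression is a form of weighted least squares, the saved weights can be fed back into regress as a check.  This sketch assumes the variable weight created by gen(weight) above.  The coefficients should come out very close to those reported by rreg, but the standard errors are not expected to match, because rreg computes them differently.

```stata
* Refit by weighted least squares using the final rreg weights
* (assumes "weight" was created by the gen(weight) option above)
regress crime poverty single [aweight = weight]
```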

Now that we have seen the values of Cook's D and the residuals, let's compare the results of a regular OLS regression and a robust regression.  If the results are very different, you will most likely want to use the results from the robust regression.

regress crime poverty single

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  2,    47) =   17.77
       Model |  1847292.58     2  923646.291           Prob > F      =  0.0000
    Residual |  2442332.64    47  51964.5242           R-squared     =  0.4306
-------------+------------------------------           Adj R-squared =  0.4064
       Total |  4289625.22    49  87543.3718           Root MSE      =  227.96

------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |    7.59141   8.415899     0.90   0.372    -9.339194    24.52201
      single |    120.617   24.45628     4.93   0.000     71.41735    169.8167
       _cons |  -879.7966   247.3782    -3.56   0.001    -1377.457   -382.1359
------------------------------------------------------------------------------

rreg crime poverty single

   Huber iteration 1:  maximum difference in weights = .66846346
   Huber iteration 2:  maximum difference in weights = .11288069
   Huber iteration 3:  maximum difference in weights = .01810715
Biweight iteration 4:  maximum difference in weights = .29167992
Biweight iteration 5:  maximum difference in weights = .10354281
Biweight iteration 6:  maximum difference in weights = .01421094
Biweight iteration 7:  maximum difference in weights = .0033545

Robust regression                                      Number of obs =      50
                                                       F(  2,    47) =   31.15
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |   10.36971   7.629288     1.36   0.181    -4.978432    25.71786
      single |   142.6339   22.17042     6.43   0.000     98.03276     187.235
       _cons |  -1160.931   224.2564    -5.18   0.000    -1612.076   -709.7849
------------------------------------------------------------------------------
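
The two sets of results can also be placed side by side instead of being compared across separate outputs.  This is a sketch using Stata's estimates commands; the names ols_fit and rreg_fit are hypothetical labels chosen here.

```stata
* Store both fits, then show coefficients and standard errors side by side
quietly regress crime poverty single
estimates store ols_fit
quietly rreg crime poverty single
estimates store rreg_fit
estimates table ols_fit rreg_fit, b(%9.3f) se(%9.3f)
```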

As you can see, the results from the two analyses are fairly different, especially with respect to the coefficients of single and the constant (_cons).  While normally we are not interested in the constant, if you had centered one or both of the predictor variables, the constant would be useful.  On the other hand, you will notice that poverty is not statistically significant in either analysis, while single is significant in both analyses.  You will also notice that the rreg output does not report an R-squared, adjusted R-squared or root MSE.

Sample Write-up of the Analysis

The results of the robust regression would be written up in the same way that OLS results would be; the coefficients and the overall significance of the model are interpreted exactly as in OLS regression.

Cautions, Flies in the Ointment

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services