UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

R Data Analysis Examples
Robust Regression

Note: the code on this page works with R 2.4.1.


Examples

Robust regression can be used in any situation in which you would use OLS regression.  When doing the regression diagnostics, you might discover that one or more data points are moderately outlying.  These are points that you have determined are not data entry errors, from a different population than the rest of your data, and for which you have no compelling reason to exclude them from the analysis.  Robust regression is a compromise between deleting these points, and allowing them to violate the assumptions of OLS regression.

Some Definitions

Before we continue with our discussion of robust regression, we need to define some terms.

Residual:  The difference between the predicted value (based on the regression equation) and the actual, observed value.

Outlier:  In linear regression, an outlier is an observation with large residual.  In other words, it is an observation whose dependent-variable value is unusual given its values on the predictor variables.  An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.

Leverage:  An observation with an extreme value on a predictor variable is a point with high leverage.  Leverage is a measure of how far an independent variable deviates from its mean.  These leverage points can have an effect on the estimate of regression coefficients.

Influence:  An observation is said to be influential if removing the observation substantially changes the estimate of coefficients.  Influence can be thought of as the product of leverage and outlierness. 

Robust regression deals with cases that have very high leverage, and cases that are outliers.  Robust regression is essentially a compromise between dropping the case(s) that are moderate outliers and seriously violating the assumptions of OLS regression.  It is a form of weighted least squares regression and is done iteratively. At each step a new set of weights are determined based on the residuals. In general, the larger the residuals, the smaller the weights. So the weights depend on residuals. At the same time, the residuals depend on the model and the model depends on the weights. This generates an iteration process and it goes on until the change in the parameter estimates are small enough. A few types of weighting schemes are implemented.  In Huber weighting, observations with small residuals get a weight of 1, the larger the residual, the smaller the weight.  With bisquare, all cases with a non-zero residual get down-weighted at least a little.   

Description of the Data

For our data analysis below, we will use the crime data set.  This dataset  appears in Statistical Methods for Social Sciences, Third Edition by Alan Agresti and Barbara Finlay (Prentice Hall, 1997).  The variables are state id (sid), state name (state), violent crimes per 100,000 people (crime), murders per 1,000,000 (murder),  the percent of the population living in metropolitan areas (pctmetro), the percent of the population that is white (pctwhite), percent of population with a high school education or above (pcths), percent of population living under poverty line (poverty), and percent of population that are single parents (single).  We will drop the observation for Washington, D.C. (sid=51) because it is not a state.
crime<-read.table("http://www.ats.ucla.edu/stat/R/dae/crime.csv", sep=",", header = TRUE) 
attach(crime)
crime2<-crime[ sid !=51, ]

Using Robust Regression Analysis

In most cases, you begin by running an OLS regression and doing some diagnostics.  We will begin by running an OLS regression. 

m1 <- lm(crime ~ poverty + single, data=crime2)
opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0)) 
plot(m1, las = 1) 
par(opar) 

As we can see, observation 9 and 25 have high residual and leverage values. Let us display these observations.

crime2[9,]
  sid state crime murder pctmetro pctwhite pcths poverty single
9   9    fl  1206    8.9       93     83.5  74.4    17.8   10.6
crime2[25,]
   sid state crime murder pctmetro pctwhite pcths poverty single
25  25    ms   434   13.5     30.7     63.3  64.3    24.7   14.7

We can also generate the Cook's D and standardized residuals using the package MASS. The lowest value that Cook's D can assume is zero, and the higher the Cook's D is, the more influential the point. The conventional cut-off point is 4/n, where n is the number of observations in the data set.

library(MASS)
d1<-cooks.distance(m1)
r<-stdres(m1)
a<-cbind(crime2, d1, r)
a[d1>4/50,]
   sid state crime murder pctmetro pctwhite pcths poverty single         d1         r
9    9    fl  1206    8.9     93.0     83.5  74.4    17.8   10.6 0.15731688  3.023632
25  25    ms   434   13.5     30.7     63.3  64.3    24.7   14.7 0.79632761 -3.158753
49  49    wv   208    6.9     41.8     96.3  66.0    22.2    9.4 0.08725893 -1.047394

Now we will look at the residuals.  We will first generate a new variable rabs containing the absolute value of standardized residuals. Then we sort the data on rabs in descending order. We then list the first ten observations.

rabs <-abs(r)
a<cbind{crime2, d1, r, rabs)
asorted<-a[order(-rabs), ]
asorted[1:10,]
   sid state crime murder pctmetro pctwhite pcths poverty single         d1         r     rabs
25  25    ms   434   13.5     30.7     63.3  64.3    24.7   14.7 0.79632761 -3.158753 3.158753
9    9    fl  1206    8.9     93.0     83.5  74.4    17.8   10.6 0.15731688  3.023632 3.023632
46  46    vt   114    3.6     27.0     98.4  80.8    10.0   11.0 0.04732933 -1.831356 1.831356
20  20    md   998   12.7     92.8     68.9  78.4     9.7   12.0 0.06348335  1.620750 1.620750
26  26    mt   178    3.0     24.0     92.6  81.0    14.9   10.8 0.01990241 -1.588842 1.588842
21  21    me   126    1.6     35.7     98.5  78.8    10.7   10.6 0.02764574 -1.578434 1.578434
14  14    il   960   11.4     84.0     81.0  76.2    13.6   11.5 0.01848024  1.550569 1.550569
5    5    ca  1078   13.1     96.7     79.3  76.2    18.2   12.5 0.03175037  1.401128 1.401128
34  34    ny  1074   13.3     91.7     77.2  74.8    16.4   12.7 0.02727377  1.334536 1.334536
19  19    ma   805    3.9     96.2     91.1  80.0    10.7   10.9 0.01896076  1.288611 1.288611

Now let's run our first robust regression. Robust regression is done by iterated re-weighted least squares. The function for running robust regression is rlm from the package MASS. There are a couple of estimators for IWLS. The default estimator that rlm uses is Huber estimator. We can save the final weights created by the IWLS process.  This can be very useful.

m1.huber<-rlm(crime ~poverty + single, data=crime2)
summary(m1.huber)
Call: rlm(formula = crime ~ poverty + single, data = crime2)
Residuals:
    Min      1Q  Median      3Q     Max 
-706.88  -96.04  -21.72  115.24  675.41 

Coefficients:
            Value      Std. Error t value   
(Intercept) -1046.8870   224.9294    -4.6543
poverty         8.9435     7.6522     1.1688
single        133.8002    22.2370     6.0170

Residual standard error: 165.8 on 47 degrees of freedom

Correlation of Coefficients:
        (Intercept) poverty
poverty -0.0042            
single  -0.8932     -0.4303

a<-cbind(crime2, d1, r, m1.huber$w)
asorted<-a[order(m1.huber$w),]
asorted[1:10,]
   sid state crime murder pctmetro pctwhite pcths poverty single         d1         r m1.huber$w
25  25    ms   434   13.5     30.7     63.3  64.3    24.7   14.7 0.79632761 -3.158753  0.3155457
9    9    fl  1206    8.9     93.0     83.5  74.4    17.8   10.6 0.15731688  3.023632  0.3302352
46  46    vt   114    3.6     27.0     98.4  80.8    10.0   11.0 0.04732933 -1.831356  0.5571331
26  26    mt   178    3.0     24.0     92.6  81.0    14.9   10.8 0.01990241 -1.588842  0.6311308
20  20    md   998   12.7     92.8     68.9  78.4     9.7   12.0 0.06348335  1.620750  0.6326876
14  14    il   960   11.4     84.0     81.0  76.2    13.6   11.5 0.01848024  1.550569  0.6436023
21  21    me   126    1.6     35.7     98.5  78.8    10.7   10.6 0.02764574 -1.578434  0.6539268
19  19    ma   805    3.9     96.2     91.1  80.0    10.7   10.9 0.01896076  1.288611  0.7490496
31  31    nj   627    5.3    100.0     80.8  76.7    10.9    9.6 0.02156761  1.193654  0.7640642
5    5    ca  1078   13.1     96.7     79.3  76.2    18.2   12.5 0.03175037  1.401128  0.7701236

Roughly, as the residual goes down, the weight goes up.  In other words, cases with a large residual tend to be down-weighted, and the values of Cook's D don't really correspond to the weights.  This output shows us that the observation for Mississippi will be down-weighted the most.  Florida will also be substantially down-weighted.  In OLS regression, all cases have a weight of 1.  Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions.

Now that we have seen the values of Cook's D and the residuals from the OLS regression model, let's look compare the results of a regular OLS regression and a robust regression.  If the results are very different, you will most likely want to use the results from the robust regression.

m1<-lm(crime ~poverty + single, data=crime2)
summary(m1)

Call:
lm(formula = crime ~ poverty + single, data = crime2)

Residuals:
    Min      1Q  Median      3Q     Max 
-646.78 -106.42  -15.34  112.66  672.13 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -879.797    247.378  -3.556  0.00087 ***
poverty        7.591      8.416   0.902  0.37164    
single       120.617     24.456   4.932 1.06e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 228 on 47 degrees of freedom
Multiple R-Squared: 0.4306,     Adjusted R-squared: 0.4064 
F-statistic: 17.77 on 2 and 47 DF,  p-value: 1.785e-06 
m1.huber<-rlm(crime ~poverty + single, data=crime2)
summary(m1.huber)

Call: rlm(formula = crime ~ poverty + single, data = crime2)
Residuals:
    Min      1Q  Median      3Q     Max 
-706.88  -96.04  -21.72  115.24  675.41 

Coefficients:
            Value      Std. Error t value   
(Intercept) -1046.8870   224.9294    -4.6543
poverty         8.9435     7.6522     1.1688
single        133.8002    22.2370     6.0170

Residual standard error: 165.8 on 47 degrees of freedom

Correlation of Coefficients:
        (Intercept) poverty
poverty -0.0042            
single  -0.8932     -0.4303

As you can see, the results from the two analyses are fairly different, especially with respect to the coefficients of single and the constant (_cons).  While normally we are not interested in the constant, if you had centered one or both of the predictor variables, the constant would be useful.  On the other hand, you will notice that poverty is not statistically significant in either analysis, while single is significant in both analyses.  You will also notice that no R-squared, adjusted R-squared in rlm output.

Sample Write-up of the Analysis

The results of the robust regression would be written up in the same way that OLS results would be; the interpretation of the coefficients and overall significance of the model mean exactly the same as in OLS regression.

Cautions, Flies in the Ointment

See Also


How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California