### Stata FAQ

How can I get an R-squared with robust regression (rreg)?

Some Stata users have noticed that values of e(r2) and e(r2_a) are available after running **rreg**. You can see this in the example below, which uses the **crime** dataset. We caution against using these values as measures of model fit (see the discussion below).

```
. use http://www.ats.ucla.edu/stat/data/crime, clear

. rreg crime pctmetro pcths poverty single, tolerance(.001)

   Huber iteration 1:  maximum difference in weights = .40486383
   Huber iteration 2:  maximum difference in weights = .10882095
   Huber iteration 3:  maximum difference in weights = .012962
   Huber iteration 4:  maximum difference in weights = .00356214
Biweight iteration 5:  maximum difference in weights = .1550734
Biweight iteration 6:  maximum difference in weights = .00583604
Biweight iteration 7:  maximum difference in weights = .00102764
Biweight iteration 8:  maximum difference in weights = .00023256

Robust regression                                      Number of obs =      50
                                                       F(  4,    45) =   27.85
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
       crime |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   7.008278   1.182934     5.92   0.000     4.625726    9.390829
       pcths |  -2.049362    6.96946    -0.29   0.770    -16.08657    11.98785
     poverty |   15.49762   10.16624     1.52   0.134    -4.978252    35.97348
      single |   99.83795   19.01549     5.25   0.000     61.53879    138.1371
       _cons |   -1071.87   641.3146    -1.67   0.102    -2363.544    219.8038
------------------------------------------------------------------------------

. display "R2 = " e(r2)
R2 = .71228451

. display "adjusted R2 = " e(r2_a)
adjusted R2 = .6867098
```

To understand why the values shown above are not appropriate, you need to understand what is going on inside the **rreg** program.

**rreg** goes through a series of iterations in which it computes and recomputes a weight for each observation, first with Huber weights and then with biweights, as the iteration log above shows. After the program reaches convergence, it goes through one more step in which it creates pseudovalues of the dependent variable using the final set of weights, a scaling factor, and a couple of other values. It then uses the pseudovalues as the response variable in an OLS regression. The ereturn values, such as e(r2) and e(r2_a), are left over from that OLS regression model. According to Street, Carroll and Ruppert (1988), these auxiliary values from the pseudovalue regression are not meaningful and should not be used.
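
The procedure described above (iterate on weights, then regress pseudovalues by OLS) can be sketched in Python for a single predictor. This is a simplified illustration, not **rreg**'s code. The following are all assumptions: the textbook tuning constants (1.345 for Huber, 4.685 for the biweight, in units of a MAD-based scale), the scale estimate, and the pseudovalue form yhat + k·w·r with k = 1/mean(ψ′). The initial Cook's distance screening that **rreg** performs is omitted, and the data are made up.

```python
import statistics

# Textbook tuning constants in units of the robust scale s (assumed; rreg's may differ).
HUBER_C = 1.345
BIWEIGHT_C = 4.685


def wls_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def residuals(x, y, a, b):
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def mad_scale(r):
    """Median absolute deviation from the median residual, rescaled by 1/0.6745."""
    med = statistics.median(r)
    mad = statistics.median([abs(ri - med) for ri in r])
    return max(mad / 0.6745, 1e-12)


def huber_weight(u):
    return 1.0 if abs(u) <= HUBER_C else HUBER_C / abs(u)


def biweight_weight(u):
    t = (u / BIWEIGHT_C) ** 2
    return (1.0 - t) ** 2 if t < 1.0 else 0.0


def biweight_dpsi(u):
    """Derivative of psi(u) = u * biweight_weight(u)."""
    t = (u / BIWEIGHT_C) ** 2
    return (1.0 - t) * (1.0 - 5.0 * t) if t < 1.0 else 0.0


def irls_stage(x, y, w, weight_fn, tol, max_iter=100):
    """Refit and recompute weights until the largest weight change is below tol."""
    for _ in range(max_iter):
        a, b = wls_fit(x, y, w)
        r = residuals(x, y, a, b)
        s = mad_scale(r)
        new_w = [weight_fn(ri / s) for ri in r]
        max_diff = max(abs(nw - ow) for nw, ow in zip(new_w, w))
        w = new_w
        if max_diff < tol:
            break
    return w


def robust_fit(x, y, tol=0.001):
    w = [1.0] * len(x)                             # start from OLS weights
    w = irls_stage(x, y, w, huber_weight, tol)     # Huber iterations
    w = irls_stage(x, y, w, biweight_weight, tol)  # then biweight iterations
    a, b = wls_fit(x, y, w)                        # fit with the final weights
    return a, b, w


def pseudovalue_ols(x, y, a, b, w):
    """OLS of pseudovalues y* = yhat + k*w*r on x; returns (a*, b*, R^2 of that OLS)."""
    r = residuals(x, y, a, b)
    s = mad_scale(r)
    # Assumed scaling k = 1 / mean(psi'(r/s)); at convergence w*r is about s*psi(r/s).
    k = 1.0 / statistics.mean([biweight_dpsi(ri / s) for ri in r])
    ystar = [a + b * xi + k * wi * ri for xi, wi, ri in zip(x, w, r)]
    a_s, b_s = wls_fit(x, ystar, [1.0] * len(x))
    ybar = statistics.mean(ystar)
    sse = sum(e ** 2 for e in residuals(x, ystar, a_s, b_s))
    sst = sum((v - ybar) ** 2 for v in ystar)
    return a_s, b_s, 1.0 - sse / sst


# Hypothetical data: y = 2 + 3x plus small noise, with two gross outliers.
x = list(range(20))
noise = [-1.0, 0.0, 1.0, -0.5, 0.5]
y = [2.0 + 3.0 * xi + noise[xi % 5] for xi in x]
y[4] += 50.0
y[5] += 50.0

a_ols, b_ols = wls_fit(x, y, [1.0] * len(x))
a_rob, b_rob, w_rob = robust_fit(x, y)
a_ps, b_ps, r2_ps = pseudovalue_ols(x, y, a_rob, b_rob, w_rob)
print(f"OLS slope {b_ols:.3f}, robust slope {b_rob:.3f}, pseudovalue-OLS slope {b_ps:.3f}")
print(f"R-squared of the pseudovalue OLS: {r2_ps:.3f}")
```

The OLS fit to the pseudovalues returns the robust coefficients. This happens because the final weighted fit makes the sums of wᵢrᵢ and wᵢxᵢrᵢ zero, so the k·w·r part of each pseudovalue is orthogonal to the intercept and the predictor. That OLS R-squared, however, falls as k grows, since k inflates only the residual variation. In this sketch it therefore reflects the scaling chosen for the standard errors rather than how well the model fits y.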

UCLA Statistical Consulting has written a program, **rregfit**, that computes R-squared and several other fit indices after **rreg**. You can download **rregfit** by typing **findit rregfit** in the Stata command line (see *How can I use the findit command to search for programs and get additional help?* for more information about using **findit**). It is demonstrated in the example below, using the robust regression model from above.

```
. rregfit

robust regression measures of fit
R-square = .66989605
AICR     = 42.917151
BICR     = 55.940273
deviance = 1064093
```
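
For reference, the SAS/STAT PROC ROBUSTREG documentation defines these measures for M estimation as follows. We assume, without having checked its source, that **rregfit** uses the same definitions. In the formulas:

- ρ is the robust loss function, with ψ = ρ′.
- rᵢ = yᵢ − xᵢ′β̂ are the residuals.
- ŝ is the robust scale from the full model.
- μ̂ is a robust location estimate from the intercept-only model.
- p is the number of parameters.
- α = E[ψ²]/E[ψ′].

AICR is the robust AIC proposed by Ronchetti (1985).

```latex
\begin{aligned}
R^2 &= \frac{\sum_{i=1}^{n} \rho\left(\frac{y_i - \hat{\mu}}{\hat{s}}\right) - \sum_{i=1}^{n} \rho\left(\frac{r_i}{\hat{s}}\right)}{\sum_{i=1}^{n} \rho\left(\frac{y_i - \hat{\mu}}{\hat{s}}\right)} \\
\text{deviance} &= 2\hat{s}^{2} \sum_{i=1}^{n} \rho\left(\frac{r_i}{\hat{s}}\right) \\
\mathrm{AICR} &= 2 \sum_{i=1}^{n} \rho\left(\frac{r_i}{\hat{s}}\right) + \alpha p \\
\mathrm{BICR} &= 2 \sum_{i=1}^{n} \rho\left(\frac{r_i}{\hat{s}}\right) + p \log n
\end{aligned}
```

Unlike e(r2), this R-squared compares robust losses computed on the original response, so the two values need not agree.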

Using **rregfit**, the R-squared was 0.67, while **rreg** left the incorrect value of 0.71 in e(r2) (visible with **ereturn list**).
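
The contrast can be made concrete with a minimal Python sketch of a robust R-squared computed on the original response. It is one minus the ratio of two quantities: the biweight loss of the model residuals, and the biweight loss around a robust location estimate. This follows the SAS-style definition and is not **rregfit**'s code. The following are assumptions for illustration: the biweight loss, its tuning constant 4.685, holding the scale fixed while estimating the location, and the example data, fit, and scale.

```python
import statistics

C = 4.685  # textbook biweight tuning constant (assumption)


def rho(u, c=C):
    """Tukey biweight loss: about u**2 / 2 for small u, constant c**2 / 6 beyond c."""
    if abs(u) >= c:
        return c * c / 6.0
    t = (u / c) ** 2
    return c * c / 6.0 * (1.0 - (1.0 - t) ** 3)


def biweight_location(y, s, c=C, tol=1e-10, max_iter=200):
    """Robust location of y (intercept-only model), holding the scale s fixed."""
    mu = statistics.median(y)
    for _ in range(max_iter):
        w = [(1.0 - ((yi - mu) / (c * s)) ** 2) ** 2 if abs(yi - mu) < c * s else 0.0
             for yi in y]
        total = sum(w)
        if total == 0.0:
            return mu
        new_mu = sum(wi * yi for wi, yi in zip(w, y)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def robust_r2(y, fitted, s, c=C):
    """1 - (loss of the model residuals) / (loss around the robust location)."""
    mu = biweight_location(y, s, c)
    null_loss = sum(rho((yi - mu) / s, c) for yi in y)
    model_loss = sum(rho((yi - fi) / s, c) for yi, fi in zip(y, fitted))
    return 1.0 - model_loss / null_loss


# Hypothetical response with one outlier, a hypothetical robust fit, and a hypothetical scale.
y = [1.0, 3.2, 4.9, 7.1, 9.0, 11.2, 12.8, 45.0, 17.1, 19.0]
fitted = [1.0 + 2.0 * i for i in range(10)]
s = 0.3
print(f"robust R-squared: {robust_r2(y, fitted, s):.4f}")
```

Because ρ levels off for large residuals, the single outlier adds at most a bounded amount of loss. Under OLS it would dominate the residual sum of squares.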

**References**

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986). *Robust Statistics: The Approach Based on Influence Functions*. New York: John Wiley & Sons, Inc.

Ronchetti, E. (1985). "Robust Model Selection in Regression." *Statistics and Probability Letters*, 3, 21-23.

SAS Institute Inc. (2008). *SAS 9.2 Documentation for GLM*. Cary, NC: SAS Institute Inc.

Street, J. O., Carroll, R. J. and Ruppert, D. (1988). "A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares." *The American Statistician*, 42(2), 152-154.

The content of this web site should not be construed as an endorsement
of any particular web site, book, or software product by the
University of California.