use http://www.ats.ucla.edu/stat/data/crime, clear
rreg crime pctmetro pcths poverty single, tolerance(.001)
Huber iteration 1: maximum difference in weights = .40486383
Huber iteration 2: maximum difference in weights = .10882095
Huber iteration 3: maximum difference in weights = .012962
Huber iteration 4: maximum difference in weights = .00356214
Biweight iteration 5: maximum difference in weights = .1550734
Biweight iteration 6: maximum difference in weights = .00583604
Biweight iteration 7: maximum difference in weights = .00102764
Biweight iteration 8: maximum difference in weights = .00023256
Robust regression                                     Number of obs =       50
                                                      F(4, 45)      =    27.85
                                                      Prob > F      =   0.0000

------------------------------------------------------------------------------
       crime |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    pctmetro |   7.008278   1.182934     5.92   0.000     4.625726    9.390829
       pcths |  -2.049362    6.96946    -0.29   0.770    -16.08657    11.98785
     poverty |   15.49762   10.16624     1.52   0.134    -4.978252    35.97348
      single |   99.83795   19.01549     5.25   0.000     61.53879    138.1371
       _cons |   -1071.87   641.3146    -1.67   0.102    -2363.544    219.8038
------------------------------------------------------------------------------
display "R2 = " e(r2)
R2 = .71228451
display "adjusted R2 = " e(r2_a)
adjusted R2 = .6867098
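The sketch below looks behind the two display commands. It reruns the same model and uses rreg's genwt() option to save each observation's final weight in a new variable. It then lists everything rreg stores in e(). The variable name w is our own choice, and the .5 cutoff used to flag heavily down-weighted observations is arbitrary and for illustration only.

```stata
* Rerun the model from above, saving each observation's final
* robust-regression weight in a new variable w (name chosen here)
use http://www.ats.ucla.edu/stat/data/crime, clear
rreg crime pctmetro pcths poverty single, tolerance(.001) genwt(w)

* Distribution of the final weights
summarize w

* Observations whose final weight fell below an arbitrary .5 cutoff;
* a missing weight, if any, is not listed because Stata treats
* missing as larger than any number
list crime w if w < .5

* Everything rreg left in e(), including e(r2) and e(r2_a)
ereturn list
```

Whatever ereturn list shows, the e(r2) and e(r2_a) entries are the auxiliary values discussed below and should not be reported as measures of fit.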
To understand why the R-squared and adjusted R-squared values shown above are not
appropriate, you need to understand what is going on inside the rreg program. rreg goes
through a series of iterations in which it computes and recomputes a weight for each
observation: first Huber weights, then biweights, as the iteration log shows.
After the program reaches convergence, it takes one more step in which it creates
pseudovalues of the dependent variable using the final set of weights, a scaling factor and
a couple of other values. It then uses the pseudovalues as the response variable in
an OLS regression. The ereturn values, such as e(r2) and e(r2_a), are left over from
that OLS regression. According to Street, Carroll and Ruppert (1988), these auxiliary
values from the pseudovalue regression are not meaningful and should not be used.
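The iteration log at the top of the page shows rreg switching from Huber weights to biweights. Below is a sketch of the standard textbook forms of those two weight functions and of the weighted least-squares refit that each iteration performs.

In the sketch, r_i is the residual for observation i and s is a robust scale estimate. The symbol c is the tuning constant of whichever function is in use; the two functions use different constants.

This is a generic sketch under stated assumptions. The scale estimate and default constants that rreg itself uses are documented in the Stata manual entry for rreg and are not reproduced here. Reading the log's "maximum difference in weights" as the largest single-observation change in weight between iterations is an interpretation of its label.

```latex
u_i = \frac{r_i}{s}, \qquad r_i = y_i - \mathbf{x}_i'\hat{\boldsymbol\beta}

w^{\text{Huber}}(u) =
\begin{cases}
1, & |u| \le c \\
c / |u|, & |u| > c
\end{cases}
\qquad
w^{\text{biweight}}(u) =
\begin{cases}
\left[1 - (u/c)^2\right]^2, & |u| \le c \\
0, & |u| > c
\end{cases}

\hat{\boldsymbol\beta}^{(k+1)} = \left(X' W^{(k)} X\right)^{-1} X' W^{(k)} \mathbf{y},
\qquad W^{(k)} = \operatorname{diag}\left(w_1^{(k)}, \dots, w_n^{(k)}\right)

\Delta^{(k)} = \max_i \left| w_i^{(k)} - w_i^{(k-1)} \right|
```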
UCLA Statistical Consulting has written a program, rregfit, that computes
R-squared and several other fit indices for a model estimated with rreg.
You can download the rregfit command by typing findit rregfit in the Stata
command line (see "How can I use the findit command to search for programs
and get additional help?" for more information about using findit).
It is demonstrated in the example below using the robust regression model
from above.
rregfit

robust regression measures of fit
R-square =   .66989605
AICR     =   42.917151
BICR     =   55.940273
deviance =     1064093

Using rregfit, the R-squared is 0.67, whereas the e(r2) value left over from rreg's auxiliary regression is 0.71, which should not be used.

References
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J. and Stahel, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley & Sons, Inc.
Ronchetti, E. (1985) "Robust Model Selection in Regression." Statistics and Probability Letters, 3, 21-23.
SAS Institute Inc. (2008) SAS 9.2 Documentation for GLM. Cary, NC: SAS Institute Inc.
Street, J.O., Carroll, R.J. and Ruppert, D. (1988) "A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares." The American Statistician, 42(2), 152-154.