UCLA Academic Technology Services

Stata Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 8: Building Regression Models I: Selection of Predictor Variables

Inputting the Surgical Unit data, Table 8.1, p. 335.
clear
input x1 x2 x3 x4 y logy
   6.7  62   81  2.59  200  2.3010
   5.1  59   66  1.70  101  2.0043
   7.4  57   83  2.16  204  2.3096
   6.5  73   41  2.01  101  2.0043
   7.8  65  115  4.30  509  2.7067
   5.8  38   72  1.42   80  1.9031
   5.7  46   63  1.91   80  1.9031
   3.7  68   81  2.57  127  2.1038
   6.0  67   93  2.50  202  2.3054
   3.7  76   94  2.40  203  2.3075
   6.3  84   83  4.13  329  2.5172
   6.7  51   43  1.86   65  1.8129
   5.8  96  114  3.95  830  2.9191
   5.8  83   88  3.95  330  2.5185
   7.7  62   67  3.40  168  2.2253
   7.4  74   68  2.40  217  2.3365
   6.0  85   28  2.98   87  1.9395
   3.7  51   41  1.55   34  1.5315
   7.3  68   74  3.56  215  2.3324
   5.6  57   87  3.02  172  2.2355
   5.2  52   76  2.85  109  2.0374
   3.4  83   53  1.12  136  2.1335
   6.7  26   68  2.10   70  1.8451
   5.8  67   86  3.40  220  2.3424
   6.3  59  100  2.95  276  2.4409
   5.8  61   73  3.50  144  2.1584
   5.2  52   86  2.45  181  2.2577
  11.2  76   90  5.59  574  2.7589
   5.2  54   56  2.71   72  1.8573
   5.8  76   59  2.58  178  2.2504
   3.2  64   65  0.74   71  1.8513
   8.7  45   23  2.52   58  1.7634
   5.0  59   73  3.50  116  2.0645
   5.8  72   93  3.30  295  2.4698
   5.4  58   70  2.64  115  2.0607
   5.3  51   99  2.60  184  2.2648
   2.6  74   86  2.05  118  2.0719
   4.3   8  119  2.85  120  2.0792
   4.8  61   76  2.45  151  2.1790
   5.4  52   88  1.81  148  2.1703
   5.2  49   72  1.84   95  1.9777
   3.6  28   99  1.30   75  1.8751
   8.8  86   88  6.40  483  2.6840
   6.5  56   77  2.85  153  2.1847
   3.4  77   93  1.48  191  2.2810
   6.5  40   84  3.00  123  2.0899
   4.5  73  106  3.05  311  2.4928
   4.8  86  101  4.10  398  2.5999
   5.1  67   77  2.86  158  2.1987
   3.9  82  103  4.55  310  2.4914
   6.6  77   46  1.95  124  2.0934
   6.4  85   40  1.21  125  2.0969
   6.4  59   85  2.33  198  2.2967
   8.8  78   72  3.20  313  2.4955
end
Generating the interaction of x2 and x3.
gen x2x3=x2*x3
Figure 8.2, page 336

Note: To create these graphs we first need to run four regressions: y on x1 x2 x3 x4, y on x2 x3, logy on x1 x2 x3 x4, and logy on x2 x3. Since we do not need to look at the regression output at this point, we use the quietly prefix, which suppresses it.

Figure 8.2a, page 336.
quietly regress y x1 x2 x3 x4
predict r, resid
qnorm r, ylabel(-100(100)300) xlabel(-200(100)200)
Figure 8.2b, page 336.
quietly regress y x2 x3
predict r1, resid
graph twoway scatter r1 x2x3, ylabel(-200(100)400) xlabel(0(5000)10000)
Figure 8.2c, page 336.
quietly regress logy x1 x2 x3 x4
predict r2, resid
qnorm r2, ylabel(-.15(.06).15) xlabel(-.15(.06).15)
Figure 8.2d, page 336.
quietly regress logy x2 x3
predict r3, resid
graph twoway scatter r3 x2x3, ylabel(-.4(.1)0.4) xlabel(0 5000 10000)
Figure 8.3a, page 337.
graph matrix logy x1 x2 x3 x4
Figure 8.3b, page 337.
corr logy x1 x2 x3 x4
(obs=54)

             |     logy       x1       x2       x3       x4
-------------+---------------------------------------------
        logy |   1.0000
          x1 |   0.3464   1.0000
          x2 |   0.5929   0.0901   1.0000
          x3 |   0.6651  -0.1496  -0.0236   1.0000
          x4 |   0.7262   0.5024   0.3690   0.4164   1.0000
Table 8.2, page 338

This table is obtained using version 1.1 of rsquare, which calculates not only R-squared and Mallows' Cp but also the SSE and MSE (the SSE column is labeled SEE in the output). Make sure you have the updated version before you run it. It was not possible to reproduce the following graphs: figures 8.4-8.7.

You can download rsquare from within Stata by typing findit rsquare (see "How can I use the findit command to search for programs and get additional help?" for more information about using findit).

rsquare logy x1 x2 x3 x4

Regression models for dependent variable : logy

R-squared  Mallow's C           SEE           MSE      models with 1 predictor
0.1200        1510.59          3.4961       0.0672     x1
0.3515        1100.01          2.5763       0.0495     x2
0.4424         938.86          2.2153       0.0426     x3
0.5274         788.15          1.8776       0.0361     x4
R-squared  Mallow's C           SEE           MSE      models with 2 predictor
0.4381         948.55          2.2325       0.0438     x1 x2
0.6458         580.14          1.4072       0.0276     x1 x3
0.5278         789.34          1.8758       0.0368     x1 x4
0.8130         283.67          0.7430       0.0146     x2 x3
0.6496         573.44          1.3922       0.0273     x2 x4
0.6865         507.90          1.2453       0.0244     x3 x4
R-squared  Mallow's C           SEE           MSE      models with 3 predictor
0.9723           3.04          0.1099       0.0022     x1 x2 x3
0.6500         574.71          1.3905       0.0278     x1 x2 x4
0.7192         451.99          1.1156       0.0223     x1 x3 x4
0.8829         161.66          0.4652       0.0093     x2 x3 x4
R-squared  Mallow's C           SEE           MSE      models with 4 predictor
0.9724           5.00          0.1098       0.0022     x1 x2 x3 x4
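The Mallows' Cp column can be checked by hand for any one subset. The sketch below assumes the usual definition, Cp = SSE_p/MSE_full - (n - 2p), where p counts the parameters in the subset model including the constant. The scalar names (mse_full, n, p, cp) are made up for this illustration and are not part of rsquare.

```stata
* Full model: keep its MSE and the number of observations
quietly regress logy x1 x2 x3 x4
scalar mse_full = e(rmse)^2
scalar n = e(N)

* Subset model x1 x2 x3: p = number of predictors + 1 for the constant
quietly regress logy x1 x2 x3
scalar p = e(df_m) + 1

* Cp = SSE_p / MSE_full - (n - 2p)
scalar cp = e(rss)/mse_full - (n - 2*p)
display "Cp for x1 x2 x3 = " %6.2f cp
```

If the definition assumed here is the one rsquare uses, the value displayed should be close to the 3.04 reported for x1 x2 x3 in the table above.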
Forward stepwise regression of Surgical unit data, page 349.

Note 1: Unlike BMDP, Stata prints only a one-line log for each step, followed by the full output for the final model.

Note 2: Tolerance is equal to 1/VIF.
* Stata 8 code.
sw reg logy x1 x2 x3 x4, pe(.01) pr(.05) forward beta 

* Stata 9 code and output.
* Note: The beta option does not work with the stepwise prefix, so it is omitted here.
stepwise, pe(.01) pr(.05) forward: regress logy x1 x2 x3 x4

                      begin with empty model
p = 0.0000 <  0.0100  adding   x4
p = 0.0000 <  0.0100  adding   x3
p = 0.0000 <  0.0100  adding   x2
p = 0.0000 <  0.0100  adding   x1
p = 0.8436 >= 0.0500  removing x4

      Source |       SS       df       MS              Number of obs =      54
-------------+------------------------------           F(  3,    50) =  586.04
       Model |  3.86291372     3  1.28763791           Prob > F      =  0.0000
    Residual |  .109858708    50  .002197174           R-squared     =  0.9723
-------------+------------------------------           Adj R-squared =  0.9707
       Total |  3.97277243    53   .07495797           Root MSE      =  .04687

------------------------------------------------------------------------------
        logy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .0692251   .0040779    16.98   0.000     .0610343    .0774159
          x3 |   .0095236   .0003064    31.08   0.000     .0089082    .0101391
          x2 |   .0092945   .0003825    24.30   0.000     .0085263    .0100628
       _cons |   .4836209   .0426287    11.34   0.000     .3979985    .5692432
------------------------------------------------------------------------------
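Since the beta option is not available with the stepwise prefix, one possible way to get standardized coefficients is to refit the selected model directly. This is a minimal sketch, assuming the final model is the x1 x2 x3 model shown above.

```stata
* Refit the model chosen by the stepwise procedure, asking for
* standardized (beta) coefficients instead of confidence intervals.
regress logy x1 x2 x3, beta
```

The coefficients, standard errors, and t statistics should be the same as in the stepwise output. Only the last column changes, from confidence intervals to beta weights.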
* Stata 8 code.
vif

* Stata 9 code and output.
estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
          x1 |      1.03    0.970108
          x3 |      1.02    0.977506
          x2 |      1.01    0.991774
-------------+----------------------
    Mean VIF |      1.02
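The 1/VIF column is the tolerance mentioned in the note above. As a rough check, the tolerance of a predictor can be computed as 1 minus the R-squared from regressing that predictor on the other predictors in the model. The sketch below does this for x1. The scalar name tol_x1 is made up for this illustration.

```stata
* Regress x1 on the other predictors in the final model
quietly regress x1 x2 x3

* Tolerance = 1 - R-squared of this auxiliary regression; VIF = 1/tolerance
scalar tol_x1 = 1 - e(r2)
display "tolerance for x1 = " %8.6f tol_x1
display "VIF for x1       = " %5.2f 1/tol_x1
```

The two values displayed should agree with the x1 row of the table above.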

These pages are Copyrighted (c) by UCLA Academic Technology Services