
SAS Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 10: Building the Regression Model III: Remedial Measures and Validation

Inputting Blood Pressure data (page 407, Table 10.1).
data ch10tab01;
  input x y;
  label x='age'
        y='Dbp';
cards;
  27   73
  21   66
  22   63
  24   75
  25   71
  23   70
  20   65
  20   70
  29   79
  24   72
  25   68
  28   67
  26   79
  38   91
  32   76
  33   69
  31   66
  34   73
  37   78
  38   87
  33   76
  35   79
  30   73
  31   80
  37   68
  39   75
  46   89
  49  101
  40   70
  42   72
  43   80
  46   83
  43   75
  44   71
  46   80
  47   96
  45   92
  49   80
  48   70
  40   90
  42   85
  55   76
  54   71
  57   99
  52   86
  53   79
  56   92
  52   85
  50   71
  59   90
  50   91
  52  100
  58   80
  57  109
;;;
run;
Diagnostic plots, Fig. 10.1a and 10.1b, p. 406.
proc reg data=ch10tab01;
  model y = x;
  output out=temp r=residual;
  plot y*x r.*x;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Dbp

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     1     2374.96833     2374.96833      35.79    <.0001
Error                    52     3450.36501       66.35317
Corrected Total          53     5825.33333

Root MSE              8.14575    R-Square     0.4077
Dependent Mean       79.11111    Adj R-Sq     0.3963
Coeff Var            10.29659

                               Parameter Estimates

                                  Parameter       Standard
Variable     Label        DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept     1       56.15693        3.99367      14.06      <.0001
x            age           1        0.58003        0.09695       5.98      <.0001
Fig. 10.1c, p. 406.
data temp;
  set temp;
  absr = abs(residual);
run;
symbol1 v=star h=.8;
axis1 order=(0 to 20 by 5);
proc gplot data = temp;
  plot absr*x/ vaxis = axis1;
run;
quit;
Regressing the absolute residuals against X, formula (10.19), p. 406.
proc reg data = temp ;
  model absr = x;
  output out = temp1 p = s ;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: absr

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     1      277.23091      277.23091      13.93    0.0005
Error                    52     1034.62880       19.89671
Corrected Total          53     1311.85971

Root MSE              4.46057    R-Square     0.2113
Dependent Mean        6.29301    Adj R-Sq     0.1962
Coeff Var            70.88141

                               Parameter Estimates

                                  Parameter       Standard
Variable     Label        DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept     1       -1.54948        2.18692      -0.71      0.4818
x            age           1        0.19817        0.05309       3.73      0.0005
Obtaining the weights, w = 1/(s^2).
Table 10.1, p. 407.
data temp1;
  set temp1;
  w = 1/(s**2);
run;
proc print data = temp1 (obs = 10);
run;
Obs     x     y    residual      absr        s          w

  1    27    73     1.18224    1.18224    3.80117    0.06921
  2    21    66    -2.33758    2.33758    2.61214    0.14656
  3    22    63    -5.91761    5.91761    2.81031    0.12662
  4    24    75     4.92233    4.92233    3.20666    0.09725
  5    25    71     0.34230    0.34230    3.40483    0.08626
  6    23    70     0.50236    0.50236    3.00849    0.11049
  7    20    65    -2.75755    2.75755    2.41397    0.17161
  8    20    70     2.24245    2.24245    2.41397    0.17161
  9    29    79     6.02218    6.02218    4.19752    0.05676
 10    24    72     1.92233    1.92233    3.20666    0.09725
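As a quick check on the weights, the first row of the listing can be reproduced by hand. This is only a worked illustration using the rounded coefficients printed above for the regression of absr on x, so the fitted value agrees with the listed s to about four digits rather than exactly.

```latex
\hat{s}_1 = -1.54948 + 0.19817\,(27) \approx 3.801,
\qquad
w_1 = \frac{1}{\hat{s}_1^{\,2}} = \frac{1}{(3.80117)^2} \approx 0.06921
```

Younger subjects have smaller fitted standard deviations and therefore receive larger weights, which is what the w column shows for the observations with x = 20.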
Fitting equation (10.20) using WLS regression. The clb option in the model statement supplies confidence limits for the parameter estimates.
proc reg data = temp1;
  weight w;
  model y = x / clb;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Dbp

Weight: w

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     1       83.34082       83.34082      56.64    <.0001
Error                    52       76.51351        1.47141
Corrected Total          53      159.85432

Root MSE              1.21302    R-Square     0.5214
Dependent Mean       73.55134    Adj R-Sq     0.5122
Coeff Var             1.64921

                                      Parameter Estimates

                            Parameter     Standard
Variable   Label      DF     Estimate        Error  t Value  Pr > |t|    95% Confidence Limits

Intercept  Intercept   1     55.56577      2.52092    22.04    <.0001     50.50718     60.62436
x          age         1      0.59634      0.07924     7.53    <.0001      0.43734      0.75534
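The confidence limits reported by clb can be checked against the usual t-based interval. The sketch below assumes the 0.975 quantile of the t distribution with 52 degrees of freedom, roughly 2.0066, a value that is not printed in the output above.

```latex
b_1 \pm t(0.975;\,52)\, s\{b_1\}
  = 0.59634 \pm 2.0066\,(0.07924)
  = 0.59634 \pm 0.15900
  \;\Rightarrow\; (0.43734,\; 0.75534)
```

This agrees with the 95% limits printed for x in the WLS output.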
Inputting data for Ridge Regression example, p. 413.
data ch7tab01;
  input X1 X2 X3 Y;
  label x1 = 'Triceps' 
        x2 = 'Thigh cir.'
        x3 = 'Midarm cir.'
         y = 'body fat';
  cards;
  19.5  43.1  29.1  11.9
  24.7  49.8  28.2  22.8
  30.7  51.9  37.0  18.7
  29.8  54.3  31.1  20.1
  19.1  42.2  30.9  12.9
  25.6  53.9  23.7  21.7
  31.4  58.5  27.6  27.1
  27.9  52.1  30.6  25.4
  22.1  49.9  23.2  21.3
  25.5  53.5  24.8  19.3
  31.1  56.6  30.0  25.4
  30.4  56.7  28.3  27.2
  18.7  46.5  23.0  11.7
  19.7  44.2  28.6  17.8
  14.6  42.7  21.3  12.8
  29.5  54.4  30.1  23.9
  27.7  55.3  25.7  22.6
  30.2  58.6  24.6  25.4
  22.7  48.2  27.1  14.8
  25.2  51.0  27.5  21.1
;
run;
Transforming the variables using the correlation transformation (7.44).
proc sql; 
  create table ch7tab1a as
  select *, ( y - mean(y) )/( std(y)*( sqrt( count(y)-1 ) ) ) as ty,
            ( x1 - mean(x1) )/( std(x1)*( sqrt( count(x1)-1 ) ) ) as tx1,
            ( x2 - mean(x2) )/( std(x2)*( sqrt( count(x2)-1 ) ) ) as tx2,
            ( x3 - mean(x3) )/( std(x3)*( sqrt( count(x3)-1 ) ) ) as tx3
  from ch7tab01;
quit;
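For reference, the expressions in the proc sql step correspond to the following form of the correlation transformation, written here with s_Y and s_k denoting the sample standard deviations of Y and X_k (the notation is assumed; only the arithmetic is taken from the code above).

```latex
Y_i^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{Y_i - \bar{Y}}{s_Y} \right),
\qquad
X_{ik}^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{X_{ik} - \bar{X}_k}{s_k} \right),
\quad k = 1, 2, 3
```

A consequence of this scaling is that each transformed variable has mean zero and sum of squares equal to one, so the cross-product matrix of the transformed X variables is their correlation matrix.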
Ridge Regression on Body fat data.
Fig. 10.3, p. 413.
symbol1 v=dot h=.8;
proc reg data = ch7tab1a outest = temp outstb noprint;
  model y = x1-x3/ ridge = (0.001 to 0.1 by .001) outvif  ;
  plot / ridgeplot vref=0;
run;
quit;
The equations at the bottom of p. 413. In the output, the first line gives the coefficients for the original variables and the second line gives the coefficients for the transformed variables. SAS applies the transformation shown in (7.44) automatically, so there is no need to transform the variables manually. Notice that we used the untransformed variables in the regression models!
proc reg data = ch7tab1a outest = temp outstb noprint;
  model y = x1-x3 / ridge = 0.02;
run;
quit;
proc print data = temp;
  where _ridge_ = 0.02 and y = -1;
  var y intercept x1 x2 x3;
run;
Obs     Y    Intercept       X1         X2         X3

 2     -1     -7.40343    0.55535    0.36814    -0.19163
 3     -1      0.00000    0.54633    0.37740    -0.13687
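The two rows above can also be selected by row type rather than by the value stored under y. The following is a minimal sketch, assuming the outest data set temp from the ridge = 0.02 run is still in place and that PROC REG has labeled the ridge rows with the _TYPE_ values RIDGE (original scale) and RIDGESTB (standardized); printing _TYPE_ makes it explicit which row is which.

```sas
/* Sketch: pick the ridge rows by _TYPE_ instead of y = -1.         */
/* Assumes temp is the outest data set created with outstb above.   */
proc print data = temp;
  where _type_ in ('RIDGE', 'RIDGESTB') and _ridge_ = 0.02;
  var _type_ _ridge_ intercept x1 x2 x3;
run;
```

Selecting on _TYPE_ is a little more self-documenting and keeps working if other row types, such as RIDGEVIF, are added to the same data set.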
Table 10.2, p. 414.
The outstb option in the proc statement tells SAS to put the standardized parameter estimates in the output data set temp (named by outest). These can then be selected by specifying RIDGESTB in the where statement of the proc print.
proc reg data = ch7tab1a outest = temp outstb  outvif;
  model y = x1-x3/ridge = (0.0 to 0.01 by 0.002 0.02 to 0.05 by 0.01 0.5 1.0);
run;
quit;
proc print data = temp;
  where _type_ = 'RIDGESTB';
  var _ridge_ x1 x2 x3;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: Y body fat

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3      396.98461      132.32820      21.52    <.0001
Error                    16       98.40489        6.15031
Corrected Total          19      495.38950

Root MSE              2.47998    R-Square     0.8014
Dependent Mean       20.19500    Adj R-Sq     0.7641
Coeff Var            12.28017

                                Parameter Estimates

                                    Parameter       Standard
Variable     Label          DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept       1      117.08469       99.78240       1.17      0.2578
X1           Triceps         1        4.33409        3.01551       1.44      0.1699
X2           Thigh cir.      1       -2.85685        2.58202      -1.11      0.2849
X3           Midarm cir.     1       -2.18606        1.59550      -1.37      0.1896
Obs    _RIDGE_       X1         X2          X3

  4     0.000     4.26370    -2.92870    -1.56142
  7     0.002     1.44066    -0.41129    -0.48127
 10     0.004     1.00632    -0.02484    -0.31487
 13     0.006     0.83002     0.13142    -0.24716
 16     0.008     0.73433     0.21576    -0.21030
 19     0.010     0.67417     0.26841    -0.18703
 22     0.020     0.54633     0.37740    -0.13687
 25     0.030     0.50038     0.41341    -0.11808
 28     0.040     0.47600     0.43024    -0.10758
 31     0.050     0.46046     0.43924    -0.10051
 34     0.500     0.33772     0.37906    -0.02950
 37     1.000     0.27977     0.31007    -0.00594
Table 10.3, p. 414.
The outvif option in the proc statement of the regression tells SAS to put the VIFs in the output data set temp. These can then be selected by specifying RIDGEVIF in the where statement of the proc print.
proc print data = temp;
  where _type_ = 'RIDGEVIF';
  var _ridge_ x1 x2 x3;
run;
Obs    _RIDGE_         X1         X2         X3

  2     0.000     708.843    564.343    104.606
  5     0.002      50.559     40.448      8.280
  8     0.004      16.982     13.725      3.363
 11     0.006       8.503      6.976      2.119
 14     0.008       5.147      4.305      1.624
 17     0.010       3.486      2.981      1.377
 20     0.020       1.103      1.081      1.011
 23     0.030       0.626      0.697      0.923
 26     0.040       0.453      0.555      0.881
 29     0.050       0.370      0.486      0.853
 32     0.500       0.154      0.214      0.403
 35     1.000       0.107      0.136      0.227
Inputting the Mathematics Proficiency Data, Table 10.4, p. 421.
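Note: The easiest way to read a value made up of several words into one character variable with list input is to join the words with underscores. Because state is read with a plain $, it gets the default length of 8 characters, so longer names are truncated; this does not affect the analyses below.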
data ch10tab11; 
  input state $ y x1 x2 x3 x4 x5;
  label y = 'Math profeciency'
        x1 = 'Parents'
        x2 = 'Homelib'
        x3 = 'Reading'
        x4 = 'TV Watching'
        x5 = 'Absences';
cards;
 Alabama                 252  75  78  34  18  18
 Arizona                 259  75  73  41  12  26
 Arkansas                256  77  77  28  20  23
 California              256  78  68  42  11  28
 Colorado                267  78  85  38   9  25
 Connecticut             270  79  86  43  12  22
 Delaware                261  75  83  32  18  28
 District_of_Columbia    231  47  76  24  33  37
 Florida                 255  75  73  31  19  27
 Georgia                 258  73  80  36  17  22
 Guam                    231  81  64  32  20  28
 Hawaii                  251  78  69  36  23  26
 Idaho                   272  84  84  48   7  21
 Illinois                260  78  82  43  14  21
 Indiana                 267  81  84  37  11  23
 Iowa                    278  83  88  43   8  20
 Kentucky                256  79  78  36  14  23
 Louisiana               246  73  76  36  19  27
 Maryland                260  75  83  34  19  27
 Michigan                264  77  84  31  14  25
 Minnesota               276  83  88  36   7  20
 Montana                 280  83  88  44   6  21
 Nebraska                276  85  88  42   9  19
 New_Hampshire           273  83  88  40   7  22
 New_Jersey              269  79  84  41  13  23
 New_Mexico              256  77  72  40  11  27
 New_York                261  76  79  35  17  29
 North_Carolina          250  74  78  37  21  25
 North_Dakota            281  85  90  41   6  14
 Ohio                    264  79  84  36  11  22
 Oklahoma                263  78  78  37  14  22
 Oregon                  271  81  82  41   9  31
 Pennsylvania            266  80  86  34  10  24
 Rhode_Island            260  78  80  38  12  28
 Texas                   258  77  70  34  15  18
 Virgin_Islands          218  63  76  23  27  22
 Virginia                264  78  82  33  16  24
 West_Virginia           256  82  80  36  16  25
 Wisconsin               274  81  86  38   8  21
 Wyoming                 272  85  86  43   7  23
;
run;
Fig. 10.5, p. 421.
proc reg data = ch10tab11;
  model y = x2;
  plot y*x2 r.*x2;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     1     3769.30965     3769.30965      47.42    <.0001
Error                    38     3020.59035       79.48922
Corrected Total          39     6789.90000

Root MSE              8.91567    R-Square     0.5551
Dependent Mean      260.95000    Adj R-Sq     0.5434
Coeff Var             3.41662

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      135.55589       18.26408       7.42      <.0001
x2           Homelib              1        1.55963        0.22649       6.89      <.0001
Fitting the model using robust regression. We invoke the macro robust_hubert, which in turn invokes the mad macro. By default it creates two plots, but this can be modified. We show the Predicted by Residual plot below.
%include 'c:\neter\mad.sas';
%include 'c:\neter\robust_hubert.sas';
%robust_hubert(ch10tab11, y, x2, 0.000005, 8);
Below you can see the results of the OLS fit (10.49, p. 422), followed by the iterations of the reweighted least squares (see Table 10.5, p. 422), and then the final results (see 10.51, p. 423).
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     1     3769.30965     3769.30965      47.42    <.0001
Error                    38     3020.59035       79.48922
Corrected Total          39     6789.90000

Root MSE              8.91567    R-Square     0.5551
Dependent Mean      260.95000    Adj R-Sq     0.5434
Coeff Var             3.41662

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      135.55589       18.26408       7.42      <.0001
x2           Homelib              1        1.55963        0.22649       6.89      <.0001
Obs           r        u         _w2_

  1     -5.2069    -0.77336    1.00000
  2      9.5912     1.42455    0.94416
  3      0.3527     0.05239    1.00000
  4     14.3894     2.13719    0.62933
  5     -1.1243    -0.16699    1.00000
  6      0.3161     0.04695    1.00000
  7     -4.0050    -0.59485    1.00000
  8    -23.0876    -3.42911    0.39223
  9      5.5912     0.83044    1.00000
 10     -2.3261    -0.34549    1.00000
Obs           r      _w2_

  1     -6.1773    1.00000
  2      8.3454    1.00000
  3     -0.6728    1.00000
  4     12.8681    0.70085
  5     -1.7091    1.00000
  6     -0.2136    1.00000
  7     -4.7000    1.00000
  8    -24.1682    0.37316
  9      4.3454    1.00000
 10     -3.1864    1.00000
Obs           r      _w2_

  1     -6.3179    1.00000
  2      8.1025    1.00000
  3     -0.8338    1.00000
  4     12.5230    0.70388
  5     -1.7065    1.00000
  6     -0.1906    1.00000
  7     -4.7384    1.00000
  8    -24.3498    0.36200
  9      4.1025    1.00000
 10     -3.2861    1.00000
Obs           r      _w2_

  1     -6.3529    1.00000
  2      8.0501    1.00000
  3     -0.8723    1.00000
  4     12.4531    0.70504
  5     -1.7171    1.00000
  6     -0.1977    1.00000
  7     -4.7559    1.00000
  8    -24.3917    0.35995
  9      4.0501    1.00000
 10     -3.3141    1.00000
Obs           r      _w2_

  1     -6.3602    1.00000
  2      8.0389    1.00000
  3     -0.8804    1.00000
  4     12.4380    0.70527
  5     -1.7190    1.00000
  6     -0.1988    1.00000
  7     -4.7593    1.00000
  8    -24.4006    0.35951
  9      4.0389    1.00000
 10     -3.3198    1.00000
Obs           r      _w2_

  1     -6.3618    1.00000
  2      8.0365    1.00000
  3     -0.8821    1.00000
  4     12.4348    0.70532
  5     -1.7194    1.00000
  6     -0.1990    1.00000
  7     -4.7600    1.00000
  8    -24.4025    0.35941
  9      4.0365    1.00000
 10     -3.3211    1.00000
Obs           r      _w2_

  1     -6.3621    1.00000
  2      8.0360    1.00000
  3     -0.8825    1.00000
  4     12.4341    0.70533
  5     -1.7194    1.00000
  6     -0.1991    1.00000
  7     -4.7602    1.00000
  8    -24.4029    0.35939
  9      4.0360    1.00000
 10     -3.3213    1.00000
Obs           r      _w2_

  1     -6.3622    1.00000
  2      8.0359    1.00000
  3     -0.8826    1.00000
  4     12.4340    0.70533
  5     -1.7195    1.00000
  6     -0.1991    1.00000
  7     -4.7602    1.00000
  8    -24.4029    0.35939
  9      4.0359    1.00000
 10     -3.3214    1.00000
 
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

Weight: _w2_

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     1     3165.87899     3165.87899      78.49    <.0001
Error                    38     1532.63864       40.33260
Corrected Total          39     4698.51763

Root MSE              6.35079    R-Square     0.6738
Dependent Mean      262.40346    Adj R-Sq     0.6652
Coeff Var             2.42024

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      142.95244       13.52182      10.57      <.0001
x2           Homelib              1        1.47961        0.16700       8.86      <.0001
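The source of the robust_hubert and mad macros is not shown here, so the following is only a sketch of what a single reweighting step might look like, not the macros themselves. It assumes the Huber weight function with tuning constant 1.345 and a scale estimate equal to the median absolute deviation of the residuals divided by 0.6745. That assumption is consistent with the first iteration listed above: u = 1.42455 gives 1.345/1.42455 = 0.94416, and r/u is about 6.73 throughout. The data set names h0, med, h1, madset and h2 are hypothetical.

```sas
/* Step 1: OLS residuals */
proc reg data = ch10tab11 noprint;
  model y = x2;
  output out = h0 r = r;
run;
quit;

/* Step 2: median of the residuals */
proc means data = h0 noprint;
  var r;
  output out = med median = med_r;
run;

/* Step 3: absolute deviations from that median */
data h1;
  if _n_ = 1 then set med (keep = med_r);
  set h0;
  absdev = abs(r - med_r);
run;

proc means data = h1 noprint;
  var absdev;
  output out = madset median = med_absdev;
run;

/* Step 4: scaled residuals and Huber weights (assumed constant 1.345) */
data h2;
  if _n_ = 1 then set madset (keep = med_absdev);
  set h1;
  mad = med_absdev / 0.6745;
  u = r / mad;
  if abs(u) <= 1.345 then w = 1;
  else w = 1.345 / abs(u);
run;

/* Step 5: one weighted least squares fit with these weights */
proc reg data = h2;
  weight w;
  model y = x2;
run;
quit;
```

Repeating steps 2 through 5 with the residuals from the latest weighted fit, until the residuals stop changing, gives the kind of sequence shown in the iteration listings above.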
Sections 10.4 and 10.5 were skipped. For help with the loess method, please come see us in consulting; for bootstrapping, you might consider using the bs command in Stata, for example http://www.ats.ucla.edu/stat/stata/examples/ara/arastata16.htm.
Section 10.6--Model Validation!
Inputting the Surgical Unit data, Table 8.1, p. 335.
data ch8tab01;
  input x1 x2 x3 x4 y logy;
  label x1 = 'blood-clotting'
        x2 = 'prognostic'
        x3 = 'enzyme'
        x4 = 'liver function'
         y = 'survival'
      logy = 'Logsurvival';
cards;
   6.7  62   81  2.59  200  2.3010
   5.1  59   66  1.70  101  2.0043
   7.4  57   83  2.16  204  2.3096
   6.5  73   41  2.01  101  2.0043
   7.8  65  115  4.30  509  2.7067
   5.8  38   72  1.42   80  1.9031
   5.7  46   63  1.91   80  1.9031
   3.7  68   81  2.57  127  2.1038
   6.0  67   93  2.50  202  2.3054
   3.7  76   94  2.40  203  2.3075
   6.3  84   83  4.13  329  2.5172
   6.7  51   43  1.86   65  1.8129
   5.8  96  114  3.95  830  2.9191
   5.8  83   88  3.95  330  2.5185
   7.7  62   67  3.40  168  2.2253
   7.4  74   68  2.40  217  2.3365
   6.0  85   28  2.98   87  1.9395
   3.7  51   41  1.55   34  1.5315
   7.3  68   74  3.56  215  2.3324
   5.6  57   87  3.02  172  2.2355
   5.2  52   76  2.85  109  2.0374
   3.4  83   53  1.12  136  2.1335
   6.7  26   68  2.10   70  1.8451
   5.8  67   86  3.40  220  2.3424
   6.3  59  100  2.95  276  2.4409
   5.8  61   73  3.50  144  2.1584
   5.2  52   86  2.45  181  2.2577
  11.2  76   90  5.59  574  2.7589
   5.2  54   56  2.71   72  1.8573
   5.8  76   59  2.58  178  2.2504
   3.2  64   65  0.74   71  1.8513
   8.7  45   23  2.52   58  1.7634
   5.0  59   73  3.50  116  2.0645
   5.8  72   93  3.30  295  2.4698
   5.4  58   70  2.64  115  2.0607
   5.3  51   99  2.60  184  2.2648
   2.6  74   86  2.05  118  2.0719
   4.3   8  119  2.85  120  2.0792
   4.8  61   76  2.45  151  2.1790
   5.4  52   88  1.81  148  2.1703
   5.2  49   72  1.84   95  1.9777
   3.6  28   99  1.30   75  1.8751
   8.8  86   88  6.40  483  2.6840
   6.5  56   77  2.85  153  2.1847
   3.4  77   93  1.48  191  2.2810
   6.5  40   84  3.00  123  2.0899
   4.5  73  106  3.05  311  2.4928
   4.8  86  101  4.10  398  2.5999
   5.1  67   77  2.86  158  2.1987
   3.9  82  103  4.55  310  2.4914
   6.6  77   46  1.95  124  2.0934
   6.4  85   40  1.21  125  2.0969
   6.4  59   85  2.33  198  2.2967
   8.8  78   72  3.20  313  2.4955
;
run;
Inputting the Validation dataset, Table 10.10, p. 439.
data ch10tab10;
 input x1 x2 x3 x4 logy;
 label x1 = 'Clotting'
       x2 = 'Prognostic'
       x3 = 'Enzyme'
       x4 = 'Liver'
     logy = 'logSurvival';
cards;
   7.1  23   78  1.93  2.0326
   4.9  66   91  3.05  2.4086
   6.4  90   35  1.06  2.2177
   5.7  35   70  2.13  1.9078
   6.1  42   69  2.25  2.0035
   8.0  27   83  2.03  2.0945
   6.8  34   51  1.27  1.7652
   4.7  63   36  1.71  1.7925
   7.0  47   67  1.60  2.1292
   6.7  69   65  2.91  2.2295
   6.7  46   78  3.26  2.1524
   5.8  60   86  3.11  2.3188
   6.7  56   32  1.53  1.9039
   6.8  51   58  2.18  2.0508
   7.2  95   82  4.68  2.6525
   7.4  52   67  3.28  2.2053
   5.3  53   62  2.42  1.9246
   3.5  58   84  1.74  2.1541
   6.8  74   79  2.25  2.4970
   4.4  47   49  2.42  1.7237
   7.0  66  118  4.69  2.8339
   6.7  61   57  3.87  2.1282
   5.6  75  103  3.11  2.6884
   6.9  58   88  3.46  2.4284
   6.2  62   57  1.25  2.0261
   4.7  97   27  1.77  2.0843
   6.8  69   60  2.90  2.2826
   6.0  73   58  1.22  2.2073
   5.9  50   62  3.19  2.0443
   5.5  88   74  3.21  2.4863
   3.8  55   52  1.41  1.9037
   4.3  99   83  3.93  2.6647
   6.6  48   54  2.94  1.9071
   6.2  42   63  1.85  1.9093
   5.0  60  105  3.17  2.4389
   5.8  62   82  3.18  2.3343
   4.7  42   10  0.28  1.3379
   5.7  70   59  2.28  2.1996
   4.7  64   48  1.30  1.8795
   7.8  74   40  2.58  2.1504
   2.9  43   32  0.94  1.4330
   4.9  72   90  3.51  2.4381
   4.6  73   57  2.82  2.1075
   5.9  78   70  4.28  2.2843
   4.6  69   70  3.17  2.1615
   6.1  53   52  1.84  2.0558
   5.9  88   98  3.33  2.7249
   4.7  66   68  1.80  2.0520
  10.4  62   85  4.65  2.6810
   5.8  70   64  2.52  2.2604
   5.4  64   81  1.36  2.2553
   6.9  90   33  2.78  2.1745
   7.9  45   55  2.46  2.0224
   4.5  68   60  2.07  2.1413
 
;;;
run;
Table 10.9, p. 438.
proc reg data = ch8tab01 outest = temp;
  title 'Results from the Model Building Data set';
  model logy = x1 x2 x3/press;
run;
quit;
proc print data = temp;
 var _press_;
run;
proc reg data = ch10tab10 outest = temp;
  title 'Results from the Validation Data set';
  model logy = x1 x2 x3/ press;
run;
quit;
proc print data = temp;
 var _press_;
run;
title ;

Results from the Model Building Data set

The REG Procedure
Model: MODEL1
Dependent Variable: logy Logsurvival

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3        3.86291        1.28764     586.04    <.0001
Error                    50        0.10986        0.00220
Corrected Total          53        3.97277

Root MSE              0.04687    R-Square     0.9723
Dependent Mean        2.20614    Adj R-Sq     0.9707
Coeff Var             2.12470

                                 Parameter Estimates

                                       Parameter       Standard
Variable     Label             DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept          1        0.48362        0.04263      11.34      <.0001
x1           blood-clotting     1        0.06923        0.00408      16.98      <.0001
x2           prognostic         1        0.00929     0.00038250      24.30      <.0001
x3           enzyme             1        0.00952     0.00030641      31.08      <.0001

Results from the Model Building Data set

Obs    _PRESS_

 1     0.14045

Results from the Validation Data set

The REG Procedure
Model: MODEL1
Dependent Variable: logy logSurvival

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3        4.62507        1.54169     730.29    <.0001
Error                    50        0.10555        0.00211
Corrected Total          53        4.73062

Root MSE              0.04595    R-Square     0.9777
Dependent Mean        2.16466    Adj R-Sq     0.9763
Coeff Var             2.12257

                                Parameter Estimates

                                    Parameter       Standard
Variable     Label          DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept       1        0.50082        0.04192      11.95      <.0001
x1           Clotting        1        0.06741        0.00498      13.53      <.0001
x2           Prognostic      1        0.01011     0.00037193      27.18      <.0001
x3           Enzyme          1        0.00974     0.00030225      32.22      <.0001

Results from the Validation Data set

Obs    _PRESS_

 1     0.12125
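A further check that is often used in validation is the mean squared prediction error (MSPR): the model fitted to the model-building data is used to predict the validation cases, and the squared prediction errors are averaged. The sketch below is one simple way to do this, assuming the rounded coefficients printed above for the model-building fit are typed in by hand; the data set name mspr and the variables pred and sqerr are hypothetical.

```sas
/* Sketch: MSPR for the validation data, using the model-building fit. */
/* Coefficients are copied (rounded) from the printed output above.    */
data mspr;
  set ch10tab10;
  pred  = 0.48362 + 0.06923*x1 + 0.00929*x2 + 0.00952*x3;
  sqerr = (logy - pred)**2;
run;

/* The mean of sqerr is the MSPR */
proc means data = mspr n mean;
  var sqerr;
run;
```

The resulting mean can be compared with the MSE of the model-building fit (0.00220) and with PRESS/n, which from the output above is 0.14045/54, or about 0.0026. Values of similar size suggest that the MSE is a reasonable indication of predictive ability.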
Case Example--Mathematical Proficiency.
Note: These data were already read in earlier in this program (data set ch10tab11).
Fig. 10.10a, p. 441.
Calling the scatter matrix macro.
%include 'c:\neter\scatter.sas';
%scatter(data = ch10tab11, var= y x1 x2 x3 x4 x5);
<The scatterplot is not shown>
Fig. 10.10b, p. 441.
proc corr data = ch10tab11;
  var y x1-x5;
run; 
The CORR Procedure

   6  Variables:    y        x1       x2       x3       x4       x5

                                       Simple Statistics

Variable         N        Mean     Std Dev         Sum     Minimum     Maximum  Label

y               40   260.95000    13.19470       10438   218.00000   281.00000  Math profeciency
x1              40    77.70000     6.49339        3108    47.00000    85.00000  Parents
x2              40    80.40000     6.30344        3216    64.00000    90.00000  Homelib
x3              40    36.85000     5.26016        1474    23.00000    48.00000  Reading
x4              40    14.00000     5.99572   560.00000     6.00000    33.00000  TV Watching
x5              40    23.92500     4.07863   957.00000    14.00000    37.00000  Absences

                          Pearson Correlation Coefficients, N = 40
                                  Prob > |r| under H0: Rho=0

                            y           x1           x2           x3           x4           x5

y                     1.00000      0.74141      0.74507      0.71659     -0.87348     -0.48034
Math profeciency                    <.0001       <.0001       <.0001       <.0001       0.0017

x1                    0.74141      1.00000      0.39454      0.69304     -0.83115     -0.56531
Parents                <.0001                    0.0118       <.0001       <.0001       0.0001

x2                    0.74507      0.39454      1.00000      0.37692     -0.59364     -0.44262
Homelib                <.0001       0.0118                    0.0165       <.0001       0.0042

x3                    0.71659      0.69304      0.37692      1.00000     -0.79187     -0.35669
Reading                <.0001       <.0001       0.0165                    <.0001       0.0239

x4                   -0.87348     -0.83115     -0.59364     -0.79187      1.00000      0.51168
TV Watching            <.0001       <.0001       <.0001       <.0001                    0.0007

x5                   -0.48034     -0.56531     -0.44262     -0.35669      0.51168      1.00000
Absences               0.0017       0.0001       0.0042       0.0239       0.0007
Fitted model (10.61) p. 441.
proc reg data = ch10tab11;
  model y = x1-x5;
  output out = temp h = hii student=ti cookd = Di;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     5     5846.32774     1169.26555      42.13    <.0001
Error                    34      943.57226       27.75213
Corrected Total          39     6789.90000

Root MSE              5.26803    R-Square     0.8610
Dependent Mean      260.95000    Adj R-Sq     0.8406
Coeff Var             2.01879

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      155.03039       36.23830       4.28      0.0001
x1           Parents              1        0.39115        0.25709       1.52      0.1374
x2           Homelib              1        0.86387        0.17971       4.81      <.0001
x3           Reading              1        0.36162        0.26896       1.34      0.1877
x4           TV Watching          1       -0.84672        0.35254      -2.40      0.0219
x5           Absences             1        0.19229        0.26361       0.73      0.4707
Table 10.12, p. 442.
proc print data = temp (obs = 10);
 var hii ti Di;
run;
Obs      hii         ti          Di

  1    0.16014    -0.05464    0.00009
  2    0.18531     0.40076    0.00609
  3    0.16201     1.39338    0.06256
  4    0.29069     0.10337    0.00073
  5    0.09541    -0.57826    0.00588
  6    0.12133     0.03171    0.00002
  7    0.11685     0.64985    0.00931
  8    0.69026     1.39145    0.71914
  9    0.09109     1.44485    0.03487
 10    0.07670     0.48432    0.00325
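To pick out the cases worth a closer look, the leverage values can be compared with a rule-of-thumb cutoff. The sketch below assumes the common guideline of flagging h_ii greater than 2p/n, with p = 6 parameters and n = 40 cases, so the cutoff is 0.30; the data set name flagged and the variables n, p and high_leverage are hypothetical. Note that student= requests the (internally) studentized residuals; if studentized deleted residuals are wanted instead, the output keyword is rstudent=.

```sas
/* Sketch: flag high-leverage cases using the 2p/n guideline. */
/* Assumes temp is the output data set from the fit above.    */
data flagged;
  set temp;
  n = 40;                          /* number of cases          */
  p = 6;                           /* intercept + 5 predictors */
  high_leverage = (hii > 2*p/n);   /* 1 if above the cutoff    */
run;

proc print data = flagged;
  where high_leverage = 1;
  var state hii ti Di;
run;
```

Among the ten cases printed above, only observation 8 (hii = 0.69026) exceeds this cutoff, and it also has by far the largest Cook's distance (0.71914).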
The fitted model using robust regression (10.62), p. 442.
Running the Huber robust regression, followed by the Huber/biweight robust regression, which is similar to rreg in Stata.
This invokes two different macros.
Here is the first macro, robust_hubert.
%include 'c:\neter\mad.sas';
%include 'c:\neter\robust_hubert.sas';
%robust_hubert(ch10tab11, y, x2 x3 x4, 0.0005, 9);
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3     5781.00353     1927.00118      68.76    <.0001
Error                    36     1008.89647       28.02490
Corrected Total          39     6789.90000

Root MSE              5.29386    R-Square     0.8514
Dependent Mean      260.95000    Adj R-Sq     0.8390
Coeff Var             2.02869

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      199.61074       21.52892       9.27      <.0001
x2           Homelib              1        0.78043        0.17020       4.59      <.0001
x3           Reading              1        0.40118        0.26876       1.49      0.1442
x4           TV Watching          1       -1.15647        0.27140      -4.26      0.0001
Obs        r           u         _w2_

  1    -1.30771    -0.28443    1.00000
  2    -0.15269    -0.03321    1.00000
  3     8.19275     1.78192    0.75481
  4    -0.80821    -0.17578    1.00000
  5    -3.78369    -0.82295    1.00000
  6    -0.10060    -0.02188    1.00000
  7     4.59251     0.99887    1.00000
  8     0.61205     0.13312    1.00000
  9     7.95444     1.73008    0.77742
 10     1.17259     0.25504    1.00000
Obs        r         _w2_

  1    -1.97842    1.00000
  2     0.70596    1.00000
  3     6.47543    0.92592
  4     0.45509    1.00000
  5    -3.93974    1.00000
  6     0.55384    1.00000
  7     3.35113    1.00000
  8    -1.92494    1.00000
  9     6.95415    0.86218
 10     0.78291    1.00000
Obs        r         _w2_

  1    -2.16585    1.00000
  2     0.68042    1.00000
  3     6.05663    0.99024
  4     0.40731    1.00000
  5    -4.00323    1.00000
  6     0.74252    1.00000
  7     3.13259    1.00000
  8    -2.35721    1.00000
  9     6.60526    0.90799
 10     0.68544    1.00000
Obs        r         _w2_

  1    -2.24010    1.00000
  2     0.60703    1.00000
  3     5.90929    1.00000
  4     0.29252    1.00000
  5    -4.02355    1.00000
  6     0.81795    1.00000
  7     3.07963    1.00000
  8    -2.47327    1.00000
  9     6.45193    0.93764
 10     0.64893    1.00000
Obs        r         _w2_

  1    -2.26999    1.00000
  2     0.55502    1.00000
  3     5.85944    1.00000
  4     0.21310    1.00000
  5    -4.02818    1.00000
  6     0.84502    1.00000
  7     3.07071    1.00000
  8    -2.50289    1.00000
  9     6.38705    0.94683
 10     0.63394    1.00000
Obs        r         _w2_

  1    -2.28228    1.00000
  2     0.53287    1.00000
  3     5.83868    1.00000
  4     0.17892    1.00000
  5    -4.02932    1.00000
  6     0.85736    1.00000
  7     3.06771    1.00000
  8    -2.51505    1.00000
  9     6.35958    0.95076
 10     0.62809    1.00000
Obs        r         _w2_

  1    -2.28754    1.00000
  2     0.52337    1.00000
  3     5.82983    1.00000
  4     0.16428    1.00000
  5    -4.02981    1.00000
  6     0.86264    1.00000
  7     3.06644    1.00000
  8    -2.52021    1.00000
  9     6.34783    0.95245
 10     0.62559    1.00000
Obs        r         _w2_

  1    -2.28978    1.00000
  2     0.51931    1.00000
  3     5.82603    1.00000
  4     0.15802    1.00000
  5    -4.03002    1.00000
  6     0.86489    1.00000
  7     3.06590    1.00000
  8    -2.52242    1.00000
  9     6.34281    0.95317
 10     0.62453    1.00000
Obs        r         _w2_

  1    -2.29074    1.00000
  2     0.51758    1.00000
  3     5.82441    1.00000
  4     0.15534    1.00000
  5    -4.03011    1.00000
  6     0.86586    1.00000
  7     3.06567    1.00000
  8    -2.52337    1.00000
  9     6.34066    0.95348
 10     0.62407    1.00000
 
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

Weight: _w2_

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3     4391.06625     1463.68875      83.27    <.0001
Error                    36      632.76457       17.57679
Corrected Total          39     5023.83082

Root MSE              4.19247    R-Square     0.8740
Dependent Mean      262.15459    Adj R-Sq     0.8636
Coeff Var             1.59924

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      207.83984       17.58882      11.82      <.0001
x2           Homelib              1        0.79410        0.14083       5.64      <.0001
x3           Reading              1        0.16362        0.22036       0.74      0.4626
x4           TV Watching          1       -1.16953        0.21890      -5.34      <.0001
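The _w2_ values in the first iteration listing above are consistent with Huber weights: the weight is 1 when the scaled residual u is at most 1.345 in absolute value, and 1.345/|u| otherwise. The macro source is not shown on this page, so the tuning constant 1.345 and the data set name huber_check are assumptions; the step below is only a sketch for checking a few rows by hand.

```sas
/* Sketch only: recompute Huber weights for three rows of the first listing. */
/* The constant c = 1.345 is an assumption, not taken from the macro source. */
data huber_check;
  input obs u;
  c = 1.345;
  if abs(u) <= c then w = 1;   /* small scaled residuals keep full weight */
  else w = c / abs(u);         /* larger ones are downweighted            */
cards;
1 -0.28443
3  1.78192
9  1.73008
;
run;

proc print data=huber_check;
run;
```

Under that assumption, observations 3 and 9 get weights of roughly 0.755 and 0.777, in line with the listed _w2_ values, and observation 1 keeps weight 1.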
Here is the second macro, robust_hb.
%include 'c:\neter\robust_hb.sas';
%robust_hb(ch10tab11, y, x2 x3 x4, 0.01, 0.0005, 9);
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3     5781.00353     1927.00118      68.76    <.0001
Error                    36     1008.89647       28.02490
Corrected Total          39     6789.90000

Root MSE              5.29386    R-Square     0.8514
Dependent Mean      260.95000    Adj R-Sq     0.8390
Coeff Var             2.02869

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      199.61074       21.52892       9.27      <.0001
x2           Homelib              1        0.78043        0.17020       4.59      <.0001
x3           Reading              1        0.40118        0.26876       1.49      0.1442
x4           TV Watching          1       -1.15647        0.27140      -4.26      0.0001
Obs        r           u         _w2_

  1    -1.30771    -0.28443    1.00000
  2    -0.15269    -0.03321    1.00000
  3     8.19275     1.78192    0.75481
  4    -0.80821    -0.17578    1.00000
  5    -3.78369    -0.82295    1.00000
  6    -0.10060    -0.02188    1.00000
  7     4.59251     0.99887    1.00000
  8     0.61205     0.13312    1.00000
  9     7.95444     1.73008    0.77742
 10     1.17259     0.25504    1.00000
Obs        r         _w2_

  1    -1.97842    1.00000
  2     0.70596    1.00000
  3     6.47543    0.92592
  4     0.45509    1.00000
  5    -3.93974    1.00000
  6     0.55384    1.00000
  7     3.35113    1.00000
  8    -1.92494    1.00000
  9     6.95415    0.86218
 10     0.78291    1.00000
Obs        r         _w2_

  1    -2.16585    1.00000
  2     0.68042    1.00000
  3     6.05663    0.99024
  4     0.40731    1.00000
  5    -4.00323    1.00000
  6     0.74252    1.00000
  7     3.13259    1.00000
  8    -2.35721    1.00000
  9     6.60526    0.90799
 10     0.68544    1.00000
Obs        r         _w2_

  1    -2.24010    1.00000
  2     0.60703    1.00000
  3     5.90929    1.00000
  4     0.29252    1.00000
  5    -4.02355    1.00000
  6     0.81795    1.00000
  7     3.07963    1.00000
  8    -2.47327    1.00000
  9     6.45193    0.93764
 10     0.64893    1.00000
Obs        r         _w2_

  1    -2.26999    1.00000
  2     0.55502    1.00000
  3     5.85944    1.00000
  4     0.21310    1.00000
  5    -4.02818    1.00000
  6     0.84502    1.00000
  7     3.07071    1.00000
  8    -2.50289    1.00000
  9     6.38705    0.94683
 10     0.63394    1.00000
Obs        r         _w2_

  1    -2.28228    1.00000
  2     0.53287    1.00000
  3     5.83868    1.00000
  4     0.17892    1.00000
  5    -4.02932    1.00000
  6     0.85736    1.00000
  7     3.06771    1.00000
  8    -2.51505    1.00000
  9     6.35958    0.95076
 10     0.62809    1.00000
Obs        r         _w2_

  1    -2.28754    1.00000
  2     0.52337    1.00000
  3     5.82983    1.00000
  4     0.16428    1.00000
  5    -4.02981    1.00000
  6     0.86264    1.00000
  7     3.06644    1.00000
  8    -2.52021    1.00000
  9     6.34783    0.95245
 10     0.62559    1.00000
Obs        r         _w2_

  1    -2.28978    0.97649
  2     0.51931    0.99878
  3     5.82603    0.85279
  4     0.15802    0.99989
  5    -4.03002    0.92810
  6     0.86489    0.99663
  7     3.06590    0.95806
  8    -2.52242    0.97151
  9     6.34281    0.82680
 10     0.62453    0.99824
Obs        r         _w2_

  1    -2.73012    0.96699
  2     0.86459    0.99666
  3     4.96329    0.89302
  4     0.72986    0.99762
  5    -4.06566    0.92755
  6     1.00401    0.99550
  7     2.37379    0.97500
  8    -4.09078    0.92667
  9     5.80575    0.85515
 10     0.29495    0.99961
Obs        r         _w2_

  1    -2.81014    0.96404
  2     0.83110    0.99683
  3     4.83738    0.89535
  4     0.68947    0.99782
  5    -4.06672    0.92544
  6     1.02494    0.99518
  7     2.29836    0.97587
  8    -4.29038    0.91720
  9     5.68803    0.85684
 10     0.23689    0.99974
Obs        r         _w2_

  1    -2.83138    0.96315
  2     0.80418    0.99700
  3     4.80324    0.89581
  4     0.64927    0.99804
  5    -4.06725    0.92471
  6     1.03939    0.99499
  7     2.28767    0.97586
  8    -4.32471    0.91509
  9     5.64737    0.85748
 10     0.22449    0.99977
Obs        r         _w2_

  1    -2.83871    0.96282
  2     0.79223    0.99708
  3     4.79108    0.89594
  4     0.63096    0.99815
  5    -4.06757    0.92442
  6     1.04606    0.99491
  7     2.28533    0.97582
  8    -4.33367    0.91444
  9     5.63172    0.85773
 10     0.22075    0.99977
Obs        r         _w2_

  1    -2.84151    0.96269
  2     0.78734    0.99711
  3     4.78636    0.89600
  4     0.62341    0.99819
  5    -4.06772    0.92431
  6     1.04886    0.99488
  7     2.28460    0.97580
  8    -4.33669    0.91420
  9     5.62552    0.85783
 10     0.21941    0.99978
Obs        r         _w2_

  1    -2.84261    0.96264
  2     0.78538    0.99712
  3     4.78449    0.89602
  4     0.62036    0.99820
  5    -4.06779    0.92426
  6     1.05000    0.99486
  7     2.28434    0.97579
  8    -4.33783    0.91411
  9     5.62305    0.85788
 10     0.21890    0.99978
Obs        r         _w2_

  1    -2.84305    0.96262
  2     0.78459    0.99713
  3     4.78374    0.89602
  4     0.61914    0.99821
  5    -4.06782    0.92425
  6     1.05046    0.99486
  7     2.28423    0.97579
  8    -4.33828    0.91407
  9     5.62206    0.85789
 10     0.21869    0.99978
 
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

Weight: _w2_

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3     3786.64784     1262.21595      97.16    <.0001
Error                    35      454.69842       12.99138
Corrected Total          38     4241.34626

Root MSE              3.60436    R-Square     0.8928
Dependent Mean      262.64784    Adj R-Sq     0.8836
Coeff Var             1.37232

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      208.75986       15.50570      13.46      <.0001
x2           Homelib              1        0.81146        0.12370       6.56      <.0001
x3           Reading              1        0.09235        0.19497       0.47      0.6387
x4           TV Watching          1       -1.13056        0.19267      -5.87      <.0001
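For comparison, SAS releases that include PROC ROBUSTREG can fit M estimates with Huber or bisquare (biweight) weight functions directly. The calls below are a sketch, not a replacement for the macros: the scale estimate, tuning constants, and convergence rules are the procedure's defaults, so the estimates need not match the macro output shown above.

```sas
/* Sketch: M estimation with Huber weights (procedure defaults otherwise). */
proc robustreg data=ch10tab11 method=m(wf=huber);
  model y = x2 x3 x4;
run;

/* Sketch: M estimation with bisquare (biweight) weights. */
proc robustreg data=ch10tab11 method=m(wf=bisquare);
  model y = x2 x3 x4;
run;
```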
Fig. 10.11, p. 443.
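proc reg data = ch10tab11;
  /* best=2 lists the two best subsets of each size by R-square; cp and adjrsq add C(p) and adjusted R-square */
  model y = x1-x5 / selection = rsquare best = 2 cp adjrsq;
run;
quit;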
The REG Procedure
Model: MODEL1
Dependent Variable: y

R-Square Selection Method

Number in                Adjusted
  Model      R-Square    R-Square        C(p)    Variables in Model

       1       0.7630      0.7567     21.9929    x4
       1       0.5551      0.5434     72.8418    x2
-------------------------------------------------------------------
       2       0.8422      0.8337      4.6039    x2 x4
       2       0.7923      0.7810     16.8260    x1 x2
-------------------------------------------------------------------
       3       0.8514      0.8390      4.3538    x2 x3 x4
       3       0.8507      0.8383      4.5237    x1 x2 x4
-------------------------------------------------------------------
       4       0.8589      0.8427      4.5321    x1 x2 x3 x4
       4       0.8536      0.8369      5.8078    x1 x2 x4 x5
-------------------------------------------------------------------
       5       0.8610      0.8406      6.0000    x1 x2 x3 x4 x5
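The C(p) column can be checked by hand with Mallows' formula, C(p) = SSE_p / MSE_full - (n - 2p), where p counts the parameters in the subset model including the intercept. The sketch below does this for the x2 x3 x4 model, using the sums of squares from the OLS fit of y on x2 x3 x4 on this page and the rounded full-model R-square from the listing above. The data set and variable names are made up for illustration.

```sas
/* Sketch: hand check of C(p) for the subset x2 x3 x4. */
data cp_check;
  n        = 40;           /* corrected total DF (39) plus 1            */
  ssto     = 6789.90000;   /* corrected total sum of squares            */
  sse_p    = 1008.89647;   /* error sum of squares for y = x2 x3 x4     */
  p        = 4;            /* three predictors plus the intercept       */
  r2_full  = 0.8610;       /* rounded R-square of the five-variable fit */
  p_full   = 6;            /* five predictors plus the intercept        */
  mse_full = (1 - r2_full) * ssto / (n - p_full);
  cp       = sse_p / mse_full - (n - 2*p);
  put cp=;                 /* writes the result to the SAS log          */
run;
```

Because R-square is rounded to four decimals in the listing, the result only approximates the tabled 4.3538.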
The model fitted by OLS (10.63), p. 443.
proc reg data = ch10tab11;
  model y = x2 x3 x4;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3     5781.00353     1927.00118      68.76    <.0001
Error                    36     1008.89647       28.02490
Corrected Total          39     6789.90000

Root MSE              5.29386    R-Square     0.8514
Dependent Mean      260.95000    Adj R-Sq     0.8390
Coeff Var             2.02869

                                  Parameter Estimates

                                         Parameter       Standard
Variable     Label               DF       Estimate          Error    t Value    Pr > |t|

Intercept    Intercept            1      199.61074       21.52892       9.27      <.0001
x2           Homelib              1        0.78043        0.17020       4.59      <.0001
x3           Reading              1        0.40118        0.26876       1.49      0.1442
x4           TV Watching          1       -1.15647        0.27140      -4.26      0.0001

These pages are Copyrighted (c) by UCLA Academic Technology Services

