Inputting Blood Pressure data (page 407, Table 10.1).
data ch10tab01;
input x y;
label x='age'
y='Dbp';
cards;
27 73
21 66
22 63
24 75
25 71
23 70
20 65
20 70
29 79
24 72
25 68
28 67
26 79
38 91
32 76
33 69
31 66
34 73
37 78
38 87
33 76
35 79
30 73
31 80
37 68
39 75
46 89
49 101
40 70
42 72
43 80
46 83
43 75
44 71
46 80
47 96
45 92
49 80
48 70
40 90
42 85
55 76
54 71
57 99
52 86
53 79
56 92
52 85
50 71
59 90
50 91
52 100
58 80
57 109
;;;
run;
Diagnostic plots, Fig. 10.1a and 10.1b, p. 406.
proc reg data=ch10tab01;
  model y = x;
  output out=temp r=residual;
  plot y*x r.*x;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Dbp
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 2374.96833 2374.96833 35.79 <.0001
Error 52 3450.36501 66.35317
Corrected Total 53 5825.33333
Root MSE 8.14575 R-Square 0.4077
Dependent Mean 79.11111 Adj R-Sq 0.3963
Coeff Var 10.29659
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 56.15693 3.99367 14.06 <.0001
x age 1 0.58003 0.09695 5.98 <.0001
Fig. 10.1c, p. 406.
data temp;
  set temp;
  absr = abs(residual);
run;
symbol1 v=star h=.8;
axis1 order=(0 to 20 by 5);
proc gplot data = temp;
  plot absr*x / vaxis = axis1;
run;
quit;
Regressing the absolute residuals against X, formula 10.19 page 406.
proc reg data = temp;
  model absr = x;
  output out = temp1 p = s;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: absr
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 277.23091 277.23091 13.93 0.0005
Error 52 1034.62880 19.89671
Corrected Total 53 1311.85971
Root MSE 4.46057 R-Square 0.2113
Dependent Mean 6.29301 Adj R-Sq 0.1962
Coeff Var 70.88141
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 -1.54948 2.18692 -0.71 0.4818
x age 1 0.19817 0.05309 3.73 0.0005
Obtaining the weights, w = 1/(s^2).
Table 10.1, p. 407.
data temp1;
  set temp1;
  w = 1/(s**2);
run;
proc print data = temp1 (obs = 10);
run;
Obs x y residual absr s w
1 27 73 1.18224 1.18224 3.80117 0.06921
2 21 66 -2.33758 2.33758 2.61214 0.14656
3 22 63 -5.91761 5.91761 2.81031 0.12662
4 24 75 4.92233 4.92233 3.20666 0.09725
5 25 71 0.34230 0.34230 3.40483 0.08626
6 23 70 0.50236 0.50236 3.00849 0.11049
7 20 65 -2.75755 2.75755 2.41397 0.17161
8 20 70 2.24245 2.24245 2.41397 0.17161
9 29 79 6.02218 6.02218 4.19752 0.05676
10 24 72 1.92233 1.92233 3.20666 0.09725
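As a check on the first row of the listing, the weight can be reproduced by hand from the standard deviation function fitted in the REG output for absr. The symbols \hat{s}_i and w_i are notation introduced here for the columns s and w; small differences in the last digits come from rounding the printed coefficients.

```latex
\hat{s}_i = -1.54948 + 0.19817\,x_i, \qquad w_i = \frac{1}{\hat{s}_i^{\,2}}
\\[4pt]
x_1 = 27:\quad \hat{s}_1 \approx -1.54948 + 0.19817(27) \approx 3.8011,
\qquad w_1 \approx \frac{1}{3.8011^{2}} \approx 0.0692
```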
Fitting equation (10.20) by WLS regression. The clb option in the model statement requests confidence limits for the parameter estimates.
proc reg data = temp1;
  weight w;
  model y = x / clb;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Dbp
Weight: w
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 83.34082 83.34082 56.64 <.0001
Error 52 76.51351 1.47141
Corrected Total 53 159.85432
Root MSE 1.21302 R-Square 0.5214
Dependent Mean 73.55134 Adj R-Sq 0.5122
Coeff Var 1.64921
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t| 95% Confidence Limits
Intercept Intercept 1 55.56577 2.52092 22.04 <.0001 50.50718 60.62436
x age 1 0.59634 0.07924 7.53 <.0001 0.43734 0.75534
Inputting data for Ridge Regression example, p. 413.
data ch7tab01;
input X1 X2 X3 Y;
label x1 = 'Triceps'
x2 = 'Thigh cir.'
x3 = 'Midarm cir.'
y = 'body fat';
cards;
19.5 43.1 29.1 11.9
24.7 49.8 28.2 22.8
30.7 51.9 37.0 18.7
29.8 54.3 31.1 20.1
19.1 42.2 30.9 12.9
25.6 53.9 23.7 21.7
31.4 58.5 27.6 27.1
27.9 52.1 30.6 25.4
22.1 49.9 23.2 21.3
25.5 53.5 24.8 19.3
31.1 56.6 30.0 25.4
30.4 56.7 28.3 27.2
18.7 46.5 23.0 11.7
19.7 44.2 28.6 17.8
14.6 42.7 21.3 12.8
29.5 54.4 30.1 23.9
27.7 55.3 25.7 22.6
30.2 58.6 24.6 25.4
22.7 48.2 27.1 14.8
25.2 51.0 27.5 21.1
;
run;
Transforming the variables using the correlation transformation (7.44).
proc sql;
create table ch7tab1a as
select *, ( y - mean(y) )/( std(y)*( sqrt( count(y)-1 ) ) ) as ty,
( x1 - mean(x1) )/( std(x1)*( sqrt( count(x1)-1 ) ) ) as tx1,
( x2 - mean(x2) )/( std(x2)*( sqrt( count(x2)-1 ) ) ) as tx2,
( x3 - mean(x3) )/( std(x3)*( sqrt( count(x3)-1 ) ) ) as tx3
from ch7tab01;
quit;
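For reference, the PROC SQL expressions above correspond to the correlation transformation written out below. The notation (bars for sample means, s for sample standard deviations, starred names for the transformed variables) is introduced here; the code names the results ty, tx1, tx2, and tx3.

```latex
Y_i^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{Y_i - \bar{Y}}{s_Y} \right),
\qquad
X_{ik}^{*} = \frac{1}{\sqrt{n-1}} \left( \frac{X_{ik} - \bar{X}_k}{s_k} \right),
\quad k = 1, 2, 3
```

The body fat data have n = 20 cases, so each standardized deviation is divided by the square root of 19.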
Ridge Regression on Body fat data.
Fig. 10.3, p. 413.
symbol1 v=dot h=.8;
proc reg data = ch7tab1a outest = temp outstb noprint;
  model y = x1-x3 / ridge = (0.001 to 0.1 by .001) outvif;
  plot / ridgeplot vref=0;
run;
quit;
These are the equations at the bottom of p. 413. In the listing, the first row gives the coefficients for the original variables and the second row gives the coefficients for the transformed variables. SAS applies the correlation transformation (7.44) automatically, so there is no need to transform the variables manually. Note that the model statements use the untransformed variables.
proc reg data = ch7tab1a outest = temp outstb noprint;
  model y = x1-x3 / ridge = 0.02;
run;
quit;
proc print data = temp;
  where _ridge_ = 0.02 and y = -1;
  var y intercept x1 x2 x3;
run;
Obs Y Intercept X1 X2 X3
2 -1 -7.40343 0.55535 0.36814 -0.19163
3 -1 0.00000 0.54633 0.37740 -0.13687
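As a reminder of what the ridge = option computes, the standardized ridge coefficients (the second row of the listing) solve the system below with biasing constant c = 0.02. The symbols r_XX (correlation matrix of the predictors), r_YX (vector of correlations between Y and each predictor), and b^R are standard textbook notation introduced here, not names from the SAS output.

```latex
(\mathbf{r}_{XX} + c\,\mathbf{I})\,\mathbf{b}^{R} = \mathbf{r}_{YX}
\quad\Longrightarrow\quad
\mathbf{b}^{R} = (\mathbf{r}_{XX} + c\,\mathbf{I})^{-1}\,\mathbf{r}_{YX},
\qquad c = 0.02
```

Setting c = 0 recovers the ordinary least squares estimates for the standardized variables.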
Table 10.2, p. 414.
The outstb option in the proc statement tells SAS to write the standardized parameter estimates to the outest data set, temp. They can then be selected by specifying RIDGESTB for _type_ in the where statement of the proc print.
proc reg data = ch7tab1a outest = temp outstb outvif;
  model y = x1-x3 / ridge = (0.0 to 0.01 by 0.002 0.02 to 0.05 by 0.01 0.5 1.0);
run;
quit;
proc print data = temp;
  where _type_ = 'RIDGESTB';
  var _ridge_ x1 x2 x3;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: Y body fat
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 396.98461 132.32820 21.52 <.0001
Error 16 98.40489 6.15031
Corrected Total 19 495.38950
Root MSE 2.47998 R-Square 0.8014
Dependent Mean 20.19500 Adj R-Sq 0.7641
Coeff Var 12.28017
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 117.08469 99.78240 1.17 0.2578
X1 Triceps 1 4.33409 3.01551 1.44 0.1699
X2 Thigh cir. 1 -2.85685 2.58202 -1.11 0.2849
X3 Midarm cir. 1 -2.18606 1.59550 -1.37 0.1896
Obs _RIDGE_ X1 X2 X3
4 0.000 4.26370 -2.92870 -1.56142
7 0.002 1.44066 -0.41129 -0.48127
10 0.004 1.00632 -0.02484 -0.31487
13 0.006 0.83002 0.13142 -0.24716
16 0.008 0.73433 0.21576 -0.21030
19 0.010 0.67417 0.26841 -0.18703
22 0.020 0.54633 0.37740 -0.13687
25 0.030 0.50038 0.41341 -0.11808
28 0.040 0.47600 0.43024 -0.10758
31 0.050 0.46046 0.43924 -0.10051
34 0.500 0.33772 0.37906 -0.02950
37 1.000 0.27977 0.31007 -0.00594
Table 10.3, p. 414.
The outvif option in the proc statement tells SAS to write the VIFs to the outest data set, temp. They can then be selected by specifying RIDGEVIF for _type_ in the where statement of the proc print.
proc print data = temp;
  where _type_ = 'RIDGEVIF';
  var _ridge_ x1 x2 x3;
run;
Obs _RIDGE_ X1 X2 X3
2 0.000 708.843 564.343 104.606
5 0.002 50.559 40.448 8.280
8 0.004 16.982 13.725 3.363
11 0.006 8.503 6.976 2.119
14 0.008 5.147 4.305 1.624
17 0.010 3.486 2.981 1.377
20 0.020 1.103 1.081 1.011
23 0.030 0.626 0.697 0.923
26 0.040 0.453 0.555 0.881
29 0.050 0.370 0.486 0.853
32 0.500 0.154 0.214 0.403
35 1.000 0.107 0.136 0.227
Inputting the Mathematics Proficiency Data, Table 10.4, p. 421.
Note: The easiest method of including observations with multiple words in one string variable is to connect the words with an underscore.
data ch10tab11;
input state $ y x1 x2 x3 x4 x5;
label y = 'Math profeciency'
x1 = 'Parents'
x2 = 'Homelib'
x3 = 'Reading'
x4 = 'TV Watching'
x5 = 'Absences';
cards;
Alabama 252 75 78 34 18 18
Arizona 259 75 73 41 12 26
Arkansas 256 77 77 28 20 23
California 256 78 68 42 11 28
Colorado 267 78 85 38 9 25
Connecticut 270 79 86 43 12 22
Delaware 261 75 83 32 18 28
Distric_of_Columbia 231 47 76 24 33 37
Florida 255 75 73 31 19 27
Georgia 258 73 80 36 17 22
Guam 231 81 64 32 20 28
Hawaii 251 78 69 36 23 26
Idaho 272 84 84 48 7 21
Illinois 260 78 82 43 14 21
Indiana 267 81 84 37 11 23
Iowa 278 83 88 43 8 20
Kentucky 256 79 78 36 14 23
Louisiana 246 73 76 36 19 27
Maryland 260 75 83 34 19 27
Michigan 264 77 84 31 14 25
Minnesota 276 83 88 36 7 20
Montana 280 83 88 44 6 21
Nebraska 276 85 88 42 9 19
New_Hampshire 273 83 88 40 7 22
New_Jersey 269 79 84 41 13 23
New_Mexico 256 77 72 40 11 27
New_York 261 76 79 35 17 29
North_Carolina 250 74 78 37 21 25
North_Dakota 281 85 90 41 6 14
Ohio 264 79 84 36 11 22
Oklahoma 263 78 78 37 14 22
Oregon 271 81 82 41 9 31
Pennsylvania 266 80 86 34 10 24
Rhode_Island 260 78 80 38 12 28
Texas 258 77 70 34 15 18
Virgin_Islands 218 63 76 23 27 22
Virginia 264 78 82 33 16 24
West_Virginia 256 82 80 36 16 25
Wisconsin 274 81 86 38 8 21
Wyoming 272 85 86 43 7 23
;
run;
Fig. 10.5, p. 421.
proc reg data = ch10tab11;
  model y = x2;
  plot y*x2 r.*x2;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 3769.30965 3769.30965 47.42 <.0001
Error 38 3020.59035 79.48922
Corrected Total 39 6789.90000
Root MSE 8.91567 R-Square 0.5551
Dependent Mean 260.95000 Adj R-Sq 0.5434
Coeff Var 3.41662
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 135.55589 18.26408 7.42 <.0001
x2 Homelib 1 1.55963 0.22649 6.89 <.0001
Fitting the model by robust regression. We invoke the macro robust_hubert, which in turn invokes the mad macro. By default it creates two plots, but this can be modified. We show the predicted-by-residual plot below.
%include 'c:\neter\mad.sas';
%include 'c:\neter\robust_hubert.sas';
%robust_hubert(ch10tab11, y, x2, 0.000005, 8);
Below you can see the results of the OLS fit (10.49, p. 422), followed by the iterations of the reweighted least squares (see Table 10.5, p. 422), and then the final results (see 10.51, p. 423).
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 3769.30965 3769.30965 47.42 <.0001
Error 38 3020.59035 79.48922
Corrected Total 39 6789.90000
Root MSE 8.91567 R-Square 0.5551
Dependent Mean 260.95000 Adj R-Sq 0.5434
Coeff Var 3.41662
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 135.55589 18.26408 7.42 <.0001
x2 Homelib 1 1.55963 0.22649 6.89 <.0001
Obs r u _w2_
1 -5.2069 -0.77336 1.00000
2 9.5912 1.42455 0.94416
3 0.3527 0.05239 1.00000
4 14.3894 2.13719 0.62933
5 -1.1243 -0.16699 1.00000
6 0.3161 0.04695 1.00000
7 -4.0050 -0.59485 1.00000
8 -23.0876 -3.42911 0.39223
9 5.5912 0.83044 1.00000
10 -2.3261 -0.34549 1.00000
Obs r _w2_
1 -6.1773 1.00000
2 8.3454 1.00000
3 -0.6728 1.00000
4 12.8681 0.70085
5 -1.7091 1.00000
6 -0.2136 1.00000
7 -4.7000 1.00000
8 -24.1682 0.37316
9 4.3454 1.00000
10 -3.1864 1.00000
Obs r _w2_
1 -6.3179 1.00000
2 8.1025 1.00000
3 -0.8338 1.00000
4 12.5230 0.70388
5 -1.7065 1.00000
6 -0.1906 1.00000
7 -4.7384 1.00000
8 -24.3498 0.36200
9 4.1025 1.00000
10 -3.2861 1.00000
Obs r _w2_
1 -6.3529 1.00000
2 8.0501 1.00000
3 -0.8723 1.00000
4 12.4531 0.70504
5 -1.7171 1.00000
6 -0.1977 1.00000
7 -4.7559 1.00000
8 -24.3917 0.35995
9 4.0501 1.00000
10 -3.3141 1.00000
Obs r _w2_
1 -6.3602 1.00000
2 8.0389 1.00000
3 -0.8804 1.00000
4 12.4380 0.70527
5 -1.7190 1.00000
6 -0.1988 1.00000
7 -4.7593 1.00000
8 -24.4006 0.35951
9 4.0389 1.00000
10 -3.3198 1.00000
Obs r _w2_
1 -6.3618 1.00000
2 8.0365 1.00000
3 -0.8821 1.00000
4 12.4348 0.70532
5 -1.7194 1.00000
6 -0.1990 1.00000
7 -4.7600 1.00000
8 -24.4025 0.35941
9 4.0365 1.00000
10 -3.3211 1.00000
Obs r _w2_
1 -6.3621 1.00000
2 8.0360 1.00000
3 -0.8825 1.00000
4 12.4341 0.70533
5 -1.7194 1.00000
6 -0.1991 1.00000
7 -4.7602 1.00000
8 -24.4029 0.35939
9 4.0360 1.00000
10 -3.3213 1.00000
Obs r _w2_
1 -6.3622 1.00000
2 8.0359 1.00000
3 -0.8826 1.00000
4 12.4340 0.70533
5 -1.7195 1.00000
6 -0.1991 1.00000
7 -4.7602 1.00000
8 -24.4029 0.35939
9 4.0359 1.00000
10 -3.3214 1.00000
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Weight: _w2_
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 1 3165.87899 3165.87899 78.49 <.0001
Error 38 1532.63864 40.33260
Corrected Total 39 4698.51763
Root MSE 6.35079 R-Square 0.6738
Dependent Mean 262.40346 Adj R-Sq 0.6652
Coeff Var 2.42024
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 142.95244 13.52182 10.57 <.0001
x2 Homelib 1 1.47961 0.16700 8.86 <.0001
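The _w2_ column in the iteration listings above is consistent with the Huber weight function with tuning constant 1.345, applied to the scaled residual u. The macro source is not reproduced here, so this is inferred from the printed values rather than read from the code. In the first listing, u appears to be r divided by a common scale estimate of about 6.73.

```latex
w(u) =
\begin{cases}
1, & |u| \le 1.345 \\[4pt]
\dfrac{1.345}{|u|}, & |u| > 1.345
\end{cases}
\\[6pt]
\text{Obs 2: } u = 1.42455 \;\Rightarrow\; w = \frac{1.345}{1.42455} \approx 0.94416
\qquad
\text{Obs 8: } u = -3.42911 \;\Rightarrow\; w = \frac{1.345}{3.42911} \approx 0.39223
```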
Sections 10.4 and 10.5 are skipped. For help with the loess method, please come see us in consulting; for bootstrapping, you might consider the bs command in Stata, for example http://www.ats.ucla.edu/stat/stata/examples/ara/arastata16.htm.
Section 10.6--Model Validation!
Inputting the Surgical Unit data, Table 8.1, p. 335.
data ch8tab01;
input x1 x2 x3 x4 y logy;
label x1 = 'blood-clotting'
x2 = 'prognostic'
x3 = 'enzyme'
x4 = 'liver function'
y = 'survival'
logy = 'Logsurvival';
cards;
6.7 62 81 2.59 200 2.3010
5.1 59 66 1.70 101 2.0043
7.4 57 83 2.16 204 2.3096
6.5 73 41 2.01 101 2.0043
7.8 65 115 4.30 509 2.7067
5.8 38 72 1.42 80 1.9031
5.7 46 63 1.91 80 1.9031
3.7 68 81 2.57 127 2.1038
6.0 67 93 2.50 202 2.3054
3.7 76 94 2.40 203 2.3075
6.3 84 83 4.13 329 2.5172
6.7 51 43 1.86 65 1.8129
5.8 96 114 3.95 830 2.9191
5.8 83 88 3.95 330 2.5185
7.7 62 67 3.40 168 2.2253
7.4 74 68 2.40 217 2.3365
6.0 85 28 2.98 87 1.9395
3.7 51 41 1.55 34 1.5315
7.3 68 74 3.56 215 2.3324
5.6 57 87 3.02 172 2.2355
5.2 52 76 2.85 109 2.0374
3.4 83 53 1.12 136 2.1335
6.7 26 68 2.10 70 1.8451
5.8 67 86 3.40 220 2.3424
6.3 59 100 2.95 276 2.4409
5.8 61 73 3.50 144 2.1584
5.2 52 86 2.45 181 2.2577
11.2 76 90 5.59 574 2.7589
5.2 54 56 2.71 72 1.8573
5.8 76 59 2.58 178 2.2504
3.2 64 65 0.74 71 1.8513
8.7 45 23 2.52 58 1.7634
5.0 59 73 3.50 116 2.0645
5.8 72 93 3.30 295 2.4698
5.4 58 70 2.64 115 2.0607
5.3 51 99 2.60 184 2.2648
2.6 74 86 2.05 118 2.0719
4.3 8 119 2.85 120 2.0792
4.8 61 76 2.45 151 2.1790
5.4 52 88 1.81 148 2.1703
5.2 49 72 1.84 95 1.9777
3.6 28 99 1.30 75 1.8751
8.8 86 88 6.40 483 2.6840
6.5 56 77 2.85 153 2.1847
3.4 77 93 1.48 191 2.2810
6.5 40 84 3.00 123 2.0899
4.5 73 106 3.05 311 2.4928
4.8 86 101 4.10 398 2.5999
5.1 67 77 2.86 158 2.1987
3.9 82 103 4.55 310 2.4914
6.6 77 46 1.95 124 2.0934
6.4 85 40 1.21 125 2.0969
6.4 59 85 2.33 198 2.2967
8.8 78 72 3.20 313 2.4955
;
run;
Inputting the Validation dataset, Table 10.10, p. 439.
data ch10tab10;
input x1 x2 x3 x4 logy;
label x1 = 'Clotting'
x2 = 'Prognostic'
x3 = 'Enzyme'
x4 = 'Liver'
logy = 'logSurvival';
cards;
7.1 23 78 1.93 2.0326
4.9 66 91 3.05 2.4086
6.4 90 35 1.06 2.2177
5.7 35 70 2.13 1.9078
6.1 42 69 2.25 2.0035
8.0 27 83 2.03 2.0945
6.8 34 51 1.27 1.7652
4.7 63 36 1.71 1.7925
7.0 47 67 1.60 2.1292
6.7 69 65 2.91 2.2295
6.7 46 78 3.26 2.1524
5.8 60 86 3.11 2.3188
6.7 56 32 1.53 1.9039
6.8 51 58 2.18 2.0508
7.2 95 82 4.68 2.6525
7.4 52 67 3.28 2.2053
5.3 53 62 2.42 1.9246
3.5 58 84 1.74 2.1541
6.8 74 79 2.25 2.4970
4.4 47 49 2.42 1.7237
7.0 66 118 4.69 2.8339
6.7 61 57 3.87 2.1282
5.6 75 103 3.11 2.6884
6.9 58 88 3.46 2.4284
6.2 62 57 1.25 2.0261
4.7 97 27 1.77 2.0843
6.8 69 60 2.90 2.2826
6.0 73 58 1.22 2.2073
5.9 50 62 3.19 2.0443
5.5 88 74 3.21 2.4863
3.8 55 52 1.41 1.9037
4.3 99 83 3.93 2.6647
6.6 48 54 2.94 1.9071
6.2 42 63 1.85 1.9093
5.0 60 105 3.17 2.4389
5.8 62 82 3.18 2.3343
4.7 42 10 0.28 1.3379
5.7 70 59 2.28 2.1996
4.7 64 48 1.30 1.8795
7.8 74 40 2.58 2.1504
2.9 43 32 0.94 1.4330
4.9 72 90 3.51 2.4381
4.6 73 57 2.82 2.1075
5.9 78 70 4.28 2.2843
4.6 69 70 3.17 2.1615
6.1 53 52 1.84 2.0558
5.9 88 98 3.33 2.7249
4.7 66 68 1.80 2.0520
10.4 62 85 4.65 2.6810
5.8 70 64 2.52 2.2604
5.4 64 81 1.36 2.2553
6.9 90 33 2.78 2.1745
7.9 45 55 2.46 2.0224
4.5 68 60 2.07 2.1413
;;;
run;
Table 10.9, p. 438.
proc reg data = ch8tab01 outest = temp;
  title 'Results from the Model Building Data set';
  model logy = x1 x2 x3 / press;
run;
quit;
proc print data = temp;
  var _press_;
run;
proc reg data = ch10tab10 outest = temp;
  title 'Results from the Validation Data set';
  model logy = x1 x2 x3 / press;
run;
quit;
proc print data = temp;
  var _press_;
run;
title;
The REG Procedure
Model: MODEL1
Dependent Variable: logy Logsurvival
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 3.86291 1.28764 586.04 <.0001
Error 50 0.10986 0.00220
Corrected Total 53 3.97277
Root MSE 0.04687 R-Square 0.9723
Dependent Mean 2.20614 Adj R-Sq 0.9707
Coeff Var 2.12470
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 0.48362 0.04263 11.34 <.0001
x1 blood-clotting 1 0.06923 0.00408 16.98 <.0001
x2 prognostic 1 0.00929 0.00038250 24.30 <.0001
x3 enzyme 1 0.00952 0.00030641 31.08 <.0001
Obs _PRESS_
1 0.14045
The REG Procedure
Model: MODEL1
Dependent Variable: logy logSurvival
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 4.62507 1.54169 730.29 <.0001
Error 50 0.10555 0.00211
Corrected Total 53 4.73062
Root MSE 0.04595 R-Square 0.9777
Dependent Mean 2.16466 Adj R-Sq 0.9763
Coeff Var 2.12257
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 0.50082 0.04192 11.95 <.0001
x1 Clotting 1 0.06741 0.00498 13.53 <.0001
x2 Prognostic 1 0.01011 0.00037193 27.18 <.0001
x3 Enzyme 1 0.00974 0.00030225 32.22 <.0001
Obs _PRESS_
1 0.12125
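A common companion to PRESS in model validation is the mean squared prediction error (MSPR): fit on the model-building data, predict the validation cases, and average the squared prediction errors. The following is a minimal sketch of one way to do this, not code from the original page. The data set names both and pred and the variables holdout, actual, and yhat are hypothetical; the sketch relies on PROC REG producing predicted values for cases whose response is missing.

```sas
/* Stack the two data sets; hide the validation response so that PROC REG */
/* fits on the model-building rows only but still predicts every row.     */
data both;
  set ch8tab01 (in = a) ch10tab10;
  holdout = not a;            /* 1 for validation rows (hypothetical flag) */
  actual  = logy;             /* keep a copy of the observed response      */
  if holdout then logy = .;   /* missing response: excluded from the fit   */
run;

proc reg data = both noprint;
  model logy = x1 x2 x3;
  output out = pred p = yhat; /* predictions for all rows                  */
run;
quit;

/* MSPR = mean squared prediction error over the validation rows */
proc sql;
  select mean((actual - yhat)**2) as mspr
  from pred
  where holdout = 1;
quit;
```

An MSPR close to the MSE of the model-building fit would suggest that the fitted model predicts new cases about as well as it fits the cases it was built on.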
Case Example--Mathematical Proficiency.
Note: This data has already been input in this program.
Fig. 10.10a, p. 441.
Calling the scatter matrix macro.
%include 'c:\neter\scatter.sas';
%scatter(data = ch10tab11, var= y x1 x2 x3 x4 x5);
<The scatterplot is not shown>
Fig. 10.10b, p. 441.
proc corr data = ch10tab11;
  var y x1-x5;
run;
The CORR Procedure
6 Variables: y x1 x2 x3 x4 x5
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum Label
y 40 260.95000 13.19470 10438 218.00000 281.00000 Math profeciency
x1 40 77.70000 6.49339 3108 47.00000 85.00000 Parents
x2 40 80.40000 6.30344 3216 64.00000 90.00000 Homelib
x3 40 36.85000 5.26016 1474 23.00000 48.00000 Reading
x4 40 14.00000 5.99572 560.00000 6.00000 33.00000 TV Watching
x5 40 23.92500 4.07863 957.00000 14.00000 37.00000 Absences
Pearson Correlation Coefficients, N = 40
Prob > |r| under H0: Rho=0
y x1 x2 x3 x4 x5
y 1.00000 0.74141 0.74507 0.71659 -0.87348 -0.48034
Math profeciency <.0001 <.0001 <.0001 <.0001 0.0017
x1 0.74141 1.00000 0.39454 0.69304 -0.83115 -0.56531
Parents <.0001 0.0118 <.0001 <.0001 0.0001
x2 0.74507 0.39454 1.00000 0.37692 -0.59364 -0.44262
Homelib <.0001 0.0118 0.0165 <.0001 0.0042
x3 0.71659 0.69304 0.37692 1.00000 -0.79187 -0.35669
Reading <.0001 <.0001 0.0165 <.0001 0.0239
x4 -0.87348 -0.83115 -0.59364 -0.79187 1.00000 0.51168
TV Watching <.0001 <.0001 <.0001 <.0001 0.0007
x5 -0.48034 -0.56531 -0.44262 -0.35669 0.51168 1.00000
Absences 0.0017 0.0001 0.0042 0.0239 0.0007
Fitted model (10.61) p. 441.
proc reg data = ch10tab11;
  model y = x1-x5;
  output out = temp h = hii student = ti cookd = Di;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 5 5846.32774 1169.26555 42.13 <.0001
Error 34 943.57226 27.75213
Corrected Total 39 6789.90000
Root MSE 5.26803 R-Square 0.8610
Dependent Mean 260.95000 Adj R-Sq 0.8406
Coeff Var 2.01879
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 155.03039 36.23830 4.28 0.0001
x1 Parents 1 0.39115 0.25709 1.52 0.1374
x2 Homelib 1 0.86387 0.17971 4.81 <.0001
x3 Reading 1 0.36162 0.26896 1.34 0.1877
x4 TV Watching 1 -0.84672 0.35254 -2.40 0.0219
x5 Absences 1 0.19229 0.26361 0.73 0.4707
Table 10.12, p. 442.
proc print data = temp (obs = 10);
  var hii ti Di;
run;
Obs hii ti Di
1 0.16014 -0.05464 0.00009
2 0.18531 0.40076 0.00609
3 0.16201 1.39338 0.06256
4 0.29069 0.10337 0.00073
5 0.09541 -0.57826 0.00588
6 0.12133 0.03171 0.00002
7 0.11685 0.64985 0.00931
8 0.69026 1.39145 0.71914
9 0.09109 1.44485 0.03487
10 0.07670 0.48432 0.00325
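To pick out the outlying X cases rather than scan the listing, one option is to filter on the usual rule-of-thumb leverage cutoff 2p/n. The sketch below assumes p = 6 parameters (five predictors plus the intercept) and n = 40 cases, and that temp is still the output data set from the five-predictor fit; the data set name highlev and the variable cutoff are hypothetical.

```sas
/* Keep only cases whose leverage exceeds 2p/n (p = 6, n = 40 assumed) */
data highlev;
  set temp;
  cutoff = 2 * 6 / 40;   /* = 0.30                                      */
  if hii > cutoff;       /* subsetting IF keeps high-leverage cases only */
run;

proc print data = highlev;
  var state hii ti Di;
run;
```

Among the ten cases printed above, only observation 8 (hii = 0.69026) exceeds the 0.30 cutoff; it also has by far the largest Cook's distance in that listing.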
The fitted model using robust regression (10.62), p. 442.
We run the Huber robust regression, and then the Huber/biweight robust regression, which is similar to rreg in Stata. Each is invoked through its own macro.
Here is the first macro, robust_hubert.
%include 'c:\neter\mad.sas';
%include 'c:\neter\robust_hubert.sas';
%robust_hubert(ch10tab11, y, x2 x3 x4, 0.0005, 9);
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 5781.00353 1927.00118 68.76 <.0001
Error 36 1008.89647 28.02490
Corrected Total 39 6789.90000
Root MSE 5.29386 R-Square 0.8514
Dependent Mean 260.95000 Adj R-Sq 0.8390
Coeff Var 2.02869
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 199.61074 21.52892 9.27 <.0001
x2 Homelib 1 0.78043 0.17020 4.59 <.0001
x3 Reading 1 0.40118 0.26876 1.49 0.1442
x4 TV Watching 1 -1.15647 0.27140 -4.26 0.0001
Obs r u _w2_
1 -1.30771 -0.28443 1.00000
2 -0.15269 -0.03321 1.00000
3 8.19275 1.78192 0.75481
4 -0.80821 -0.17578 1.00000
5 -3.78369 -0.82295 1.00000
6 -0.10060 -0.02188 1.00000
7 4.59251 0.99887 1.00000
8 0.61205 0.13312 1.00000
9 7.95444 1.73008 0.77742
10 1.17259 0.25504 1.00000
Obs r _w2_
1 -1.97842 1.00000
2 0.70596 1.00000
3 6.47543 0.92592
4 0.45509 1.00000
5 -3.93974 1.00000
6 0.55384 1.00000
7 3.35113 1.00000
8 -1.92494 1.00000
9 6.95415 0.86218
10 0.78291 1.00000
Obs r _w2_
1 -2.16585 1.00000
2 0.68042 1.00000
3 6.05663 0.99024
4 0.40731 1.00000
5 -4.00323 1.00000
6 0.74252 1.00000
7 3.13259 1.00000
8 -2.35721 1.00000
9 6.60526 0.90799
10 0.68544 1.00000
Obs r _w2_
1 -2.24010 1.00000
2 0.60703 1.00000
3 5.90929 1.00000
4 0.29252 1.00000
5 -4.02355 1.00000
6 0.81795 1.00000
7 3.07963 1.00000
8 -2.47327 1.00000
9 6.45193 0.93764
10 0.64893 1.00000
Obs r _w2_
1 -2.26999 1.00000
2 0.55502 1.00000
3 5.85944 1.00000
4 0.21310 1.00000
5 -4.02818 1.00000
6 0.84502 1.00000
7 3.07071 1.00000
8 -2.50289 1.00000
9 6.38705 0.94683
10 0.63394 1.00000
Obs r _w2_
1 -2.28228 1.00000
2 0.53287 1.00000
3 5.83868 1.00000
4 0.17892 1.00000
5 -4.02932 1.00000
6 0.85736 1.00000
7 3.06771 1.00000
8 -2.51505 1.00000
9 6.35958 0.95076
10 0.62809 1.00000
Obs r _w2_
1 -2.28754 1.00000
2 0.52337 1.00000
3 5.82983 1.00000
4 0.16428 1.00000
5 -4.02981 1.00000
6 0.86264 1.00000
7 3.06644 1.00000
8 -2.52021 1.00000
9 6.34783 0.95245
10 0.62559 1.00000
Obs r _w2_
1 -2.28978 1.00000
2 0.51931 1.00000
3 5.82603 1.00000
4 0.15802 1.00000
5 -4.03002 1.00000
6 0.86489 1.00000
7 3.06590 1.00000
8 -2.52242 1.00000
9 6.34281 0.95317
10 0.62453 1.00000
Obs r _w2_
1 -2.29074 1.00000
2 0.51758 1.00000
3 5.82441 1.00000
4 0.15534 1.00000
5 -4.03011 1.00000
6 0.86586 1.00000
7 3.06567 1.00000
8 -2.52337 1.00000
9 6.34066 0.95348
10 0.62407 1.00000
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Weight: _w2_
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 4391.06625 1463.68875 83.27 <.0001
Error 36 632.76457 17.57679
Corrected Total 39 5023.83082
Root MSE 4.19247 R-Square 0.8740
Dependent Mean 262.15459 Adj R-Sq 0.8636
Coeff Var 1.59924
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 207.83984 17.58882 11.82 <.0001
x2 Homelib 1 0.79410 0.14083 5.64 <.0001
x3 Reading 1 0.16362 0.22036 0.74 0.4626
x4 TV Watching 1 -1.16953 0.21890 -5.34 <.0001
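For readers without the macros, a single reweighting step can be sketched directly in SAS. This is an illustration under stated assumptions, not the source of robust_hubert or mad, and it may differ from them in details: it scales the residuals by the median absolute deviation divided by 0.6745 and applies Huber weights with tuning constant 1.345. The data set names res, m1, dev, m2, and wts and the variables medr, absdev, and medad are hypothetical.

```sas
/* Step 1: OLS residuals for the starting fit */
proc reg data = ch10tab11 noprint;
  model y = x2 x3 x4;
  output out = res residual = r;
run;
quit;

/* Step 2: median of the residuals */
proc means data = res noprint;
  var r;
  output out = m1 median = medr;
run;

/* Step 3: absolute deviations from that median */
data dev;
  if _n_ = 1 then set m1 (keep = medr);
  set res;
  absdev = abs(r - medr);
run;

/* Step 4: median absolute deviation */
proc means data = dev noprint;
  var absdev;
  output out = m2 median = medad;
run;

/* Step 5: scaled residuals and Huber weights (constant 1.345 assumed) */
data wts;
  if _n_ = 1 then set m2 (keep = medad);
  set dev;
  u = r / (medad / 0.6745);
  if abs(u) <= 1.345 then _w2_ = 1;
  else _w2_ = 1.345 / abs(u);
run;

/* Step 6: refit by weighted least squares with the new weights */
proc reg data = wts;
  weight _w2_;
  model y = x2 x3 x4;
run;
quit;
```

Repeating steps 1 through 6 with the weighted fit in place of the OLS fit, until the residuals stop changing, gives the iteratively reweighted least squares scheme that the listings above trace.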
Here is the second macro, robust_hb.
%include 'c:\neter\robust_hb.sas';
%robust_hb(ch10tab11, y, x2 x3 x4, 0.01, 0.0005, 9);
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 5781.00353 1927.00118 68.76 <.0001
Error 36 1008.89647 28.02490
Corrected Total 39 6789.90000
Root MSE 5.29386 R-Square 0.8514
Dependent Mean 260.95000 Adj R-Sq 0.8390
Coeff Var 2.02869
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 199.61074 21.52892 9.27 <.0001
x2 Homelib 1 0.78043 0.17020 4.59 <.0001
x3 Reading 1 0.40118 0.26876 1.49 0.1442
x4 TV Watching 1 -1.15647 0.27140 -4.26 0.0001
Obs r u _w2_
1 -1.30771 -0.28443 1.00000
2 -0.15269 -0.03321 1.00000
3 8.19275 1.78192 0.75481
4 -0.80821 -0.17578 1.00000
5 -3.78369 -0.82295 1.00000
6 -0.10060 -0.02188 1.00000
7 4.59251 0.99887 1.00000
8 0.61205 0.13312 1.00000
9 7.95444 1.73008 0.77742
10 1.17259 0.25504 1.00000
Obs r _w2_
1 -1.97842 1.00000
2 0.70596 1.00000
3 6.47543 0.92592
4 0.45509 1.00000
5 -3.93974 1.00000
6 0.55384 1.00000
7 3.35113 1.00000
8 -1.92494 1.00000
9 6.95415 0.86218
10 0.78291 1.00000
Obs r _w2_
1 -2.16585 1.00000
2 0.68042 1.00000
3 6.05663 0.99024
4 0.40731 1.00000
5 -4.00323 1.00000
6 0.74252 1.00000
7 3.13259 1.00000
8 -2.35721 1.00000
9 6.60526 0.90799
10 0.68544 1.00000
Obs r _w2_
1 -2.24010 1.00000
2 0.60703 1.00000
3 5.90929 1.00000
4 0.29252 1.00000
5 -4.02355 1.00000
6 0.81795 1.00000
7 3.07963 1.00000
8 -2.47327 1.00000
9 6.45193 0.93764
10 0.64893 1.00000
Obs r _w2_
1 -2.26999 1.00000
2 0.55502 1.00000
3 5.85944 1.00000
4 0.21310 1.00000
5 -4.02818 1.00000
6 0.84502 1.00000
7 3.07071 1.00000
8 -2.50289 1.00000
9 6.38705 0.94683
10 0.63394 1.00000
Obs r _w2_
1 -2.28228 1.00000
2 0.53287 1.00000
3 5.83868 1.00000
4 0.17892 1.00000
5 -4.02932 1.00000
6 0.85736 1.00000
7 3.06771 1.00000
8 -2.51505 1.00000
9 6.35958 0.95076
10 0.62809 1.00000
Obs r _w2_
1 -2.28754 1.00000
2 0.52337 1.00000
3 5.82983 1.00000
4 0.16428 1.00000
5 -4.02981 1.00000
6 0.86264 1.00000
7 3.06644 1.00000
8 -2.52021 1.00000
9 6.34783 0.95245
10 0.62559 1.00000
Obs r _w2_
1 -2.28978 0.97649
2 0.51931 0.99878
3 5.82603 0.85279
4 0.15802 0.99989
5 -4.03002 0.92810
6 0.86489 0.99663
7 3.06590 0.95806
8 -2.52242 0.97151
9 6.34281 0.82680
10 0.62453 0.99824
Obs r _w2_
1 -2.73012 0.96699
2 0.86459 0.99666
3 4.96329 0.89302
4 0.72986 0.99762
5 -4.06566 0.92755
6 1.00401 0.99550
7 2.37379 0.97500
8 -4.09078 0.92667
9 5.80575 0.85515
10 0.29495 0.99961
Obs r _w2_
1 -2.81014 0.96404
2 0.83110 0.99683
3 4.83738 0.89535
4 0.68947 0.99782
5 -4.06672 0.92544
6 1.02494 0.99518
7 2.29836 0.97587
8 -4.29038 0.91720
9 5.68803 0.85684
10 0.23689 0.99974
Obs r _w2_
1 -2.83138 0.96315
2 0.80418 0.99700
3 4.80324 0.89581
4 0.64927 0.99804
5 -4.06725 0.92471
6 1.03939 0.99499
7 2.28767 0.97586
8 -4.32471 0.91509
9 5.64737 0.85748
10 0.22449 0.99977
Obs r _w2_
1 -2.83871 0.96282
2 0.79223 0.99708
3 4.79108 0.89594
4 0.63096 0.99815
5 -4.06757 0.92442
6 1.04606 0.99491
7 2.28533 0.97582
8 -4.33367 0.91444
9 5.63172 0.85773
10 0.22075 0.99977
Obs r _w2_
1 -2.84151 0.96269
2 0.78734 0.99711
3 4.78636 0.89600
4 0.62341 0.99819
5 -4.06772 0.92431
6 1.04886 0.99488
7 2.28460 0.97580
8 -4.33669 0.91420
9 5.62552 0.85783
10 0.21941 0.99978
Obs r _w2_
1 -2.84261 0.96264
2 0.78538 0.99712
3 4.78449 0.89602
4 0.62036 0.99820
5 -4.06779 0.92426
6 1.05000 0.99486
7 2.28434 0.97579
8 -4.33783 0.91411
9 5.62305 0.85788
10 0.21890 0.99978
Obs r _w2_
1 -2.84305 0.96262
2 0.78459 0.99713
3 4.78374 0.89602
4 0.61914 0.99821
5 -4.06782 0.92425
6 1.05046 0.99486
7 2.28423 0.97579
8 -4.33828 0.91407
9 5.62206 0.85789
10 0.21869 0.99978
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Weight: _w2_
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 3786.64784 1262.21595 97.16 <.0001
Error 35 454.69842 12.99138
Corrected Total 38 4241.34626
Root MSE 3.60436 R-Square 0.8928
Dependent Mean 262.64784 Adj R-Sq 0.8836
Coeff Var 1.37232
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 208.75986 15.50570 13.46 <.0001
x2 Homelib 1 0.81146 0.12370 6.56 <.0001
x3 Reading 1 0.09235 0.19497 0.47 0.6387
x4 TV Watching 1 -1.13056 0.19267 -5.87 <.0001
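In the later iteration listings above, the weights are no longer exactly 1 even for small residuals, which is the pattern the bisquare (biweight) function produces. Its usual form is shown below with the conventional tuning constant 4.685; that the macro uses exactly this constant is an assumption, since its source is not reproduced here.

```latex
w(u) =
\begin{cases}
\left[ 1 - \left( \dfrac{u}{4.685} \right)^{2} \right]^{2}, & |u| \le 4.685 \\[6pt]
0, & |u| > 4.685
\end{cases}
```

Because the bisquare assigns weight 0 to sufficiently extreme cases, and PROC REG does not use cases with zero weight, this would be consistent with the corrected total DF falling from 39 to 38 in the final fit.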
Fig. 10.11, p. 443.
proc reg data = ch10tab11;
  model y = x1-x5 / selection = rsquare best = 2 cp adjrsq;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y
R-Square Selection Method
Number in Adjusted
Model R-Square R-Square C(p) Variables in Model
1 0.7630 0.7567 21.9929 x4
1 0.5551 0.5434 72.8418 x2
-------------------------------------------------------------------
2 0.8422 0.8337 4.6039 x2 x4
2 0.7923 0.7810 16.8260 x1 x2
-------------------------------------------------------------------
3 0.8514 0.8390 4.3538 x2 x3 x4
3 0.8507 0.8383 4.5237 x1 x2 x4
-------------------------------------------------------------------
4 0.8589 0.8427 4.5321 x1 x2 x3 x4
4 0.8536 0.8369 5.8078 x1 x2 x4 x5
-------------------------------------------------------------------
5 0.8610 0.8406 6.0000 x1 x2 x3 x4 x5
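The C(p) column can be checked by hand. Mallows' criterion compares each subset's error sum of squares with the MSE of the full five-predictor model; the symbols below are standard notation introduced here, with p counting parameters including the intercept. The worked line uses the SSE of the x2 x3 x4 fit (1008.89647) and the MSE of the full fit (27.75213) from the listings on this page.

```latex
C_p = \frac{SSE_p}{MSE(X_1, \dots, X_5)} - (n - 2p)
\\[4pt]
\text{Subset } \{x_2, x_3, x_4\}:\quad
C_p = \frac{1008.89647}{27.75213} - (40 - 2 \cdot 4) \approx 36.3538 - 32 = 4.3538
```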
The model fitted by OLS (10.63), p. 443.
proc reg data = ch10tab11;
  model y = x2 x3 x4;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y Math profeciency
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 3 5781.00353 1927.00118 68.76 <.0001
Error 36 1008.89647 28.02490
Corrected Total 39 6789.90000
Root MSE 5.29386 R-Square 0.8514
Dependent Mean 260.95000 Adj R-Sq 0.8390
Coeff Var 2.02869
Parameter Estimates
Parameter Standard
Variable Label DF Estimate Error t Value Pr > |t|
Intercept Intercept 1 199.61074 21.52892 9.27 <.0001
x2 Homelib 1 0.78043 0.17020 4.59 <.0001
x3 Reading 1 0.40118 0.26876 1.49 0.1442
x4 TV Watching 1 -1.15647 0.27140 -4.26 0.0001
These pages are Copyrighted (c) by UCLA Academic Technology Services