Inputting the Life Insurance data, table 9.1, p. 364.
clear
input x1 x2 y
45.010 6 91
57.204 4 162
26.852 5 11
66.290 7 240
40.964 5 73
72.996 10 311
79.380 1 316
52.766 8 154
55.916 6 164
38.122 4 54
35.840 6 53
75.796 9 326
37.408 5 55
54.376 2 130
46.186 7 112
46.130 4 91
30.366 3 14
39.060 5 63
end
First order linear regression of y on x1 x2 (9.3), p. 364.
regress y x1 x2
Source | SS df MS Number of obs = 18
-------------+------------------------------ F( 2, 15) = 542.33
Model | 173919.296 2 86959.6481 Prob > F = 0.0000
Residual | 2405.14824 15 160.343216 R-squared = 0.9864
-------------+------------------------------ Adj R-squared = 0.9845
Total | 176324.444 17 10372.0261 Root MSE = 12.663
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | 6.288029 .2041495 30.80 0.000 5.852895 6.723163
x2 | 4.737601 1.37808 3.44 0.004 1.800294 7.674908
_cons | -205.7187 11.39268 -18.06 0.000 -230.0016 -181.4357
------------------------------------------------------------------------------
Regressing both y and x1 on x2 (9.4a and 9.4b), p. 364.
regress y x2
Source | SS df MS Number of obs = 18
-------------+------------------------------ F( 1, 16) = 2.26
Model | 21800.4617 1 21800.4617 Prob > F = 0.1525
Residual | 154523.983 16 9657.74892 R-squared = 0.1236
-------------+------------------------------ Adj R-squared = 0.0689
Total | 176324.444 17 10372.0261 Root MSE = 98.274
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x2 | 15.53969 10.34302 1.50 0.152 -6.386539 37.46592
_cons | 50.70277 60.35893 0.84 0.413 -77.25244 178.658
------------------------------------------------------------------------------
regress x1 x2
Source | SS df MS Number of obs = 18
-------------+------------------------------ F( 1, 16) = 1.11
Model | 266.420425 1 266.420425 Prob > F = 0.3082
Residual | 3847.28124 16 240.455077 R-squared = 0.0648
-------------+------------------------------ Adj R-squared = 0.0063
Total | 4113.70166 17 241.982451 Root MSE = 15.507
------------------------------------------------------------------------------
x1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x2 | 1.717882 1.632024 1.05 0.308 -1.741854 5.177618
_cons | 40.7793 9.524025 4.28 0.001 20.58927 60.96933
------------------------------------------------------------------------------
Fig. 9.3a and 9.3b, p. 365.
Note 1: We do not need to see the regression output again, so we use the quietly prefix, which suppresses it.
Note 2: The rvpplot command plots the residuals against a predictor, and the avplot command generates the partial regression (added-variable) plot.
quietly regress y x1 x2
rvpplot x1, ylabel(-20(5)25) xlabel(20(10)80)
avplot x1, ylabel(-100(50)250) xlabel(-25(25)50)
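The partial regression plot drawn by avplot can also be built by hand from the two auxiliary regressions shown earlier (y on x2 and x1 on x2). The following is a minimal sketch, assuming the Life Insurance data are still in memory; the variable names e_y and e_x1 are hypothetical. The slope from the last regression should match the x1 coefficient from the full model (6.288029).

```stata
* Residuals of y after removing the linear effect of x2
quietly regress y x2
predict e_y, residuals

* Residuals of x1 after removing the linear effect of x2
quietly regress x1 x2
predict e_x1, residuals

* The slope of e_y on e_x1 equals the x1 coefficient in regress y x1 x2
regress e_y e_x1

* Scatterplot of the two residual series (the partial regression plot)
graph twoway scatter e_y e_x1
```

Note that the last regress replaces the active estimation results, so rerun regress y x1 x2 before using any further postestimation commands on the full model.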
Inputting the Body Fat data, table 7.1, p. 261.
Note: We need the clear command to drop the previous dataset, since Stata can hold only one dataset in memory at a time.
clear
input x1 x2 x3 y
19.5 43.1 29.1 11.9
24.7 49.8 28.2 22.8
30.7 51.9 37.0 18.7
29.8 54.3 31.1 20.1
19.1 42.2 30.9 12.9
25.6 53.9 23.7 21.7
31.4 58.5 27.6 27.1
27.9 52.1 30.6 25.4
22.1 49.9 23.2 21.3
25.5 53.5 24.8 19.3
31.1 56.6 30.0 25.4
30.4 56.7 28.3 27.2
18.7 46.5 23.0 11.7
19.7 44.2 28.6 17.8
14.6 42.7 21.3 12.8
29.5 54.4 30.1 23.9
27.7 55.3 25.7 22.6
30.2 58.6 24.6 25.4
22.7 48.2 27.1 14.8
25.2 51.0 27.5 21.1
end
Regressing y on x1 x2, p. 365.
regress y x1 x2
Source | SS df MS Number of obs = 20
-------------+------------------------------ F( 2, 17) = 29.80
Model | 385.438738 2 192.719369 Prob > F = 0.0000
Residual | 109.950775 17 6.46769267 R-squared = 0.7781
-------------+------------------------------ Adj R-squared = 0.7519
Total | 495.389513 19 26.0731323 Root MSE = 2.5432
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .2223526 .3034389 0.73 0.474 -.4178475 .8625527
x2 | .6594218 .2911873 2.26 0.037 .0450704 1.273773
_cons | -19.17425 8.36064 -2.29 0.035 -36.81366 -1.534839
------------------------------------------------------------------------------
Fig. 9.4a, p. 366.
rvpplot x1, ylabel(-4(1)5) xlabel(10(5)35)
Fig. 9.4b, p. 366.
avplot x1, ylabel(-6(1)5) xlabel(-5(1)5)
Fig. 9.4c, p. 366.
rvpplot x2, ylabel(-4(1)5) xlabel(40(5)60)
Fig. 9.4d, p. 366.
avplot x2, ylabel(-6(1)5) xlabel(-5(1)5)
Inputting data for illustration of hat matrix, table 9.2, p. 371.
clear
input x1 x2 y
14 25 301
19 32 327
12 22 246
11 15 187
end
Regressing y on x1 x2 (9.17), p. 370.
regress y x1 x2
Source | SS df MS Number of obs = 4
-------------+------------------------------ F( 2, 1) = 9.58
Model | 11009.8607 2 5504.93035 Prob > F = 0.2228
Residual | 574.889291 1 574.889291 R-squared = 0.9504
-------------+------------------------------ Adj R-squared = 0.8511
Total | 11584.75 3 3861.58333 Root MSE = 23.977
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | -5.844605 11.74463 -0.50 0.706 -155.0743 143.3851
x2 | 11.32528 5.931139 1.91 0.307 -64.03699 86.68755
_cons | 80.93035 57.94355 1.40 0.396 -655.3123 817.173
------------------------------------------------------------------------------
predict yhat, xb
predict e, resid
predict hat, hat
predict stdp, stdp
gen s2e = (e(rss)/e(df_r) )*(1-hat)
list yhat e hat stdp s2e
yhat e hat stdp s2e
1. 282.2379 18.76208 .3876812 14.92896 352.0155
2. 332.2919 -5.291868 .9512882 23.38558 28.00388
3. 259.9513 -13.95129 .6614332 19.50002 194.6384
4. 186.5189 .4810789 .9995974 23.97202 .231433
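The s2e column is the estimated variance of each residual, MSE*(1-h). As a cross-check, predict also offers the stdr option (standard error of the residual) alongside stdp (standard error of the fitted value). This is a minimal sketch, assuming the regression of y on x1 x2 above is still the active estimation; stdr, s2e_chk and mse_chk are hypothetical variable names.

```stata
* Standard error of each residual, s*sqrt(1-h)
predict stdr, stdr

* Squaring it should reproduce s2e
gen s2e_chk = stdr^2

* stdp^2 = MSE*h and stdr^2 = MSE*(1-h), so their sum is MSE for every case
gen mse_chk = stdp^2 + stdr^2

list s2e s2e_chk mse_chk
display e(rss)/e(df_r)
```

Every entry of mse_chk should equal the value printed by the final display, which is the residual mean square from the regression table (574.889291).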
Generating the hat matrix and the variance of residuals matrix, table 9.2b and 9.2c, p. 371.
gen constant = 1
mkmat constant x1 x2, matrix(X)
matrix list X
X[4,3]
constant x1 x2
r1 1 14 25
r2 1 19 32
r3 1 12 22
r4 1 11 15
matrix H = X*inv(X'*X)*X'
matrix list H
symmetric H[4,4]
r1 r2 r3 r4
r1 .38768116
r2 .17270531 .95128824
r3 .45531401 -.1284219 .66143317
r4 -.01570048 .00442834 .01167472 .99959742
matrix s2e = (e(rss)/e(df_r) )*(I(4) -H)
matrix list s2e
symmetric s2e[4,4]
r1 r2 r3 r4
r1 352.01554
r2 -99.286436 28.003866
r3 -261.75515 73.828375 194.63844
r4 9.0260396 -2.545806 -6.7116705 .23143691
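Two quick checks tie the matrix results back to predict: the diagonal of H should equal the hat values listed earlier, and its trace should equal the number of regression parameters (3 here). This is a sketch assuming the matrix H built above is still defined; h is a hypothetical matrix name.

```stata
* Diagonal of H as a column vector; compare with the hat variable
matrix h = vecdiag(H)'
matrix list h

* The leverages sum to the number of parameters p
display trace(H)
```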
Returning to the Body Fat data and computing the residuals, the diagonal elements of the hat matrix, and the studentized deleted residuals for the regression of y on x1 x2, table 9.3, p. 375.
Note: Since we are not interested in the regression output, we use the quietly prefix, which suppresses it.
clear
input x1 x2 x3 y
19.5 43.1 29.1 11.9
24.7 49.8 28.2 22.8
30.7 51.9 37.0 18.7
29.8 54.3 31.1 20.1
19.1 42.2 30.9 12.9
25.6 53.9 23.7 21.7
31.4 58.5 27.6 27.1
27.9 52.1 30.6 25.4
22.1 49.9 23.2 21.3
25.5 53.5 24.8 19.3
31.1 56.6 30.0 25.4
30.4 56.7 28.3 27.2
18.7 46.5 23.0 11.7
19.7 44.2 28.6 17.8
14.6 42.7 21.3 12.8
29.5 54.4 30.1 23.9
27.7 55.3 25.7 22.6
30.2 58.6 24.6 25.4
22.7 48.2 27.1 14.8
25.2 51.0 27.5 21.1
end
quietly regress y x1 x2
predict resid, residuals
predict hat, hat
predict student, rstudent
list resid hat student
resid hat student
1. -1.682708 .2010126 -.7299849
2. 3.642931 .0588948 1.534254
3. -3.175971 .3719329 -1.65433
4. -3.158464 .1109401 -1.348484
5. -.0002889 .2480103 -.0001271
6. -.3608158 .1286163 -.1475492
7. .7161994 .1555175 .2981277
8. 4.014733 .0962878 1.760093
9. 2.655104 .1146357 1.117648
10. -2.474812 .1102444 -1.033729
11. .3358067 .1203366 .1366612
12. 2.225511 .1092663 .9231787
13. -3.946861 .1783818 -1.825903
14. 3.447455 .1480068 1.524763
15. .5705876 .333212 .2671503
16. .642297 .0952774 .2581318
17. -.8509458 .1055946 -.3445088
18. -.7829196 .1967927 -.334408
19. -2.857289 .0669542 -1.176171
20. 1.040449 .0500853 .4093566
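The studentized deleted residuals can be obtained without refitting the model n times, using t_i = e_i * sqrt( (n-p-1) / (SSE*(1-h_ii) - e_i^2) ). This is a minimal sketch, assuming the quietly regress y x1 x2 fit above is still the active estimation; t_chk is a hypothetical variable name.

```stata
* e(rss) is SSE and e(df_r) is n-p, so e(df_r)-1 is n-p-1
gen t_chk = resid*sqrt( (e(df_r)-1) / (e(rss)*(1-hat) - resid^2) )

* t_chk should agree with the rstudent values from predict
list student t_chk
```

For observation 1 the formula gives approximately -.730, in line with the listed value -.7299849.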
Identifying potential outliers just by looking at the observations with the largest absolute studentized residuals, p. 374.
list resid hat student if abs(student) > 1.6
resid hat student
3. -3.175971 .3719329 -1.65433
8. 4.014733 .0962878 1.760093
13. -3.946861 .1783818 -1.825903
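Using the Bonferroni simultaneous test procedure to determine whether any of these three cases really are outliers, alpha = .10, p. 375.
Note: None of the potential outliers is significant under the Bonferroni simultaneous test procedure.
Note: We first show how to obtain n-p-1 and 1-alpha/(2*n), and then compute the critical value with the invttail function. Because invttail(df, p) takes an upper-tail probability p, passing 1-alpha/(2*n) returns the negative of the critical value, so the critical value is 3.2519929 in absolute value. The largest absolute studentized deleted residual, 1.825903 for observation 13, is well below it, so we conclude that observation 13 is not an outlier at alpha = .10. The ttail function expects the degrees of freedom as its first argument, ttail(df, t); the p_val generated below passes .1/(2*e(N)) in that position, so the listed value .505334 is not a tail probability from the t distribution with n-p-1 = 16 degrees of freedom, and the conclusion should rest on the comparison with the critical value.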
display e(df_r) -1
16
display 1 - .1/(2*e(N))
.9975
display invttail(e(df_r) -1 , 1-.1/(2*e(N)) )
-3.2519929
gen p_val = 1- ttail(.1/(2*e(N)) , abs(student))
list student p_val if abs(student) > 1.8
student p_val
13. -1.825903 .505334
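For reference, here is a minimal sketch of the same Bonferroni test with the degrees of freedom passed as the first argument of invttail and ttail. It assumes the regression of y on x1 x2 is still the active estimation and that student holds the studentized deleted residuals; tcrit, p_two and p_bonf are hypothetical names.

```stata
* Critical value t(1 - alpha/(2n); n-p-1) with alpha = .10
scalar tcrit = invttail(e(df_r)-1, .10/(2*e(N)))
display tcrit

* Two-sided p-value of each studentized deleted residual on n-p-1 df
gen p_two = 2*ttail(e(df_r)-1, abs(student))

* Bonferroni adjustment: multiply by the number of cases n, cap at 1
gen p_bonf = min(1, e(N)*p_two)

* Cases flagged as outliers at the family level alpha = .10
list student p_two p_bonf if abs(student) > scalar(tcrit)
```

By the symmetry of the t distribution, tcrit is 3.2519929, the positive counterpart of the value displayed above. Since the largest absolute studentized deleted residual is 1.825903, the final list should be empty.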
Using the data for illustration of hat matrix again, table 9.2, p. 371.
clear
input x1 x2 y
14 25 301
19 32 327
12 22 246
11 15 187
end
Calculating the diagonal elements of the hat matrix and generating fig. 9.6, p. 376.
Note: The mlabel(hat) option labels each plotted point with its value of the hat variable.
quietly regress y x1 x2
predict hat, hat
replace hat=round(hat, .0001)
graph twoway scatter x2 x1, mlabel(hat) ylabel(15 25 35) xlabel(10 15 20)
Back to the Body Fat data.
clear
input x1 x2 x3 y
19.5 43.1 29.1 11.9
24.7 49.8 28.2 22.8
30.7 51.9 37.0 18.7
29.8 54.3 31.1 20.1
19.1 42.2 30.9 12.9
25.6 53.9 23.7 21.7
31.4 58.5 27.6 27.1
27.9 52.1 30.6 25.4
22.1 49.9 23.2 21.3
25.5 53.5 24.8 19.3
31.1 56.6 30.0 25.4
30.4 56.7 28.3 27.2
18.7 46.5 23.0 11.7
19.7 44.2 28.6 17.8
14.6 42.7 21.3 12.8
29.5 54.4 30.1 23.9
27.7 55.3 25.7 22.6
30.2 58.6 24.6 25.4
22.7 48.2 27.1 14.8
25.2 51.0 27.5 21.1
end
Fig. 9.7, p. 378.
gen id = _n
graph twoway scatter x2 x1, mlabel(id) msymbol(i) ylabel(42(2)60) xlab(14(2)32)
Regressing y on x1 x2 and calculating the DFFITS, Cook's distance, and DFBETAs, table 9.4, p. 380.
quietly regress y x1 x2
predict dfits, dfits
predict cooksd, cooksd
predict dfbeta1, dfbeta(x1)
predict dfbeta2, dfbeta(x2)
list dfits cooksd dfbeta1 dfbeta2
dfits cooksd dfbeta1 dfbeta2
1. -.3661471 .0459505 -.1314856 .2320319
2. .3838104 .0454812 .1150253 -.1426131
3. -1.273067 .4901566 -1.182525 1.066903
4. -.4763481 .0721618 -.2935194 .1960718
5. -.000073 1.89e-09 -.0000306 .0000503
6. -.0566866 .0011365 .0400812 -.0442677
7. .1279371 .0057649 -.015613 .0543164
8. .5745215 .0979386 .3911265 -.3324536
9. .4021648 .0531335 -.2946556 .2469092
10. -.3638727 .0439571 .2446011 -.2688087
11. .0505459 .0009038 .0170564 -.0024845
12. .3233367 .0351544 .0224579 .0699963
13. -.8507811 .2121502 .5924201 -.3894911
14. .635514 .1248925 .1131721 -.2977041
15. .1888523 .0125753 -.1247569 .0687694
16. .0837681 .0024749 .0431133 -.0251249
17. -.1183734 .0049261 .0550435 -.07609
18. -.1655264 .0096365 .0753286 -.1161002
19. -.3150707 .0323601 -.004072 .0644293
20. .0939971 .0030968 .0022908 -.0033142
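A common way to read this table is to compare each measure with a rule-of-thumb cutoff: for small to medium datasets, an absolute DFFITS or DFBETAS above 1, and for Cook's distance, the percentile of D in the F distribution with p and n-p degrees of freedom. The following is a minimal sketch under those assumptions; cooks_pct is a hypothetical variable name, the .1 screening threshold is arbitrary, and the cutoffs are guidelines rather than formal tests.

```stata
* Cases whose DFFITS or DFBETAS exceed 1 in absolute value
list dfits cooksd dfbeta1 dfbeta2 if abs(dfits) > 1 | abs(dfbeta1) > 1 | abs(dfbeta2) > 1

* Percentile of each Cook's distance in F(p, n-p); p = e(df_m)+1, n-p = e(df_r)
gen cooks_pct = F(e(df_m)+1, e(df_r), cooksd)
list cooksd cooks_pct if cooksd > .1
```

In the table above, only observation 3 exceeds 1 in absolute value, and it does so on DFFITS and on both DFBETAS.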
Fig. 9.8a, p. 382.
predict predict, xb
predict resid, resid
graph twoway scatter resid predict [w=cooksd] ///
    , ylabel(-4.5(1.5)4.5) xlabel(10(5)30) msymbol(o)
Fig. 9.8b, p. 382.
graph twoway scatter cooksd id, connect(l) ylabel(0(.1).5) xlabel(0(5)25)
Calculating the regression equations for regressing y on x1 x2 with and without observation 3, p. 384.
regress y x1 x2
Source | SS df MS Number of obs = 20
-------------+------------------------------ F( 2, 17) = 29.80
Model | 385.438738 2 192.719369 Prob > F = 0.0000
Residual | 109.950775 17 6.46769267 R-squared = 0.7781
-------------+------------------------------ Adj R-squared = 0.7519
Total | 495.389513 19 26.0731323 Root MSE = 2.5432
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .2223526 .3034389 0.73 0.474 -.4178475 .8625527
x2 | .6594218 .2911873 2.26 0.037 .0450704 1.273773
_cons | -19.17425 8.36064 -2.29 0.035 -36.81366 -1.534839
------------------------------------------------------------------------------
regress y x1 x2 if id ~= 3
Source | SS df MS Number of obs = 19
-------------+------------------------------ F( 2, 16) = 34.01
Model | 399.146133 2 199.573067 Prob > F = 0.0000
Residual | 93.8907246 16 5.86817029 R-squared = 0.8096
-------------+------------------------------ Adj R-squared = 0.7858
Total | 493.036858 18 27.3909366 Root MSE = 2.4224
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .5641417 .3552815 1.59 0.132 -.1890215 1.317305
x2 | .363502 .3300409 1.10 0.287 -.3361535 1.063158
_cons | -12.42817 8.947045 -1.39 0.184 -31.39506 6.53872
------------------------------------------------------------------------------
Calculating the VIF for the regression of y on x1 x2 x3, p. 388.
quietly regress y x1 x2 x3
* Stata 8 code.
vif
* Stata 9 code and output.
estat vif
Variable | VIF 1/VIF
-------------+----------------------
x1 | 708.84 0.001411
x2 | 564.34 0.001772
x3 | 104.61 0.009560
-------------+----------------------
Mean VIF | 459.26
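Each VIF equals 1/(1 - R2_k), where R2_k is the R-squared from regressing predictor k on the remaining predictors. Here is a minimal sketch for x1, assuming the Body Fat data are still in memory; the first display should match the 708.84 reported above and the second the corresponding 1/VIF entry.

```stata
* Auxiliary regression of x1 on the other predictors
quietly regress x1 x2 x3

* VIF for x1, then its reciprocal (the 1/VIF column)
display 1/(1 - e(r2))
display 1 - e(r2)
```

This replaces the active estimation results, so rerun regress y x1 x2 x3 before any further postestimation commands on the full model.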
Returning to the Surgical Unit Example, inputting the data, p. 388.
clear
input x1 x2 x3 x4 y logy
6.7 62 81 2.59 200 2.3010
5.1 59 66 1.70 101 2.0043
7.4 57 83 2.16 204 2.3096
6.5 73 41 2.01 101 2.0043
7.8 65 115 4.30 509 2.7067
5.8 38 72 1.42 80 1.9031
5.7 46 63 1.91 80 1.9031
3.7 68 81 2.57 127 2.1038
6.0 67 93 2.50 202 2.3054
3.7 76 94 2.40 203 2.3075
6.3 84 83 4.13 329 2.5172
6.7 51 43 1.86 65 1.8129
5.8 96 114 3.95 830 2.9191
5.8 83 88 3.95 330 2.5185
7.7 62 67 3.40 168 2.2253
7.4 74 68 2.40 217 2.3365
6.0 85 28 2.98 87 1.9395
3.7 51 41 1.55 34 1.5315
7.3 68 74 3.56 215 2.3324
5.6 57 87 3.02 172 2.2355
5.2 52 76 2.85 109 2.0374
3.4 83 53 1.12 136 2.1335
6.7 26 68 2.10 70 1.8451
5.8 67 86 3.40 220 2.3424
6.3 59 100 2.95 276 2.4409
5.8 61 73 3.50 144 2.1584
5.2 52 86 2.45 181 2.2577
11.2 76 90 5.59 574 2.7589
5.2 54 56 2.71 72 1.8573
5.8 76 59 2.58 178 2.2504
3.2 64 65 0.74 71 1.8513
8.7 45 23 2.52 58 1.7634
5.0 59 73 3.50 116 2.0645
5.8 72 93 3.30 295 2.4698
5.4 58 70 2.64 115 2.0607
5.3 51 99 2.60 184 2.2648
2.6 74 86 2.05 118 2.0719
4.3 8 119 2.85 120 2.0792
4.8 61 76 2.45 151 2.1790
5.4 52 88 1.81 148 2.1703
5.2 49 72 1.84 95 1.9777
3.6 28 99 1.30 75 1.8751
8.8 86 88 6.40 483 2.6840
6.5 56 77 2.85 153 2.1847
3.4 77 93 1.48 191 2.2810
6.5 40 84 3.00 123 2.0899
4.5 73 106 3.05 311 2.4928
4.8 86 101 4.10 398 2.5999
5.1 67 77 2.86 158 2.1987
3.9 82 103 4.55 310 2.4914
6.6 77 46 1.95 124 2.0934
6.4 85 40 1.21 125 2.0969
6.4 59 85 2.33 198 2.2967
8.8 78 72 3.20 313 2.4955
end
Calculating the VIF for the regression of logy on x1 x2 x3, p. 389.
quietly regress logy x1 x2 x3
* Stata 8 code.
vif
* Stata 9 code and output.
estat vif
Variable | VIF 1/VIF
-------------+----------------------
x1 | 1.03 0.970108
x3 | 1.02 0.977506
x2 | 1.01 0.991774
-------------+----------------------
Mean VIF | 1.02
Fig. 9.9a, p. 390.
predict resid, residual
rvfplot, yline(0) ylabel(-.15(.05).15) xlabel(1.5(.25)3)
Fig. 9.9b, p. 390.
graph twoway scatter resid x4, yline(0) ylabel(-.15(.05).15) xlabel(0(1)7)
Fig. 9.9c, p. 390.
avplot x1, ylabel(-.3(.1).4) xlabel(-4(2)6)
Fig. 9.9d, p. 390.
qnorm resid, ylabel(-.1(.05).15) xlabel(-.15(.05).15)
Computing various diagnostics for outlying cases.
quietly regress logy x1 x2 x3
predict hat, hat
predict student, rstudent
predict dfits, dfits
predict cooksd, cooksd
gen id = _n
Computing the Bonferroni test procedure for observation 22.
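Note: We first show how to obtain n-p-1 and 1-alpha/(2*n), and then compute the critical value with the invttail function, which takes the degrees of freedom and an upper-tail probability. The studentized deleted residual for observation 22 (3.495166) falls just short of the critical value (3.5260926), so it is not significant at the .05 level, but it is close enough to warrant closer scrutiny. The ttail function expects the degrees of freedom as its first argument, ttail(df, t); the p_val generated below passes 1-.05/(2*e(N)) in that position, so the listed value .0887476 is not a tail probability from the t distribution with n-p-1 = 49 degrees of freedom.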
display e(df_r) -1
49
display 1-.05/(2*e(N))
.99953704
display invttail(e(df_r) -1 , .05/(2*e(N)) )
3.5260926
gen p_val = ttail( 1-.05/(2*e(N)) , abs(student))
list student p_val if p_val < .1
student p_val
22. 3.495166 .0887476
Calculating the cut off for the outlying X observations.
display ( 2*(e(df_m)+1) )/e(N)
.14814815
Table 9.6, p. 391.
list resid hat student dfits cooksd if ( p_val < .1 | hat > ( 2*(e(df_m)+1) )/e(N) | resid > .11)
resid hat student dfits cooksd
13. .0560032 .1494899 1.304571 .5469329 .0737486
17. -.016169 .1499111 -.3708858 -.1557489 .0061709
22. .1383144 .1273679 3.495166 1.33531 .3640895
27. .1117597 .0310852 2.552272 .4571527 .0470576
28. -.0635543 .2618725 -1.602709 -.9546278 .2208982
32. .040223 .2112798 .9655761 .4997514 .0625225
38. .0902419 .2901519 2.39032 1.52822 .5335639
The final regression model (9.46), p. 392.
regress logy x1 x2 x3
Source | SS df MS Number of obs = 54
-------------+------------------------------ F( 3, 50) = 586.04
Model | 3.86291372 3 1.28763791 Prob > F = 0.0000
Residual | .109858708 50 .002197174 R-squared = 0.9723
-------------+------------------------------ Adj R-squared = 0.9707
Total | 3.97277243 53 .07495797 Root MSE = .04687
------------------------------------------------------------------------------
logy | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .0692251 .0040779 16.98 0.000 .0610343 .0774159
x2 | .0092945 .0003825 24.30 0.000 .0085263 .0100628
x3 | .0095236 .0003064 31.08 0.000 .0089082 .0101391
_cons | .4836209 .0426287 11.34 0.000 .3979985 .5692432
------------------------------------------------------------------------------
These pages are Copyrighted (c) by UCLA Academic Technology Services