Inputting the Toluca Company data.
clear
input x y
80 399
30 121
50 221
90 376
70 361
60 224
120 546
80 352
100 353
50 157
40 160
70 252
90 389
20 113
110 435
100 420
30 212
50 268
90 377
110 421
30 273
90 468
40 244
80 342
70 323
end
Generate an id variable for the sequence plot, then label x "Lot size", y "Work hrs." and id "Run".
generate id = _n
label variable x "Lot size"
label variable y "Work hrs."
label variable id "Run"
Fig. 3.1a, p. 96.
dotplot x, nx(50)

Fig. 3.1b, p. 96.
twoway connected x id

Fig. 3.1c, p. 96.
stem x
   2* | 0
   3* | 000
   4* | 00
   5* | 000
   6* | 0
   7* | 000
   8* | 000
   9* | 0000
  10* | 00
  11* | 00
  12* | 0
Fig. 3.1d, p. 96.
graph box x

Fig. 3.2a, p. 99.
regress y x
predict r, resid
twoway scatter r x

Fig. 3.2b, p. 99.
twoway connected r id, sort(id)

Fig. 3.2c, p. 99.
graph box r
Fig. 3.2d, p. 99.
qnorm r

Inputting and labeling the Transit data, p. 100.
clear
input y x
.60 80
6.70 220
5.30 140
4.00 120
6.55 180
2.15 100
6.60 200
5.75 160
end
label variable y "Ridership"
label variable x "Maps"
Fig. 3.3a, p. 100.
twoway (scatter y x) (lfit y x)
Fig. 3.3b, p. 100.
regress y x
predict r, resid
twoway scatter r x, yline(0)
Table 3.1, p. 100.
regress y x
predict yhat
* r was already created for Fig. 3.3b; drop it so predict can re-create it
capture drop r
predict r, resid
list y x yhat r
+-----------------------------------+
| y x yhat r |
|-----------------------------------|
1. | .6 80 1.6625 -1.0625 |
2. | 6.7 220 7.75 -1.05 |
3. | 5.3 140 4.271429 1.028572 |
4. | 4 120 3.401786 .5982142 |
5. | 6.55 180 6.010714 .5392859 |
|-----------------------------------|
6. | 2.15 100 2.532143 -.3821428 |
7. | 6.6 200 6.880357 -.2803572 |
8. | 5.75 160 5.141071 .6089286 |
+-----------------------------------+
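For readers who want to check Table 3.1 outside Stata, here is a minimal Python sketch (standard library only) of the same computation: fit the least squares line, then form yhat = b0 + b1*x and r = y - yhat. The helper `ols` is a hypothetical name introduced for illustration, not part of any package.

```python
# Table 3.1 cross-check (Transit data): least squares fit, fitted values, residuals.

def ols(x, y):
    """Return (b0, b1) of the least squares line y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


y = [0.60, 6.70, 5.30, 4.00, 6.55, 2.15, 6.60, 5.75]  # ridership
x = [80, 220, 140, 120, 180, 100, 200, 160]           # maps

b0, b1 = ols(x, y)
yhat = [b0 + b1 * xi for xi in x]          # fitted values
r = [yi - fi for yi, fi in zip(y, yhat)]   # residuals

for yi, xi, fi, ri in zip(y, x, yhat, r):
    print(f"{yi:5.2f} {xi:4d} {fi:9.6f} {ri:10.6f}")
```

The residuals should match the r column above up to rounding, and they sum to zero, as least squares residuals do whenever the model includes an intercept.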
Table 3.2, p. 106.
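Note: This table uses the Toluca Company data, so re-enter those data (and the id variable) as shown at the beginning of this page before running the commands below. The constant 2384 is the MSE of the Toluca regression, so sqrt(2384)*invnormal(prob) is the expected value of a residual of that rank under normality.
regress y x
predict r, resid
egen rank = rank(r)
gen prob = (rank-.375)/(25+.25)
gen expected = sqrt(2384)*invnormal(prob)
list id r rank expected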
+------------------------------------+
| run r rank expected |
|------------------------------------|
1. | 1 51.01798 22 51.97266 |
2. | 2 -48.47192 5 -44.10749 |
3. | 3 -19.87596 10 -14.76319 |
4. | 4 -7.684041 11 -9.758779 |
5. | 5 48.72 21 44.1075 |
|------------------------------------|
6. | 6 -52.57798 4 -51.97266 |
7. | 7 55.2099 23 61.48703 |
8. | 8 4.01798 15 9.758775 |
9. | 9 -66.38606 2 -74.17666 |
10. | 10 -83.87596 1 -95.90529 |
|------------------------------------|
11. | 11 -45.17394 6 -37.24776 |
12. | 12 -60.28 3 -61.48703 |
13. | 13 5.315959 16 14.76319 |
14. | 14 -20.7699 8 -25.32683 |
15. | 15 -20.08808 9 -19.92811 |
|------------------------------------|
16. | 16 .6139394 14 4.855084 |
17. | 17 42.52808 20 37.24776 |
18. | 18 27.12404 18 25.32683 |
19. | 19 -6.684041 12 -4.855084 |
20. | 20 -34.08808 7 -31.05527 |
|------------------------------------|
21. | 21 103.5281 25 95.90527 |
22. | 22 84.31596 24 74.17666 |
23. | 23 38.82606 19 31.05527 |
24. | 24 -5.98202 13 0 |
25. | 25 10.72 17 19.92811 |
+------------------------------------+
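The expected value of the residual with rank k under normality is sqrt(MSE) * z((k - .375)/(n + .25)), where z is the standard normal quantile function. Below is a Python sketch of that formula, assuming the Toluca data as entered at the top of this page. Unlike the Stata code, it computes MSE from the fit (about 2383.7) instead of plugging in the rounded 2384, so the last digits can differ slightly from Table 3.2.

```python
# Expected values of ranked residuals under normality (cf. Table 3.2), Toluca data.
from statistics import NormalDist

x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
     113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323]
n = len(x)

# Least squares fit and residuals
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
r = [b - (b0 + b1 * a) for a, b in zip(x, y)]
mse = sum(e * e for e in r) / (n - 2)

# Rank 1 = most negative residual (these residuals have no ties)
order = sorted(range(n), key=lambda i: r[i])
rank = [0] * n
for k, i in enumerate(order, start=1):
    rank[i] = k

z = NormalDist().inv_cdf
expected = [mse ** 0.5 * z((k - 0.375) / (n + 0.25)) for k in rank]

for run in range(n):
    print(f"{run + 1:2d} {r[run]:10.4f} {rank[run]:2d} {expected[run]:10.4f}")
```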
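The modified Levene test of constancy of error variance, pp. 112-114, and Table 3.3, p. 114. The runs are split into two groups (x <= 70 and x > 70). The variable mr holds the median residual of each group, d = |r - mr| is the absolute deviation of each residual from its group median, md is the group mean of d, and ddif = (d - md)^2. The test itself is a two-sample t test comparing the mean of d in the two groups.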
Note: Stata does have a Levene test (robvar) but it is not the same as the modified Levene test described in this section.
regress y x
* r was already created for Table 3.2; drop it so predict can re-create it
capture drop r
predict r, resid
predict yhat
gen group = 1
replace group = 2 if x>70
sort group
by group: egen mr = median(r)
gen d = abs(r-mr)
by group: egen md = mean(d)
gen ddif = (d-md)^2
sort group
by group: list id x r d ddif
ttest d, by(group)
-> group = 1
+--------------------------------------------+
| run x r d ddif |
|--------------------------------------------|
1. | 6 60 -52.57798 32.70202 146.7261 |
2. | 10 50 -83.87596 64 368.0613 |
3. | 14 20 -20.7699 .89394 1929.066 |
4. | 11 40 -45.17394 25.29798 380.917 |
5. | 17 30 42.52808 62.40404 309.3716 |
|--------------------------------------------|
6. | 2 30 -48.47192 28.59596 263.0597 |
7. | 18 50 27.12404 47 4.773898 |
8. | 3 50 -19.87596 0 2008.391 |
9. | 23 40 38.82606 58.70202 192.8472 |
10. | 21 30 103.5281 123.404 6176.226 |
|--------------------------------------------|
11. | 25 70 10.72 30.59596 202.1833 |
12. | 12 70 -60.28 40.40404 19.45725 |
13. | 5 70 48.72 68.59596 565.5306 |
+--------------------------------------------+
----------------------------------------------------
-> group = 2
+---------------------------------------------+
| run x r d ddif |
|---------------------------------------------|
1. | 8 80 4.01798 6.70202 472.9893 |
2. | 13 90 5.315959 8 418.2162 |
3. | 1 80 51.01798 53.70202 637.6475 |
4. | 24 80 -5.98202 3.29798 632.6411 |
5. | 15 110 -20.08808 17.40404 122.0206 |
|---------------------------------------------|
6. | 20 110 -34.08808 31.40404 8.724372 |
7. | 22 90 84.31596 87 3428.063 |
8. | 9 100 -66.38606 63.70202 1242.681 |
9. | 4 90 -7.684041 5 549.9183 |
10. | 7 120 55.2099 57.89394 866.9258 |
|---------------------------------------------|
11. | 16 100 .6139394 3.29798 632.6411 |
12. | 19 90 -6.684041 4 597.819 |
+---------------------------------------------+
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
1 | 13 44.81507 8.975255 32.36074 25.25967 64.37047
2 | 12 28.45034 8.532597 29.55778 9.670218 47.23046
---------+--------------------------------------------------------------------
combined | 25 36.96 6.304496 31.52248 23.94816 49.97184
---------+--------------------------------------------------------------------
diff | 16.36474 12.43066 -9.350043 42.07952
------------------------------------------------------------------------------
diff = mean(1) - mean(2) t = 1.3165
Ho: diff = 0 degrees of freedom = 23
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.8995 Pr(|T| > |t|) = 0.2010 Pr(T > t) = 0.1005
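The t test above is the whole modified Levene test: split the residuals into two groups, take each residual's absolute deviation d from its group median, and compare the group means of d with a pooled two-sample t test. Here is a standard-library Python sketch of those steps, assuming the same split at lot size 70 used in the Stata code; variable names are illustrative.

```python
# Modified Levene (Brown-Forsythe) test for the Toluca residuals, as in Table 3.3.
from statistics import median

x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
     113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323]
n = len(x)

# Residuals from the simple least squares fit
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
r = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Same split as the Stata code: group 1 has lot size <= 70, group 2 the rest
g1 = [e for e, a in zip(r, x) if a <= 70]
g2 = [e for e, a in zip(r, x) if a > 70]

# d = absolute deviation of each residual from its group's median residual
m1, m2 = median(g1), median(g2)
d1 = [abs(e - m1) for e in g1]
d2 = [abs(e - m2) for e in g2]

# Pooled (equal-variance) two-sample t statistic on d
n1, n2 = len(d1), len(d2)
dbar1, dbar2 = sum(d1) / n1, sum(d2) / n2
ss = sum((d - dbar1) ** 2 for d in d1) + sum((d - dbar2) ** 2 for d in d2)
s = (ss / (n1 + n2 - 2)) ** 0.5
t = (dbar1 - dbar2) / (s * (1 / n1 + 1 / n2) ** 0.5)

print(f"n1={n1} n2={n2} mean d: {dbar1:.4f} vs {dbar2:.4f}  t = {t:.4f}")
```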
The Breusch-Pagan test, p. 115.
Note 1: Stata has a user-written command, bpagan, that computes the Breusch-Pagan test statistic and p-value. You can get this program from within Stata while you are online by typing, for example, findit bpagan (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
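Note 2: The book reports the p-value for this test as .64. That value appears to be the cumulative probability below the test statistic rather than the upper-tail p-value; we believe the correct p-value is 1 - .64 = .36, which agrees with the .3649 reported by bpagan below.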
regress y x
bpagan x
Breusch-Pagan LM statistic: .8209193 Chi-sq( 1) P-value = .3649
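In the book's form the statistic is X2_BP = (SSR*/2) / (SSE/n)^2, where SSR* is the regression sum of squares from regressing the squared residuals e^2 on x and SSE comes from the original fit; under constant variance it is compared with a chi-square distribution with 1 degree of freedom. Below is a minimal Python sketch of that calculation for the Toluca data. The helper `ols` is hypothetical, and the chi-square(1) upper tail is computed with the identity P(chi2_1 > q) = erfc(sqrt(q/2)).

```python
# Breusch-Pagan statistic for the Toluca regression: (SSR*/2) / (SSE/n)^2.
import math


def ols(x, y):
    """Return (b0, b1) of the least squares line y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389,
     113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323]
n = len(x)

b0, b1 = ols(x, y)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(ei * ei for ei in e)

# Auxiliary regression of the squared residuals on x
e2 = [ei * ei for ei in e]
c0, c1 = ols(x, e2)
xbar = sum(x) / n
ssr_star = c1 ** 2 * sum((xi - xbar) ** 2 for xi in x)  # SSR of a simple regression

bp = (ssr_star / 2) / (sse / n) ** 2
p_value = math.erfc(math.sqrt(bp / 2))  # upper tail of chi-square with 1 df

print(f"BP = {bp:.4f}, p-value = {p_value:.4f}")
```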
Inputting the Bank data, p. 117.
clear
input x y
125 160
100 112
200 124
75 28
150 152
175 156
75 42
175 124
125 150
200 104
100 136
end
label variable x "deposit"
label variable y "new accounts"
gen branch = _n
Table 3.4a, p. 117
list branch x y

     +--------------------+
     | branch     x     y |
     |--------------------|
  1. |      1   125   160 |
  2. |      2   100   112 |
  3. |      3   200   124 |
  4. |      4    75    28 |
  5. |      5   150   152 |
     |--------------------|
  6. |      6   175   156 |
  7. |      7    75    42 |
  8. |      8   175   124 |
  9. |      9   125   150 |
 10. |     10   200   104 |
     |--------------------|
 11. |     11   100   136 |
     +--------------------+

Table 3.4b, the ANOVA table, p. 117.
regress y x

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =    3.14
       Model |  5141.33841     1  5141.33841           Prob > F      =  0.1102
    Residual |  14741.5707     9   1637.9523           R-squared     =  0.2586
-------------+------------------------------           Adj R-squared =  0.1762
       Total |  19882.9091    10  1988.29091           Root MSE      =  40.472

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .4867016   .2747105     1.77   0.110    -.1347368     1.10814
       _cons |   50.72251   39.39791     1.29   0.230    -38.40176    139.8468
------------------------------------------------------------------------------

Table 3.5, p. 117.
drop branch
sort x
by x: egen mean = mean(y)
by x: gen replicate = _n
reshape wide y, i(x) j(replicate)
list

     +------------------------+
     |   x    y1    y2   mean |
     |------------------------|
  1. |  75    28    42     35 |
  2. | 100   112   136    124 |
  3. | 125   150   160    155 |
  4. | 150   152     .    152 |
  5. | 175   156   124    140 |
     |------------------------|
  6. | 200   104   124    114 |
     +------------------------+

Table 3.6b, p. 123. The F value in the output (MSLF/MSPE) is the test statistic for the lack-of-fit test.
Note: Stata has a user-written command, maxr2, that computes the lack-of-fit test. You can get this program from within Stata while you are online by typing, for example, findit maxr2 (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
Also note that if you ran the reshape command to produce Table 3.5, you need the first line of the syntax below (reshape long) to return the data to long form; if you did not run reshape, skip that line.
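The lack-of-fit test that maxr2 reports splits SSE into pure error, SSPE (the variation of y around the mean of its own x level), and lack of fit, SSLF = SSE - SSPE, then forms F = [SSLF/(c - 2)] / [SSPE/(n - c)], where c is the number of distinct x levels. Here is a Python sketch of that decomposition for the Bank data (standard library only; the helper `ols` and other names are illustrative). Because it groups the observations directly, it needs no reshape.

```python
# Lack-of-fit F test for the Bank data (cf. Table 3.6b and the maxr2 output).
from collections import defaultdict


def ols(x, y):
    """Return (b0, b1) of the least squares line y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x = [125, 100, 200, 75, 150, 175, 75, 175, 125, 200, 100]  # deposit
y = [160, 112, 124, 28, 152, 156, 42, 124, 150, 104, 136]  # new accounts
n = len(x)

b0, b1 = ols(x, y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Pure error: deviations of y from the mean of its own x level
levels = defaultdict(list)
for xi, yi in zip(x, y):
    levels[xi].append(yi)
sspe = 0.0
for ys in levels.values():
    m = sum(ys) / len(ys)
    sspe += sum((v - m) ** 2 for v in ys)

c = len(levels)      # number of distinct x levels
sslf = sse - sspe    # lack-of-fit sum of squares
f = (sslf / (c - 2)) / (sspe / (n - c))
print(f"SSE={sse:.4f} SSPE={sspe:.1f} SSLF={sslf:.3f} F({c - 2},{n - c}) = {f:.4f}")
```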
reshape long
regress y x
maxr2

maximum R-square = 0.9423
relative R-square = 0.2744
relative adjusted R-square = 0.1938
SSLF (df) = 13593.571 (4)   MSLF = 3398.3927
SSPE (df) = 1148 (5)   MSPE = 229.6
F (dfn, dfd) for lack-of-fit test (MSLF/MSPE) = 14.8014 (4,5)   prob > F = 0.0056
number of covariate patterns = 6   as ratio of observations = 0.545

Inputting the Sales Training data, Table 3.7, p. 127.
clear
input x y
0.5 42.5
0.5 50.6
1.0 68.5
1.0 80.7
1.5 89.0
1.5 99.6
2.0 105.3
2.0 111.8
2.5 112.3
2.5 125.7
end
label variable x "Training"
label variable y "Performance"
gen trainee = _n
gen sqrtx = sqrt(x)
list trainee x y sqrtx

     +----------------------------------+
     | trainee     x       y      sqrtx |
     |----------------------------------|
  1. |       1    .5    42.5   .7071068 |
  2. |       2    .5    50.6   .7071068 |
  3. |       3     1    68.5          1 |
  4. |       4     1    80.7          1 |
  5. |       5   1.5      89   1.224745 |
     |----------------------------------|
  6. |       6   1.5    99.6   1.224745 |
  7. |       7     2   105.3   1.414214 |
  8. |       8     2   111.8   1.414214 |
  9. |       9   2.5   112.3   1.581139 |
 10. |      10   2.5   125.7   1.581139 |
     +----------------------------------+

Fig. 3.14 and the fitted regression function at the bottom of p. 128.
twoway scatter y x, name(ch3_14a)
twoway scatter y sqrtx, name(ch3_14b)
regress y sqrtx
predict r, resid
twoway scatter r x, name(ch3_14c)
qnorm r, name(ch3_14d)
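Fitting y against sqrt(x) is an ordinary least squares fit on the transformed predictor. Here is a short Python sketch of the same fit for the Sales Training data, which should give a line close to yhat = -10.33 + 83.45*sqrt(x); the `ols` helper is a hypothetical name.

```python
# Regression of performance on sqrt(training days), Sales Training data.
import math


def ols(x, y):
    """Return (b0, b1) of the least squares line y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x = [0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0, 2.5, 2.5]
y = [42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7]

sqrtx = [math.sqrt(v) for v in x]   # same transformation as gen sqrtx = sqrt(x)
b0, b1 = ols(sqrtx, y)
print(f"yhat = {b0:.2f} + {b1:.2f} * sqrt(x)")
```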
Inputting the Plasma Levels data, Table 3.8, p. 130.
clear
input x y logy
0 13.44 1.1284
0 12.84 1.1086
0 11.91 1.0759
0 20.09 1.3030
0 15.60 1.1931
1.0 10.11 1.0048
1.0 11.38 1.0561
1.0 10.28 1.0120
1.0 8.96 .9523
1.0 8.59 .9340
2.0 9.83 .9926
2.0 9.00 .9542
2.0 8.65 .9370
2.0 7.85 .8949
2.0 8.88 .9484
3.0 7.94 .8998
3.0 6.01 .7789
3.0 5.14 .7110
3.0 6.90 .8388
3.0 6.77 .8306
4.0 4.86 .6866
4.0 5.10 .7076
4.0 5.67 .7536
4.0 5.75 .7597
4.0 6.23 .7945
end
label variable x "Age"
label variable y "Plasma"
label variable logy "Log(plasma)"
gen child = _n
list child x y logy

     +----------------------------+
     | child   x       y     logy |
     |----------------------------|
  1. |     1   0   13.44   1.1284 |
  2. |     2   0   12.84   1.1086 |
  3. |     3   0   11.91   1.0759 |
  4. |     4   0   20.09    1.303 |
  5. |     5   0    15.6   1.1931 |
     |----------------------------|
  6. |     6   1   10.11   1.0048 |
  7. |     7   1   11.38   1.0561 |
  8. |     8   1   10.28    1.012 |
  9. |     9   1    8.96    .9523 |
 10. |    10   1    8.59     .934 |
     |----------------------------|
 11. |    11   2    9.83    .9926 |
 12. |    12   2       9    .9542 |
 13. |    13   2    8.65     .937 |
 14. |    14   2    7.85    .8949 |
 15. |    15   2    8.88    .9484 |
     |----------------------------|
 16. |    16   3    7.94    .8998 |
 17. |    17   3    6.01    .7789 |
 18. |    18   3    5.14     .711 |
 19. |    19   3     6.9    .8388 |
 20. |    20   3    6.77    .8306 |
     |----------------------------|
 21. |    21   4    4.86    .6866 |
 22. |    22   4     5.1    .7076 |
 23. |    23   4    5.67    .7536 |
 24. |    24   4    5.75    .7597 |
 25. |    25   4    6.23    .7945 |
     +----------------------------+

Fitted regression function at the bottom of p. 129 and Fig. 3.16, p. 131.
twoway scatter y x, name(ch3_16a)
twoway scatter logy x, name(ch3_16b)
regress logy x
predict r, resid
twoway scatter r x, yline(0) name(ch3_16c)
qnorm r, name(ch3_16d)
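The logy column is the base-10 logarithm of y, rounded to four decimals, so the fit above is a regression of log10(y) on age. Here is a Python sketch that recomputes log10(y) instead of typing it in and fits the same line (roughly log10(yhat) = 1.135 - .1023x); `ols` is a hypothetical helper.

```python
# Regression of log10(plasma level) on age, Plasma Levels data.
import math


def ols(x, y):
    """Return (b0, b1) of the least squares line y = b0 + b1*x (hypothetical helper)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5   # age
y = [13.44, 12.84, 11.91, 20.09, 15.60, 10.11, 11.38, 10.28, 8.96, 8.59,
     9.83, 9.00, 8.65, 7.85, 8.88, 7.94, 6.01, 5.14, 6.90, 6.77,
     4.86, 5.10, 5.67, 5.75, 6.23]                      # plasma level

logy = [math.log10(v) for v in y]   # unrounded version of the logy column
b0, b1 = ols(x, logy)
print(f"log10(yhat) = {b0:.4f} + ({b1:.4f}) * x")
```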
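Table 3.9, p. 134. Box-Cox transformations, using the Plasma Levels data still in memory. The loop prints each lambda and the SSE of the regression of the standardized transform of y on x; the lambda with the smallest SSE is preferred.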
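The Box-Cox search regresses a standardized transform W of y on x for each lambda and records the SSE. Here is a Python sketch of the same search over lambda = -1, -.9, ..., 1 for the Plasma Levels data, using the same W as the Stata loop: K2 is the geometric mean of y, W = K2^(1-lambda)/lambda * (y^lambda - 1), and W = K2*ln(y) at lambda = 0. Function and variable names are illustrative.

```python
# Box-Cox SSE for each lambda (cf. Table 3.9), Plasma Levels data.
import math

x = [0] * 5 + [1] * 5 + [2] * 5 + [3] * 5 + [4] * 5
y = [13.44, 12.84, 11.91, 20.09, 15.60, 10.11, 11.38, 10.28, 8.96, 8.59,
     9.83, 9.00, 8.65, 7.85, 8.88, 7.94, 6.01, 5.14, 6.90, 6.77,
     4.86, 5.10, 5.67, 5.75, 6.23]


def sse_of_fit(x, w):
    """Error sum of squares of the least squares line of w on x."""
    n = len(x)
    xbar, wbar = sum(x) / n, sum(w) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - wbar) for a, b in zip(x, w)) / sxx
    b0 = wbar - b1 * xbar
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, w))


k2 = math.exp(sum(math.log(v) for v in y) / len(y))   # geometric mean of y

sse = {}
for step in range(21):
    lam = round((step - 10) / 10, 1)                   # -1.0, -0.9, ..., 1.0
    if lam == 0:
        w = [k2 * math.log(v) for v in y]
    else:
        k1 = k2 ** (1 - lam) / lam
        w = [k1 * (v ** lam - 1) for v in y]
    sse[lam] = sse_of_fit(x, w)
    print(f"{lam:5.1f} {sse[lam]:8.2f}")

best = min(sse, key=sse.get)
print("lambda with smallest SSE:", best)
```

With these data the smallest SSE should occur at lambda = -.5, matching the Stata output for Table 3.9, which points to the transformation 1/sqrt(y).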
means y
scalar k2 = r(mean_g)
capture drop myw
gen myw = .
foreach n of numlist 0/20 {
    local lambda = (`n'-10)/10
    scalar k1 = k2^(1-`lambda')/`lambda'
    if (`lambda' == 0) {
        quietly replace myw = k2*ln(y)
    }
    else {
        quietly replace myw = k1*(y^`lambda' - 1)
    }
    quietly reg myw x
    display in yellow "`lambda'" _col(10) %4.2f `e(rss)'
}

-1       33.91
-.9      32.70
-.8      31.76
-.7      31.09
-.6      30.69
-.5      30.56
-.4      30.72
-.3      31.18
-.2      31.95
-.1      33.06
0        34.52
.1       36.37
.2       38.64
.3       41.36
.4       44.59
.5       48.37
.6       52.76
.7       57.84
.8       63.67
.9       70.35
1        77.98

Fig. 3.18, p. 138.
Note: This figure uses the Toluca Company data, so re-enter those data before running the commands below.
twoway (lowess y x, bw(.7)) (scatter y x), xscale(r(0 150)) legend(off) name(ch3_18a)
twoway (lfitci y x, nofit level(90) ciplot(rline)) (lowess y x, bw(.7)), xscale(r(0 150)) legend(off) name(ch3_18b)
Inputting the Plutonium Measurement data, Table 3.10, p. 139.
clear
input y x
0.150 20
0.004 0
0.069 10
0.030 5
0.011 0
0.004 0
0.041 5
0.109 20
0.068 10
0.009 0
0.009 0
0.048 10
0.006 0
0.083 20
0.037 5
0.039 5
0.132 20
0.004 0
0.006 0
0.059 10
0.051 10
0.002 0
0.049 5
0.106 0
end
label variable x "Plutonium Activity, pCi/g"
label variable y "Alpha Count, #/sec."
gen case = _n
list case x y

     +------------------+
     | case    x      y |
     |------------------|
  1. |    1   20    .15 |
  2. |    2    0   .004 |
  3. |    3   10   .069 |
  4. |    4    5    .03 |
  5. |    5    0   .011 |
     |------------------|
  6. |    6    0   .004 |
  7. |    7    5   .041 |
  8. |    8   20   .109 |
  9. |    9   10   .068 |
 10. |   10    0   .009 |
     |------------------|
 11. |   11    0   .009 |
 12. |   12   10   .048 |
 13. |   13    0   .006 |
 14. |   14   20   .083 |
 15. |   15    5   .037 |
     |------------------|
 16. |   16    5   .039 |
 17. |   17   20   .132 |
 18. |   18    0   .004 |
 19. |   19    0   .006 |
 20. |   20   10   .059 |
     |------------------|
 21. |   21   10   .051 |
 22. |   22    0   .002 |
 23. |   23    5   .049 |
 24. |   24    0   .106 |
     +------------------+

Fig. 3.19a and b, p. 139.
twoway scatter y x, name(ch3_19a)
twoway (lowess y x, bw(.6)) (scatter y x), legend(off) name(ch3_19b)
Fig. 3.20, p. 140.
drop if case == 24
regress y x
predict r, resid
predict yhat
twoway scatter r yhat, yline(0)
qnorm r

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------           F(  1,    21) =  229.00
       Model |  .036190422     1  .036190422           Prob > F      =  0.0000
    Residual |  .003318796    21  .000158038           R-squared     =  0.9160
-------------+------------------------------           Adj R-squared =  0.9120
       Total |  .039509218    22  .001795874           Root MSE      =  .01257

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .005537   .0003659    15.13   0.000     .0047761    .0062979
       _cons |   .0070331   .0035988     1.95   0.064     -.000451    .0145173
------------------------------------------------------------------------------
Transforming the y variable.
gen sqrty = sqrt(y)
Fig. 3.21, p. 141. Repeating the whole analysis from Fig. 3.20 with the transformed response variable.
regress sqrty x
predict r2, resid
predict yhat2
twoway scatter r2 yhat2, yline(0)
qnorm r2

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------           F(  1,    21) =  188.80
       Model |  .210846556     1  .210846556           Prob > F      =  0.0000
    Residual |  .023452708    21  .001116796           R-squared     =  0.8999
-------------+------------------------------           Adj R-squared =  0.8951
       Total |  .234299264    22  .010649967           Root MSE      =  .03342

------------------------------------------------------------------------------
       sqrty |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0133648   .0009727    13.74   0.000      .011342    .0153876
       _cons |   .0947596   .0095668     9.91   0.000     .0748643    .1146549
------------------------------------------------------------------------------
Transforming X.
gen sqrtx = sqrt(x)
Fig. 3.22a, b and c, p. 142. Repeating the whole analysis from Fig. 3.20 with the transformed response variable and the transformed predictor.
regress sqrty sqrtx
predict r3, resid
predict yhat3
twoway scatter r3 yhat3, yline(0)
qnorm r3
graph twoway (lfitci sqrty sqrtx, nofit level(90) ciplot(rline)) (lowess sqrty sqrtx, bw(.7)) (scatter sqrty sqrtx), legend(off) name(ch3_22c)

      Source |       SS       df       MS              Number of obs =      23
-------------+------------------------------           F(  1,    21) =  360.92
       Model |  .221416125     1  .221416125           Prob > F      =  0.0000
    Residual |  .012883139    21  .000613483           R-squared     =  0.9450
-------------+------------------------------           Adj R-squared =  0.9424
       Total |  .234299264    22  .010649967           Root MSE      =  .02477

------------------------------------------------------------------------------
       sqrty |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       sqrtx |   .0573055   .0030164    19.00   0.000     .0510325    .0635785
       _cons |   .0730056   .0078306     9.32   0.000      .056721    .0892902
------------------------------------------------------------------------------
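In the regressions for Figs. 3.21 and 3.22, sqrt(y) on sqrt(x) fits better than sqrt(y) on x (R-squared .9450 versus .8999). Here is a Python sketch that repeats both transformed regressions on the 23 cases left after dropping case 24 and compares their R-squared values. The helper `fit` is a hypothetical name, and R-squared is computed as Sxy^2/(Sxx*Syy), which holds for simple linear regression.

```python
# Compare sqrt(y) on x with sqrt(y) on sqrt(x), Plutonium data without case 24.
import math


def fit(x, y):
    """Return (b0, b1, r_squared) for the least squares line of y on x (hypothetical helper)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    syy = sum((b - ybar) ** 2 for b in y)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1, sxy ** 2 / (sxx * syy)


# (y, x) pairs in case order, as entered for Table 3.10
data = [(0.150, 20), (0.004, 0), (0.069, 10), (0.030, 5), (0.011, 0), (0.004, 0),
        (0.041, 5), (0.109, 20), (0.068, 10), (0.009, 0), (0.009, 0), (0.048, 10),
        (0.006, 0), (0.083, 20), (0.037, 5), (0.039, 5), (0.132, 20), (0.004, 0),
        (0.006, 0), (0.059, 10), (0.051, 10), (0.002, 0), (0.049, 5), (0.106, 0)]
kept = data[:23]   # drop case 24, as in: drop if case == 24

sqrty = [math.sqrt(yv) for yv, _ in kept]
xs = [xv for _, xv in kept]
sqrtx = [math.sqrt(xv) for xv in xs]

b0y, b1y, r2_y = fit(xs, sqrty)   # sqrt(y) on x       (Fig. 3.21)
b0, b1, r2 = fit(sqrtx, sqrty)    # sqrt(y) on sqrt(x) (Fig. 3.22)

print(f"sqrt(y) on x:       R-squared = {r2_y:.4f}")
print(f"sqrt(y) on sqrt(x): R-squared = {r2:.4f}, b0 = {b0:.7f}, b1 = {b1:.7f}")
```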
These pages are Copyrighted (c) by UCLA Academic Technology Services
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California