Stata Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 17: Analysis of Factor Level Effects
Inputting the Kenton Food company data, table 16.1, p. 677.
input sales design store
11 1 1
17 1 2
16 1 3
14 1 4
15 1 5
12 2 1
10 2 2
15 2 3
19 2 4
11 2 5
23 3 1
20 3 2
18 3 3
17 3 4
27 4 1
33 4 2
22 4 3
26 4 4
28 4 5
end
Fitting a one-way ANOVA model to the Kenton Food data, table 17.1, p. 711.
anova sales design
Number of obs = 19 R-squared = 0.7881
Root MSE = 3.24756 Adj R-squared = 0.7457
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 588.221053 3 196.073684 18.59 0.0000
|
design | 588.221053 3 196.073684 18.59 0.0000
|
Residual | 158.2 15 10.5466667
-----------+----------------------------------------------------
Total | 746.421053 18 41.4678363
Inputting the Rust Inhibitor data, table 17.2a, p. 712.
clear
input performance brand experiment
43.9 1 1
39.0 1 2
46.7 1 3
43.8 1 4
44.2 1 5
47.7 1 6
43.6 1 7
38.9 1 8
43.6 1 9
40.0 1 10
89.8 2 1
87.1 2 2
92.7 2 3
90.6 2 4
87.7 2 5
92.4 2 6
86.1 2 7
88.1 2 8
90.8 2 9
89.1 2 10
68.4 3 1
69.3 3 2
68.5 3 3
66.4 3 4
70.0 3 5
68.1 3 6
70.6 3 7
65.2 3 8
63.8 3 9
69.2 3 10
36.2 4 1
45.2 4 2
40.7 4 3
40.5 4 4
39.3 4 5
40.3 4 6
43.2 4 7
38.7 4 8
40.9 4 9
39.7 4 10
end
ANOVA of the Rust data and calculating the factor means and the grand mean,
table 17.2b, p. 712.
anova performance brand
Number of obs = 40 R-squared = 0.9863
Root MSE = 2.47787 Adj R-squared = 0.9852
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 15953.4654 3 5317.82178 866.12 0.0000
|
brand | 15953.4654 3 5317.82178 866.12 0.0000
|
Residual | 221.034045 36 6.13983459
-----------+----------------------------------------------------
Total | 16174.4994 39 414.730754
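The ANOVA table does not list the factor level means or the grand mean. One way to obtain them, sketched here on the assumption that the Rust Inhibitor data entered above are still in memory, is tabulate with the summarize() option. The Total row of its output summarizes all 40 observations, so its mean is the grand mean.

```stata
* Mean, standard deviation, and count of performance for each brand;
* the Total row gives the grand mean over all observations.
tabulate brand, summarize(performance)

* The grand mean alone, if only that is needed.
summarize performance
display r(mean)
```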
Plotting the normal probability plot of the estimated factor level means
for the Rust Inhibitor data, fig. 17.3b, p. 715. The four brand means are
entered by hand, rounded to one decimal place.
Note: this graph does not have the same scale as the graph in the book.
clear
input treatment ybar
1 43.1
2 89.4
3 68.0
4 40.5
end
qnorm ybar
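As an alternative to typing the means by hand, they can be computed from the raw data. This is only a sketch: it assumes the Rust Inhibitor data (performance and brand) are in memory, so it would have to be run before the clear command above. The plot then uses the unrounded means.

```stata
* Assumes the Rust Inhibitor data (performance, brand) are in memory.
preserve
* Replace the data with one row per brand holding the brand mean.
collapse (mean) ybar = performance, by(brand)
qnorm ybar
* Bring back the original 40 observations.
restore
```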

Obtaining the 95% confidence interval for the estimated mean sales by
level of design using the food data, p. 718.
Note: the sums of squares in the ANOVA table are calculated without a
constant term, so they do not match the previous output.
clear
input sales design store
11 1 1
17 1 2
16 1 3
14 1 4
15 1 5
12 2 1
10 2 2
15 2 3
19 2 4
11 2 5
23 3 1
20 3 2
18 3 3
17 3 4
27 4 1
33 4 2
22 4 3
26 4 4
28 4 5
end
anova sales design, noconstant
regress
Source | SS df MS Number of obs = 19
-------------+------------------------------ F( 4, 15) = 170.29
Model | 7183.8 4 1795.95 Prob > F = 0.0000
Residual | 158.2 15 10.5466667 R-squared = 0.9785
-------------+------------------------------ Adj R-squared = 0.9727
Total | 7342 19 386.421053 Root MSE = 3.2476
------------------------------------------------------------------------------
sales Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
design
1 14.6 1.452354 10.05 0.000 11.50438 17.69562
2 13.4 1.452354 9.23 0.000 10.30438 16.49562
3 19.5 1.623782 12.01 0.000 16.03899 22.96101
4 27.2 1.452354 18.73 0.000 24.10438 30.29562
------------------------------------------------------------------------------
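In Stata 11 or later, factor-variable notation offers another route to the same cell-means fit. The sketch below assumes such a version and the Kenton Food data in memory; its coefficients should match the level means and intervals in the table above.

```stata
* Cell-means model: ibn. keeps all four design levels (no base level)
* and noconstant drops the intercept, so each coefficient is a level mean.
regress sales ibn.design, noconstant

* The estimate and interval for a single level can be requested by name.
lincom 3.design
```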
Testing the difference in mean sales for design levels 3 and 4 using the
food data, p. 719.
anova sales design
lincom _coef[design[3]]-_coef[design[4]]
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -7.7 2.178532 -3.53 0.003 -12.34343 -3.05657
------------------------------------------------------------------------------
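The standard error and interval reported by lincom can be checked by hand from s{D} = sqrt(MSE*(1/n3 + 1/n4)). The sketch below takes MSE = 10.5466667 and 15 error degrees of freedom from the one-way ANOVA table, and the level means 19.5 and 27.2 with 4 and 5 stores from the output above. The scalar names are arbitrary.

```stata
* Inputs copied from the output shown earlier on this page.
scalar mse34  = 10.5466667
scalar d34    = 19.5 - 27.2
scalar se34   = sqrt(mse34*(1/4 + 1/5))
* Two-sided 95% critical value of t with 15 degrees of freedom.
scalar tcrit  = invttail(15, 0.025)

display "difference = " d34
display "std. err.  = " se34
display "95% CI     = " (d34 - tcrit*se34) " to " (d34 + tcrit*se34)
```

The displayed values should agree with the lincom line above up to rounding.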
Testing the difference between three-color and five-color designs,
p. 720-723.
lincom ((_coef[design[1]]+ _coef[design[2]])/2)+((-_coef[design[3]]-_coef[design[4]])/2)
( 1) .5 design[1] + .5 design[2] - .5 design[3] - .5 design[4] = 0
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -9.35 1.497053 -6.25 0.000 -12.54089 -6.159108
------------------------------------------------------------------------------
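Inferences for a linear combination of factor level means, p. 723.
Note: this lincom follows anova sales design, which includes a constant.
In that parameterization (as used by the Stata version that produced this
output) the design coefficients are differences from the omitted level,
design 4. The estimate below therefore corresponds to weights of .35, .28,
.12, and -.75, which add to zero, rather than to the weights .35, .28, .12,
and .25 typed in the command.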
lincom (.35)*_b[design[1]]+ (.28)*_b[design[2]]+ (.12)*_b[design[3]]+ (.25)*_b[design[4]]
( 1) .35 design[1] + .28 design[2] + .12 design[3] + .25 design[4] = 0
------------------------------------------------------------------------------
sales | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | -9.198 1.283835 -7.16 0.000 -11.93443 -6.46157
------------------------------------------------------------------------------
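To estimate the weighted mean itself rather than a quantity shifted by the design 4 mean, the combination can be formed from a cell-means fit, where each coefficient is a level mean. This is a sketch that assumes Stata 11 or later (factor-variable notation) and the Kenton Food data in memory.

```stata
* Fit the cell-means model so that each coefficient is a design mean.
regress sales ibn.design, noconstant

* Weighted mean with weights .35, .28, .12, and .25 (they sum to 1).
lincom .35*1.design + .28*2.design + .12*3.design + .25*4.design
```

With the level means 14.6, 13.4, 19.5, and 27.2 shown earlier, the point estimate works out to 18.002, which differs from the -9.198 above by exactly the design 4 mean of 27.2.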
Tukey multiple comparisons procedure for the Rust data, p. 728-729.
This example uses prcomp, a user-written program. You can download it
from within Stata by typing findit prcomp (see "How can I use the findit
command to search for programs and get additional help?" for more
information about using findit).
Note: The Rust Inhibitor data must be in memory again, so re-enter them as
shown above if they have been cleared. The variable name performance has
too many characters for the prcomp command, so we rename the variable
before issuing the command.
rename performance perform
prcomp perform brand, tukey order(M)
Pairwise Comparisons of Means
Response variable (Y): perform
Group variable (X): brand
Group variable (X): brand Response variable (Y): perform
------------------------------- -------------------------------
Level n Mean S.E.
------------------------------------------------------------------
4 10 40.47 .7704328
1 10 43.14 .9487067
3 10 67.95 .6857681
2 10 89.44 .701459
------------------------------------------------------------------
Simultaneous confidence level: 95% (Tukey wsd method)
Homogeneous error SD = 2.477869, degrees of freedom = 36
95%
Level(X) Mean(Y) Level(X) Mean(Y) Diff Mean Confidence Limits
-------------------------------------------------------------------------------
1 43.14 4 40.47 2.67 -.3145751 5.654575
3 67.95 4 40.47 27.48 24.49542 30.46457
1 43.14 24.81 21.82542 27.79457
2 89.44 4 40.47 48.97 45.98542 51.95457
1 43.14 46.3 43.31542 49.28457
3 67.95 21.49 18.50542 24.47457
-------------------------------------------------------------------------------
Tukey Multiple comparisons for the Kenton Food data, p. 730-731.
This example again uses prcomp, the user-written program described in the
previous example.
clear
input sales design store
11 1 1
17 1 2
16 1 3
14 1 4
15 1 5
12 2 1
10 2 2
15 2 3
19 2 4
11 2 5
23 3 1
20 3 2
18 3 3
17 3 4
27 4 1
33 4 2
22 4 3
26 4 4
28 4 5
end
anova sales design
prcomp sales design, tukey order(M) level(.9)
Pairwise Comparisons of Means
Response variable (Y): sales
Group variable (X): design
Group variable (X): design Response variable (Y): sales
------------------------------- -------------------------------
Level n Mean S.E.
------------------------------------------------------------------
2 5 13.4 1.630951
1 5 14.6 1.029563
3 4 19.5 1.322876
4 5 27.2 1.772005
------------------------------------------------------------------
Simultaneous confidence level: 90% (Tukey wsd method)
Homogeneous error SD = 3.247563, degrees of freedom = 15
90%
Level(X) Mean(Y) Level(X) Mean(Y) Diff Mean Confidence Limits
-------------------------------------------------------------------------------
1 14.6 2 13.4 1.2 -3.941223 6.341223
3 19.5 2 13.4 6.1 .6469095 11.55309
1 14.6 4.9 -.5530905 10.35309
4 27.2 2 13.4 13.8 8.658777 18.94122
1 14.6 12.6 7.458777 17.74122
3 19.5 7.7 2.24691 13.15309
-------------------------------------------------------------------------------
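The Tukey limits can be checked by hand from D +/- T*s{D}, where T = q(.90; 4, 15)/sqrt(2) and s{D} = sqrt(MSE*(1/ni + 1/nj)). The sketch below assumes a recent version of Stata that provides the invtukeyprob() function and uses MSE = 10.5466667 from the ANOVA table. The scalar names are arbitrary.

```stata
* 90th percentile of the studentized range for 4 levels and 15 error df.
scalar q90   = invtukeyprob(4, 15, 0.90)
scalar Tmult = q90/sqrt(2)
scalar mse   = 10.5466667

* Designs 1 and 2 have 5 stores each; their means differ by 1.2.
scalar hw12 = Tmult*sqrt(mse*(1/5 + 1/5))

display "q(.90; 4, 15) = " q90
display "90% limits, design 1 - design 2: " (1.2 - hw12) " to " (1.2 + hw12)
```

The limits should closely reproduce the first row of the prcomp table. Comparisons that involve design 3 use 1/4 in place of one of the 1/5 terms because that design has only four stores.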
The Scheffe comparisons procedure for the Kenton Food data, p. 734-735.
Stata reports the multiple comparison tests (differences between means
with Scheffe-adjusted p-values) rather than confidence intervals; the
tests lead to the same conclusions as the intervals.
oneway sales design, scheffe
Analysis of Variance
Source SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 588.221053 3 196.073684 18.59 0.0000
Within groups 158.2 15 10.5466667
------------------------------------------------------------------------
Total 746.421053 18 41.4678363
Bartlett's test for equal variances: chi2(3) = 1.3144 Prob>chi2 = 0.726
Comparison of sales by design
(Scheffe)
Row Mean-|
Col Mean | 1 2 3
---------+---------------------------------
2 | -1.2
| 0.951
|
3 | 4.9 6.1
| 0.213 0.089
|
4 | 12.6 13.8 7.7
| 0.000 0.000 0.025
Bonferroni comparisons procedure for the Kenton Food data, p. 736-737.
Stata reports the multiple comparison tests (differences between means
with Bonferroni-adjusted p-values) rather than confidence intervals; the
tests lead to the same conclusions as the intervals.
oneway sales design, bonferroni
Analysis of Variance
Source SS df MS F Prob > F
------------------------------------------------------------------------
Between groups 588.221053 3 196.073684 18.59 0.0000
Within groups 158.2 15 10.5466667
------------------------------------------------------------------------
Total 746.421053 18 41.4678363
Bartlett's test for equal variances: chi2(3) = 1.3144 Prob>chi2 = 0.726
Comparison of sales by design
(Bonferroni)
Row Mean-|
Col Mean | 1 2 3
---------+---------------------------------
2 | -1.2
| 1.000
|
3 | 4.9 6.1
| 0.240 0.081
|
4 | 12.6 13.8 7.7
| 0.000 0.000 0.018
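If the adjusted confidence intervals themselves are wanted, the built-in pwmean command is one option. This is a sketch assuming Stata 12 or later and the Kenton Food data in memory. No level() option is given here, so the default 95% level applies; set level() to match the family confidence coefficient used in the book.

```stata
* Pairwise differences with Scheffe-adjusted intervals and p-values.
pwmean sales, over(design) mcompare(scheffe) effects

* The same comparisons with the Bonferroni adjustment.
pwmean sales, over(design) mcompare(bonferroni) effects
```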
Inputting the Piecework Trainee data, table 17.6, p. 743. This
example uses tukeyhsd, a user-written program. You can download it
from within Stata by typing findit tukeyhsd (see "How can I use the
findit command to search for programs and get additional help?" for
more information about using findit).
clear
input units treat employee
40 1 1
39 1 2
39 1 3
36 1 4
42 1 5
43 1 6
41 1 7
53 2 1
48 2 2
49 2 3
50 2 4
51 2 5
50 2 6
48 2 7
53 3 1
58 3 2
56 3 3
59 3 4
53 3 5
59 3 6
58 3 7
63 4 1
62 4 2
59 4 3
61 4 4
62 4 5
62 4 6
61 4 7
end
label define trt 1 "6 hours" 2 "8 hours" 3 "10 hours" 4 "12 hours"
label values treat trt
anova units treat
tukeyhsd treat
Number of obs = 28 R-squared = 0.9465
Root MSE = 2.06444 Adj R-squared = 0.9398
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 1808.67857 3 602.892857 141.46 0.0000
|
treat | 1808.67857 3 602.892857 141.46 0.0000
|
Residual | 102.285714 24 4.26190476
-----------+----------------------------------------------------
Total | 1910.96429 27 70.776455
Tukey HSD pairwise comparisons for variable treat
studentized range critical value(.05, 4, 24) = 3.9013476
uses harmonic mean sample size = 7.000
mean
grp vs grp group means dif HSD-test
-------------------------------------------------------
1 vs 2 40.0000 49.8571 9.8571 12.6328*
1 vs 3 40.0000 56.5714 16.5714 21.2377*
1 vs 4 40.0000 61.4286 21.4286 27.4625*
2 vs 3 49.8571 56.5714 6.7143 8.6049*
2 vs 4 49.8571 61.4286 11.5714 14.8298*
3 vs 4 56.5714 61.4286 4.8571 6.2248*
Figure 17.6, p. 745 using Piecework Trainee data from previous example.
twoway (scatter units treat, xlabel(1 "6" 2 "8" 3 "10" 4 "12") xscale(r(0.9 4.1)) xtitle("Hours of Training")) (qfit units treat, legend(off))

Table 17.7, p. 745.
* treat levels 1, 2, 3, 4 correspond to 6, 8, 10, 12 hours of training
gen hours = 4 + 2*treat
egen mean = mean(hours)
gen xi = hours-mean
gen xi2 = xi^2
list treat employee units xi xi2, nolabel
+-------------------------------------+
| treat employee units xi xi2 |
|-------------------------------------|
1. | 1 1 40 -3 9 |
2. | 1 2 39 -3 9 |
3. | 1 3 39 -3 9 |
4. | 1 4 36 -3 9 |
5. | 1 5 42 -3 9 |
|-------------------------------------|
6. | 1 6 43 -3 9 |
7. | 1 7 41 -3 9 |
8. | 2 1 53 -1 1 |
9. | 2 2 48 -1 1 |
10. | 2 3 49 -1 1 |
|-------------------------------------|
11. | 2 4 50 -1 1 |
12. | 2 5 51 -1 1 |
13. | 2 6 50 -1 1 |
14. | 2 7 48 -1 1 |
15. | 3 1 53 1 1 |
|-------------------------------------|
16. | 3 2 58 1 1 |
17. | 3 3 56 1 1 |
18. | 3 4 59 1 1 |
19. | 3 5 53 1 1 |
20. | 3 6 59 1 1 |
|-------------------------------------|
21. | 3 7 58 1 1 |
22. | 4 1 63 3 9 |
23. | 4 2 62 3 9 |
24. | 4 3 59 3 9 |
25. | 4 4 61 3 9 |
|-------------------------------------|
26. | 4 5 62 3 9 |
27. | 4 6 62 3 9 |
28. | 4 7 61 3 9 |
+-------------------------------------+
Table 17.8a, p. 745-746.
regress units xi xi2
Source | SS df MS Number of obs = 28
-------------+------------------------------ F( 2, 25) = 219.72
Model | 1808.1 2 904.05 Prob > F = 0.0000
Residual | 102.864286 25 4.11457143 R-squared = 0.9462
-------------+------------------------------ Adj R-squared = 0.9419
Total | 1910.96429 27 70.776455 Root MSE = 2.0284
------------------------------------------------------------------------------
units | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xi | 3.55 .1714345 20.71 0.000 3.196924 3.903076
xi2 | -.3125 .0958348 -3.26 0.003 -.5098755 -.1151245
_cons | 53.52679 .6136422 87.23 0.000 52.26297 54.79061
------------------------------------------------------------------------------
Table 17.8b, p. 745-746.
anova units treat
Number of obs = 28 R-squared = 0.9465
Root MSE = 2.06444 Adj R-squared = 0.9398
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 1808.67857 3 602.892857 141.46 0.0000
|
treat | 1808.67857 3 602.892857 141.46 0.0000
|
Residual | 102.285714 24 4.26190476
-----------+----------------------------------------------------
Total | 1910.96429 27 70.776455
Table 17.8c, p. 745-746. To obtain the sums of squares for the lack of
fit test, run the one-way ANOVA and save the residuals. Next, run an
ANOVA that adds the saved residuals, r, as a factor. The sum of squares
for r is the sum of squares due to pure error; it equals the Residual sum
of squares of the one-way ANOVA in Table 17.8b (102.285714). Subtract the
pure error from the error sum of squares of the quadratic regression in
Table 17.8a (102.864286) to get the sum of squares due to lack of fit.
anova units treat
predict r, resid
(output omitted)
anova units treat r
Number of obs = 28 R-squared = 1.0000
Root MSE = 7.5e-07 Adj R-squared = 1.0000
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 1910.96429 18 106.164683
|
treat | 243.2 3 81.0666667
r | 102.285714 15 6.81904762
|
Residual | 5.0022e-12 9 5.5580e-13
-----------+----------------------------------------------------
Total | 1910.96429 27 70.776455
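The subtraction and the resulting F test can also be done with stored results instead of by hand. The sketch below assumes the Piecework Trainee data with xi and xi2 from Table 17.7 are in memory. The scalar names are arbitrary.

```stata
* Reduced model: quadratic regression in centered hours (Table 17.8a).
regress units xi xi2
scalar sse_reg = e(rss)
scalar df_reg  = e(df_r)

* Full model: one-way ANOVA on treat (Table 17.8b); its residual
* sum of squares is the pure error.
anova units treat
scalar sspe  = e(rss)
scalar df_pe = e(df_r)

* Lack-of-fit sum of squares, F statistic, and p-value.
scalar sslf  = sse_reg - sspe
scalar df_lf = df_reg - df_pe
scalar F_lf  = (sslf/df_lf)/(sspe/df_pe)
display "SSLF = " sslf "   df = " df_lf
display "F = " F_lf "   p = " Ftail(df_lf, df_pe, F_lf)
```

From the tables above, SSLF = 102.864286 - 102.285714 = 0.578572 on 1 degree of freedom, so F is about 0.14.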
These pages are Copyrighted (c) by UCLA Academic Technology Services