UCLA Academic Technology Services

Stata Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 17: Analysis of Factor Level Effects

Inputting the Kenton Food Company data, table 16.1, p. 677.
input sales design store
  11  1  1
  17  1  2
  16  1  3
  14  1  4
  15  1  5
  12  2  1
  10  2  2
  15  2  3
  19  2  4
  11  2  5
  23  3  1
  20  3  2
  18  3  3
  17  3  4
  27  4  1
  33  4  2
  22  4  3
  26  4  4
  28  4  5
end
Fitting a one-way ANOVA model to the Kenton Food data, table 17.1, p. 711.
anova sales design
                           Number of obs =      19     R-squared     =  0.7881
                           Root MSE      = 3.24756     Adj R-squared =  0.7457

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  588.221053     3  196.073684      18.59     0.0000
                         |
                  design |  588.221053     3  196.073684      18.59     0.0000
                         |
                Residual |       158.2    15  10.5466667   
              -----------+----------------------------------------------------
                   Total |  746.421053    18  41.4678363
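The ANOVA table above can be cross-checked by hand. The following is a minimal sketch in Python (not part of the original Stata session) that recomputes the sums of squares, the F statistic and R-squared from the data entered above; the variable names are illustrative only.

```python
# Kenton Food data: sales by package design, copied from the input block above
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}

n_total = sum(len(y) for y in sales.values())
grand_mean = sum(sum(y) for y in sales.values()) / n_total
means = {d: sum(y) / len(y) for d, y in sales.items()}

# Between-design (Model) and within-design (Residual) sums of squares
ss_model = sum(len(y) * (means[d] - grand_mean) ** 2 for d, y in sales.items())
ss_resid = sum((v - means[d]) ** 2 for d, y in sales.items() for v in y)

df_model = len(sales) - 1        # number of designs minus 1
df_resid = n_total - len(sales)  # observations minus number of designs

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid
r_squared = ss_model / (ss_model + ss_resid)

print(f"Model SS    = {ss_model:.6f} (df {df_model})")
print(f"Residual SS = {ss_resid:.6f} (df {df_resid})")
print(f"F = {f_stat:.2f}, R-squared = {r_squared:.4f}")
```

The printed values should agree with the Model, Residual, F and R-squared entries of the Stata output above.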

Inputting the Rust Inhibitor data, table 17.2a, p. 712.
clear
input performance brand experiment
  43.9  1   1
  39.0  1   2
  46.7  1   3
  43.8  1   4
  44.2  1   5
  47.7  1   6
  43.6  1   7
  38.9  1   8
  43.6  1   9
  40.0  1  10
  89.8  2   1
  87.1  2   2
  92.7  2   3
  90.6  2   4
  87.7  2   5
  92.4  2   6
  86.1  2   7
  88.1  2   8
  90.8  2   9
  89.1  2  10
  68.4  3   1
  69.3  3   2
  68.5  3   3
  66.4  3   4
  70.0  3   5
  68.1  3   6
  70.6  3   7
  65.2  3   8
  63.8  3   9
  69.2  3  10
  36.2  4   1
  45.2  4   2
  40.7  4   3
  40.5  4   4
  39.3  4   5
  40.3  4   6
  43.2  4   7
  38.7  4   8
  40.9  4   9
  39.7  4  10
end
ANOVA of the Rust data and calculating the factor means and the grand mean, table 17.2b, p. 712.
anova performance brand
                           Number of obs =      40     R-squared     =  0.9863
                           Root MSE      = 2.47787     Adj R-squared =  0.9852

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  15953.4654     3  5317.82178     866.12     0.0000
                         |
                   brand |  15953.4654     3  5317.82178     866.12     0.0000
                         |
                Residual |  221.034045    36  6.13983459   
              -----------+----------------------------------------------------
                   Total |  16174.4994    39  414.730754

Normal probability plot of the estimated factor level means for the Rust Inhibitor data, fig. 17.3b, p. 715.  The four brand means are entered by hand below, rounded to one decimal place.
Note: This graph does not have the same scale as the graph in the book.

clear
input treatment ybar
1 43.1
2 89.4
3 68.0
4 40.5
end
qnorm ybar


Obtaining the 95% confidence interval for the estimated mean sales by level of design using the food data, p. 718.
Note: The sums of squares in the ANOVA table are calculated without a constant term, hence they do not match the previous output.

clear
input sales design store
  11  1  1
  17  1  2
  16  1  3
  14  1  4
  15  1  5
  12  2  1
  10  2  2
  15  2  3
  19  2  4
  11  2  5
  23  3  1
  20  3  2
  18  3  3
  17  3  4
  27  4  1
  33  4  2
  22  4  3
  26  4  4
  28  4  5
end
anova sales design, noconstant
regress
      Source |       SS       df       MS              Number of obs =      19
-------------+------------------------------           F(  4,    15) =  170.29
       Model |      7183.8     4     1795.95           Prob > F      =  0.0000
    Residual |       158.2    15  10.5466667           R-squared     =  0.9785
-------------+------------------------------           Adj R-squared =  0.9727
       Total |        7342    19  386.421053           Root MSE      =  3.2476

------------------------------------------------------------------------------
       sales        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
design
           1         14.6   1.452354    10.05   0.000     11.50438    17.69562
           2         13.4   1.452354     9.23   0.000     10.30438    16.49562
           3         19.5   1.623782    12.01   0.000     16.03899    22.96101
           4         27.2   1.452354    18.73   0.000     24.10438    30.29562
------------------------------------------------------------------------------
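Each interval in the table above is the design mean plus or minus a t multiple of its standard error, sqrt(MSE/n). A minimal Python sketch of that arithmetic follows; the critical value t(.975; 15) = 2.13145 is an assumption taken from a standard t table, not something printed in the Stata output.

```python
import math

mse = 158.2 / 15    # Residual MS from the ANOVA table (10.5466667)
t_crit = 2.13145    # assumed table value of t(.975; 15)

# design: (mean sales, number of stores), as reported in the output above
cells = {1: (14.6, 5), 2: (13.4, 5), 3: (19.5, 4), 4: (27.2, 5)}

intervals = {}
for design, (mean, n) in cells.items():
    se = math.sqrt(mse / n)  # standard error of the estimated design mean
    intervals[design] = (se, mean - t_crit * se, mean + t_crit * se)

for design, (se, low, high) in intervals.items():
    print(f"design {design}: se = {se:.6f}, 95% CI = ({low:.5f}, {high:.5f})")
```

Design 3 has the widest interval because it was used in only four stores.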

Testing the difference in mean sales for design levels 3 and 4 using food data, p. 719.

anova sales design
lincom _coef[design[3]]-_coef[design[4]]
------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |       -7.7   2.178532    -3.53   0.003    -12.34343    -3.05657
------------------------------------------------------------------------------

Testing the difference between three color and five color designs, p. 720-723.

lincom ((_coef[design[1]]+ _coef[design[2]])/2)+((-_coef[design[3]]-_coef[design[4]])/2)
 ( 1)  .5 design[1] + .5 design[2] - .5 design[3] - .5 design[4] = 0

------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |      -9.35   1.497053    -6.25   0.000    -12.54089   -6.159108
------------------------------------------------------------------------------
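Both lincom results above are contrasts of the design means, with standard error sqrt(MSE * sum(c_i^2 / n_i)). The Python sketch below (an illustration, not part of the original page) recomputes the estimate, standard error and t statistic for the two contrasts; the helper name contrast is hypothetical.

```python
import math

mse = 158.2 / 15                  # Residual MS from the ANOVA table
means = [14.6, 13.4, 19.5, 27.2]  # design means 1..4
sizes = [5, 5, 4, 5]              # stores per design

def contrast(coefs):
    """Return (estimate, standard error, t) for sum(c_i * mean_i)."""
    est = sum(c * m for c, m in zip(coefs, means))
    se = math.sqrt(mse * sum(c * c / n for c, n in zip(coefs, sizes)))
    return est, se, est / se

# Design 3 minus design 4
pair = contrast([0, 0, 1, -1])
# Average of designs 1 and 2 minus average of designs 3 and 4
colors = contrast([0.5, 0.5, -0.5, -0.5])

print("3 vs 4:     est = %.2f, se = %.6f, t = %.2f" % pair)
print("1,2 vs 3,4: est = %.2f, se = %.6f, t = %.2f" % colors)
```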

Inferences for linear combination of factor level means, p. 723.
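Note: Because the model includes a constant, the design coefficients used by lincom are differences from the level 4 mean, and the coefficient for level 4 itself is 0.  The result below therefore equals the contrast of the factor level means with coefficients .35, .28, .12 and -.75, which add to zero; the last coefficient is effectively -.75 instead of .25.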

lincom (.35)*_b[design[1]]+ (.28)*_b[design[2]]+ (.12)*_b[design[3]]+ (.25)*_b[design[4]]
 ( 1)  .35 design[1] + .28 design[2] + .12 design[3] + .25 design[4] = 0

------------------------------------------------------------------------------
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |     -9.198   1.283835    -7.16   0.000    -11.93443    -6.46157
------------------------------------------------------------------------------
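The value -9.198 can be reproduced from the design means. The sketch below assumes, consistent with the output above, that each design coefficient is that design's mean minus the design 4 mean; it also shows the weighted combination of the means themselves for comparison. This is an illustrative check in Python, not part of the original Stata session.

```python
import math

mse = 158.2 / 15
means = [14.6, 13.4, 19.5, 27.2]   # design means 1..4
sizes = [5, 5, 4, 5]
weights = [0.35, 0.28, 0.12, 0.25]

# Assumed parameterization: coefficient_i = mean_i - mean_4, so design 4 gets 0
effects = [m - means[-1] for m in means]
lincom_result = sum(w * e for w, e in zip(weights, effects))

# The same quantity written as a contrast of the means (coefficients sum to 0)
contrast_coefs = [0.35, 0.28, 0.12, -0.75]
contrast_result = sum(c * m for c, m in zip(contrast_coefs, means))
contrast_se = math.sqrt(mse * sum(c * c / n for c, n in zip(contrast_coefs, sizes)))

# Weighted combination of the design means themselves (weights sum to 1)
weighted_mean = sum(w * m for w, m in zip(weights, means))

print(f"lincom on coefficients:    {lincom_result:.3f}")
print(f"contrast of means:         {contrast_result:.3f} (se {contrast_se:.6f})")
print(f"weighted mean of means:    {weighted_mean:.3f}")
```

Because the weights sum to 1, the first two results equal the weighted mean of the design means minus the design 4 mean.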

Tukey multiple comparisons procedure for the Rust data, p. 728-729.  This example uses prcomp, a user-written program.  You can download this program from within Stata using the findit command; for example, to download prcomp, type findit prcomp (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
Note: The Rust Inhibitor data entered earlier must be in memory again before running these commands.  The original variable name performance contains too many characters for prcomp, so we rename it before issuing the command.

rename performance perform
prcomp perform brand, tukey order(M)

                   Pairwise Comparisons of Means

Response variable (Y): perform     
Group variable (X):    brand       

   Group variable (X): brand       Response variable (Y): perform
-------------------------------    -------------------------------
      Level                            n         Mean         S.E.
------------------------------------------------------------------
          4                           10        40.47     .7704328
          1                           10        43.14     .9487067
          3                           10        67.95     .6857681
          2                           10        89.44      .701459
------------------------------------------------------------------

Simultaneous confidence level: 95%    (Tukey wsd method)
Homogeneous error SD = 2.477869, degrees of freedom = 36

                                                                    95%
Level(X)    Mean(Y)   Level(X)    Mean(Y)      Diff Mean     Confidence Limits
-------------------------------------------------------------------------------
       1      43.14          4      40.47           2.67   -.3145751   5.654575

       3      67.95          4      40.47          27.48    24.49542   30.46457
                             1      43.14          24.81    21.82542   27.79457

       2      89.44          4      40.47          48.97    45.98542   51.95457
                             1      43.14           46.3    43.31542   49.28457
                             3      67.95          21.49    18.50542   24.47457
-------------------------------------------------------------------------------

Tukey multiple comparisons for the Kenton Food data, p. 730-731.  This example uses prcomp, a user-written program.  You can download this program from within Stata using the findit command; for example, to download prcomp, type findit prcomp (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

clear
input sales design store
  11  1  1
  17  1  2
  16  1  3
  14  1  4
  15  1  5
  12  2  1
  10  2  2
  15  2  3
  19  2  4
  11  2  5
  23  3  1
  20  3  2
  18  3  3
  17  3  4
  27  4  1
  33  4  2
  22  4  3
  26  4  4
  28  4  5
end
anova sales design
prcomp sales design, tukey order(M) level(.9)
                   Pairwise Comparisons of Means

Response variable (Y): sales       
Group variable (X):    design      

  Group variable (X): design        Response variable (Y): sales
-------------------------------    -------------------------------
      Level                            n         Mean         S.E.
------------------------------------------------------------------
          2                            5         13.4     1.630951
          1                            5         14.6     1.029563
          3                            4         19.5     1.322876
          4                            5         27.2     1.772005
------------------------------------------------------------------

Simultaneous confidence level: 90%    (Tukey wsd method)
Homogeneous error SD = 3.247563, degrees of freedom = 15

                                                                    90%
Level(X)    Mean(Y)   Level(X)    Mean(Y)      Diff Mean     Confidence Limits
-------------------------------------------------------------------------------
       1       14.6          2       13.4            1.2   -3.941223   6.341223

       3       19.5          2       13.4            6.1    .6469095   11.55309
                             1       14.6            4.9   -.5530905   10.35309

       4       27.2          2       13.4           13.8    8.658777   18.94122
                             1       14.6           12.6    7.458777   17.74122
                             3       19.5            7.7     2.24691   13.15309
-------------------------------------------------------------------------------
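With unequal sample sizes, each interval above has the form difference plus or minus a common multiplier times s * sqrt(1/n_i + 1/n_j). The Python sketch below illustrates this without a studentized range table: it backs the multiplier out of the printed 1 vs 2 interval (an approach chosen here for illustration, not how prcomp computes it) and then reproduces the intervals that involve design 3.

```python
import math

sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17],
    4: [27, 33, 22, 26, 28],
}
n = {d: len(y) for d, y in sales.items()}
means = {d: sum(y) / len(y) for d, y in sales.items()}

# Pooled (homogeneous) error SD with 19 - 4 = 15 degrees of freedom
ss_resid = sum((v - means[d]) ** 2 for d, y in sales.items() for v in y)
s = math.sqrt(ss_resid / (sum(n.values()) - len(sales)))

def se_diff(a, b):
    """Standard error of the difference between two design means."""
    return s * math.sqrt(1 / n[a] + 1 / n[b])

# Half-width of the printed 1 vs 2 interval (-3.941223, 6.341223)
printed_half_width = (6.341223 - (-3.941223)) / 2
multiplier = printed_half_width / se_diff(1, 2)

def tukey_interval(a, b):
    diff = means[a] - means[b]
    half = multiplier * se_diff(a, b)
    return diff - half, diff + half

low_32, high_32 = tukey_interval(3, 2)
low_43, high_43 = tukey_interval(4, 3)
print(f"s = {s:.6f}, multiplier = {multiplier:.4f}")
print(f"3 vs 2: ({low_32:.5f}, {high_32:.5f})")
print(f"4 vs 3: ({low_43:.5f}, {high_43:.5f})")
```

Intervals involving design 3 are wider than the others because that design has only four stores.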

The Scheffe comparisons procedure for the Kenton Food data, p. 734-735.  Stata reports multiple comparison tests (each difference in means with its adjusted p-value) rather than confidence intervals; the tests lead to the same conclusions as the intervals.

oneway sales design, scheffe
                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      588.221053      3   196.073684     18.59     0.0000
 Within groups           158.2     15   10.5466667
------------------------------------------------------------------------
    Total           746.421053     18   41.4678363

Bartlett's test for equal variances:  chi2(3) =   1.3144  Prob>chi2 = 0.726

                        Comparison of sales by design
                                  (Scheffe)
Row Mean-|
Col Mean |          1          2          3
---------+---------------------------------
       2 |       -1.2
         |      0.951
         |
       3 |        4.9        6.1
         |      0.213      0.089
         |
       4 |       12.6       13.8        7.7
         |      0.000      0.000      0.025

Bonferroni comparisons procedure for the Kenton Food data, p. 736-737.  Stata reports multiple comparison tests (each difference in means with its adjusted p-value) rather than confidence intervals; the tests lead to the same conclusions as the intervals.

oneway sales design, bonferroni
                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      588.221053      3   196.073684     18.59     0.0000
 Within groups           158.2     15   10.5466667
------------------------------------------------------------------------
    Total           746.421053     18   41.4678363

Bartlett's test for equal variances:  chi2(3) =   1.3144  Prob>chi2 = 0.726

                        Comparison of sales by design
                                (Bonferroni)
Row Mean-|
Col Mean |          1          2          3
---------+---------------------------------
       2 |       -1.2
         |      1.000
         |
       3 |        4.9        6.1
         |      0.240      0.081
         |
       4 |       12.6       13.8        7.7
         |      0.000      0.000      0.018

Inputting the Piecework Trainee data, table 17.6, p. 743.  This example uses tukeyhsd, a user-written program.  You can download this program from within Stata using the findit command; for example, to download tukeyhsd, type findit tukeyhsd (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

clear
input units treat employee
  40  1  1
  39  1  2
  39  1  3
  36  1  4
  42  1  5
  43  1  6
  41  1  7
  53  2  1
  48  2  2
  49  2  3
  50  2  4
  51  2  5
  50  2  6
  48  2  7
  53  3  1
  58  3  2
  56  3  3
  59  3  4
  53  3  5
  59  3  6
  58  3  7
  63  4  1
  62  4  2
  59  4  3
  61  4  4
  62  4  5
  62  4  6
  61  4  7
end
label define trt 1 "6 hours" 2 "8 hours" 3 "10 hours" 4 "12 hours"
label values treat trt
anova units treat
tukeyhsd treat
                           Number of obs =      28     R-squared     =  0.9465
                           Root MSE      = 2.06444     Adj R-squared =  0.9398

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  1808.67857     3  602.892857     141.46     0.0000
                         |
                   treat |  1808.67857     3  602.892857     141.46     0.0000
                         |
                Residual |  102.285714    24  4.26190476   
              -----------+----------------------------------------------------
                   Total |  1910.96429    27   70.776455
Tukey HSD pairwise comparisons for variable treat
studentized range critical value(.05, 4, 24) = 3.9013476
uses harmonic mean sample size =    7.000

                                       mean 
grp vs grp       group means           dif    HSD-test
-------------------------------------------------------
  1 vs   2    40.0000    49.8571      9.8571  12.6328*
  1 vs   3    40.0000    56.5714     16.5714  21.2377*
  1 vs   4    40.0000    61.4286     21.4286  27.4625*
  2 vs   3    49.8571    56.5714      6.7143   8.6049*
  2 vs   4    49.8571    61.4286     11.5714  14.8298*
  3 vs   4    56.5714    61.4286      4.8571   6.2248*
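The HSD-test column above is each absolute mean difference divided by sqrt(MSE/n), which is then compared with the studentized range critical value. A minimal Python sketch of that calculation follows; the critical value 3.9013476 is copied from the output above rather than computed.

```python
import math
from itertools import combinations

# Piecework Trainee data: units produced by hours-of-training group
units = {
    1: [40, 39, 39, 36, 42, 43, 41],
    2: [53, 48, 49, 50, 51, 50, 48],
    3: [53, 58, 56, 59, 53, 59, 58],
    4: [63, 62, 59, 61, 62, 62, 61],
}
n = 7  # every group has 7 employees
means = {g: sum(y) / len(y) for g, y in units.items()}

ss_resid = sum((v - means[g]) ** 2 for g, y in units.items() for v in y)
mse = ss_resid / (4 * n - 4)   # 24 residual degrees of freedom
q_crit = 3.9013476             # critical value(.05, 4, 24) from the output above

hsd = {}
for a, b in combinations(sorted(units), 2):
    diff = abs(means[a] - means[b])
    hsd[(a, b)] = diff / math.sqrt(mse / n)
    flag = "*" if hsd[(a, b)] > q_crit else ""
    print(f"{a} vs {b}: dif = {diff:.4f}, HSD-test = {hsd[(a, b)]:.4f}{flag}")
```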

Figure 17.6, p. 745 using Piecework Trainee data from previous example.

twoway (scatter units treat, xlabel(1 "6" 2 "8" 3 "10" 4 "12") xscale(r(0.9 4.1)) xtitle("Hours of Training")) (qfit units treat, legend(off))

Table 17.7, p. 745.

gen hours = .
recode hours .=6 if treat==1
recode hours .=8 if treat==2
recode hours .=10 if treat==3
recode hours .=12 if treat==4
egen mean = mean(hours)
gen xi = hours-mean
gen xi2 = xi^2
list treat employee units xi xi2, nolabel
     +-------------------------------------+
     | treat   employee   units   xi   xi2 |
     |-------------------------------------|
  1. |     1          1      40   -3     9 |
  2. |     1          2      39   -3     9 |
  3. |     1          3      39   -3     9 |
  4. |     1          4      36   -3     9 |
  5. |     1          5      42   -3     9 |
     |-------------------------------------|
  6. |     1          6      43   -3     9 |
  7. |     1          7      41   -3     9 |
  8. |     2          1      53   -1     1 |
  9. |     2          2      48   -1     1 |
 10. |     2          3      49   -1     1 |
     |-------------------------------------|
 11. |     2          4      50   -1     1 |
 12. |     2          5      51   -1     1 |
 13. |     2          6      50   -1     1 |
 14. |     2          7      48   -1     1 |
 15. |     3          1      53    1     1 |
     |-------------------------------------|
 16. |     3          2      58    1     1 |
 17. |     3          3      56    1     1 |
 18. |     3          4      59    1     1 |
 19. |     3          5      53    1     1 |
 20. |     3          6      59    1     1 |
     |-------------------------------------|
 21. |     3          7      58    1     1 |
 22. |     4          1      63    3     9 |
 23. |     4          2      62    3     9 |
 24. |     4          3      59    3     9 |
 25. |     4          4      61    3     9 |
     |-------------------------------------|
 26. |     4          5      62    3     9 |
 27. |     4          6      62    3     9 |
 28. |     4          7      61    3     9 |
     +-------------------------------------+
Table 17.8a, p. 745-746.
regress units xi xi2
      Source |       SS       df       MS              Number of obs =      28
-------------+------------------------------           F(  2,    25) =  219.72
       Model |      1808.1     2      904.05           Prob > F      =  0.0000
    Residual |  102.864286    25  4.11457143           R-squared     =  0.9462
-------------+------------------------------           Adj R-squared =  0.9419
       Total |  1910.96429    27   70.776455           Root MSE      =  2.0284

------------------------------------------------------------------------------
       units |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          xi |       3.55   .1714345    20.71   0.000     3.196924    3.903076
         xi2 |     -.3125   .0958348    -3.26   0.003    -.5098755   -.1151245
       _cons |   53.52679   .6136422    87.23   0.000     52.26297    54.79061
------------------------------------------------------------------------------

Table 17.8b, p. 745-746.

anova units treat
                           Number of obs =      28     R-squared     =  0.9465
                           Root MSE      = 2.06444     Adj R-squared =  0.9398

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  1808.67857     3  602.892857     141.46     0.0000
                         |
                   treat |  1808.67857     3  602.892857     141.46     0.0000
                         |
                Residual |  102.285714    24  4.26190476   
              -----------+----------------------------------------------------
                   Total |  1910.96429    27   70.776455 

Table 17.8c, p. 745-746.  To obtain the sums of squares for the lack of fit test, first run the ANOVA and save the residuals.  Next run an ANOVA that adds the saved residuals, r, to the model.  The sum of squares reported for r is the sum of squares due to pure error.  Subtract the pure error sum of squares from the residual sum of squares of the regression in table 17.8a to get the sum of squares due to lack of fit.

anova units treat
predict r, resid
(output omitted)
anova units treat r
                           Number of obs =      28     R-squared     =  1.0000
                           Root MSE      = 7.5e-07     Adj R-squared =  1.0000

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  1910.96429    18  106.164683   
                         |
                   treat |       243.2     3  81.0666667   
                       r |  102.285714    15  6.81904762   
                         |
                Residual |  5.0022e-12     9  5.5580e-13   
              -----------+----------------------------------------------------
                   Total |  1910.96429    27   70.776455   
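The subtraction described above can be carried out as follows. This Python sketch is illustrative only: it recomputes the pure error sum of squares from the data and takes the regression residual sum of squares (102.864286, 25 df) from the table 17.8a output above.

```python
# Piecework Trainee data: units produced by hours-of-training group
units = {
    1: [40, 39, 39, 36, 42, 43, 41],
    2: [53, 48, 49, 50, 51, 50, 48],
    3: [53, 58, 56, 59, 53, 59, 58],
    4: [63, 62, 59, 61, 62, 62, 61],
}
means = {g: sum(y) / len(y) for g, y in units.items()}

# Pure error: variation of observations around their own group mean
ss_pure = sum((v - means[g]) ** 2 for g, y in units.items() for v in y)
df_pure = 28 - 4

# Residual SS and df of the quadratic regression, copied from table 17.8a
ss_reg_resid = 102.864286
df_reg_resid = 25

# Lack of fit is what the regression leaves unexplained beyond pure error
ss_lack = ss_reg_resid - ss_pure
df_lack = df_reg_resid - df_pure
f_lack = (ss_lack / df_lack) / (ss_pure / df_pure)

print(f"pure error SS  = {ss_pure:.6f} (df {df_pure})")
print(f"lack of fit SS = {ss_lack:.6f} (df {df_lack})")
print(f"F for lack of fit = {f_lack:.4f}")
```

An F statistic below 1 gives no evidence that the quadratic regression fits worse than the one-way ANOVA model.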

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California