UCLA Academic Technology Services

Stata Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 18: ANOVA Diagnostics and Remedial Measures

Inputting the Rust Inhibitor data, table 17.2a, p. 712.
clear
input performance brand experiment
  43.9  1   1
  39.0  1   2
  46.7  1   3
  43.8  1   4
  44.2  1   5
  47.7  1   6
  43.6  1   7
  38.9  1   8
  43.6  1   9
  40.0  1  10
  89.8  2   1
  87.1  2   2
  92.7  2   3
  90.6  2   4
  87.7  2   5
  92.4  2   6
  86.1  2   7
  88.1  2   8
  90.8  2   9
  89.1  2  10
  68.4  3   1
  69.3  3   2
  68.5  3   3
  66.4  3   4
  70.0  3   5
  68.1  3   6
  70.6  3   7
  65.2  3   8
  63.8  3   9
  69.2  3  10
  36.2  4   1
  45.2  4   2
  40.7  4   3
  40.5  4   4
  39.3  4   5
  40.3  4   6
  43.2  4   7
  38.7  4   8
  40.9  4   9
  39.7  4  10
end

Table 18.1, p. 758.

anova performance brand
predict r, residuals
table experiment brand, contents(mean r) cell(5) stubw(10)
---------------------------------------
           |           brand           
experiment |     1      2      3      4
-----------+---------------------------
         1 |   .76    .36    .45  -4.27
         2 | -4.14  -2.34   1.35   4.73
         3 |  3.56   3.26    .55    .23
         4 |   .66   1.16  -1.55    .03
         5 |  1.06  -1.74   2.05  -1.17
         6 |  4.56   2.96    .15   -.17
         7 |   .46  -3.34   2.65   2.73
         8 | -4.24  -1.34  -2.75  -1.77
         9 |   .46   1.36  -4.15    .43
        10 | -3.14   -.34   1.25   -.77
---------------------------------------

Figure 18.1a, p. 759.
anova performance brand
predict yhat
* r already exists from Table 18.1, so drop it before predicting again
capture drop r
predict r, residuals
twoway scatter r yhat, ms(x) msize(huge)

Figure 18.1b, p. 759.
twoway scatter brand r, ylabel(1 "A" 2 "B" 3 "C" 4 "D")
Figure 18.1c, p. 759.
qnorm r, ms(x) msize(huge)


Inputting ABT Electronics data, table 18.2, p. 765.
clear
input strength type joint
  14.87  1  1
  16.81  1  2
  15.83  1  3
  15.47  1  4
  13.60  1  5
  14.76  1  6
  17.40  1  7
  14.62  1  8
  18.43  2  1
  18.76  2  2
  20.12  2  3
  19.11  2  4
  19.81  2  5
  18.43  2  6
  17.16  2  7
  16.40  2  8
  16.95  3  1
  12.28  3  2
  12.00  3  3
  13.18  3  4
  14.99  3  5
  15.76  3  6
  19.35  3  7
  15.52  3  8
   8.59  4  1
  10.90  4  2
   8.60  4  3
  10.13  4  4
  10.28  4  5
   9.98  4  6
   9.41  4  7
  10.04  4  8
  11.55  5  1
  13.36  5  2
  13.64  5  3
  12.16  5  4
  11.62  5  5
  12.39  5  6
  12.05  5  7
  11.95  5  8
end
Table 18.2, the mean, median and variance of pull strength by flux type, p. 765.
Note: p50 stands for the 50th percentile, which is the median.
sort type
tabstat strength, by(type) statistics(mean median variance) nosep
Summary for variables: strength
     by categories of: type 

    type |      mean       p50  variance
---------+------------------------------
       1 |     15.42     15.17  1.530514
       2 |   18.5275    18.595  1.569936
       3 |  15.00375    15.255  6.183399
       4 |   9.74125     10.01  .6668407
       5 |     12.34    12.105      .592
---------+------------------------------
   Total |   14.2065     14.13  10.95925
----------------------------------------
Fig. 18.6, p. 766.
twoway scatter type strength

Modified Levene Test, p. 767.

robvar strength, by(type)
            |         Summary of strength
       type |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |       15.42   1.2371393           8
          2 |     18.5275    1.252971           8
          3 |    15.00375   2.4866442           8
          4 |   9.7412499   .81660316           8
          5 |       12.34   .76941538           8
------------+------------------------------------
      Total |     14.2065   3.3104765          40

W0  =  3.0678112   df(4, 35)     Pr > F = 0.02880559

W50 =  2.9357754   df(4, 35)     Pr > F = 0.0341384

W10 =  3.0678112   df(4, 35)     Pr > F = 0.02880559
Table 18.3, p. 768.
sort type
by type: egen median = median(strength)
gen d = abs(strength-median)
table joint type, contents(mean d) cell(5) stubw(10)
----------------------------------------------
           |               type               
     joint |     1      2      3      4      5
-----------+----------------------------------
         1 |    .3   .165    1.7   1.42   .555
         2 |  1.64   .165   2.98    .89   1.26
         3 |   .66   1.52   3.26   1.41   1.54
         4 |    .3   .515   2.07    .12   .055
         5 |  1.57   1.21   .265    .27   .485
         6 |   .41   .165   .505    .03   .285
         7 |  2.23   1.44    4.1     .6   .055
         8 |   .55    2.2   .265    .03   .155
----------------------------------------------
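The absolute deviations d in Table 18.3 are the input to the modified Levene test. As a rough check, and assuming the variable d created above is still in memory, a one-way ANOVA on d should give an F statistic matching the W50 value reported by robvar (about 2.94 on 4 and 35 df). This is only a sketch of that check, not a replacement for robvar.

```stata
* d was created above as |strength - median of its flux type|
* A one-way ANOVA on d yields the modified Levene (Brown-Forsythe) F statistic
oneway d type

* oneway leaves F and its degrees of freedom in r(); compare with W50 from robvar
display "F = " r(F) "   p = " Ftail(r(df_m), r(df_r), r(F))
```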

Creating the weights and the dummy variables for type to be used in the weighted least squares regression. Table 18.4, pp. 769-771.
by type: egen s = sd(strength)
gen weight = (1/s^2)
tab type, gen(x)
gen x = 1
list type joint strength x1 x2 x3 x4 x5 weight x, clean
       type   joint   strength   x1   x2   x3   x4   x5     weight   x  
  1.      1       1      14.87    1    0    0    0    0   .6533754   1  
  2.      1       2      16.81    1    0    0    0    0   .6533754   1  
  3.      1       3      15.83    1    0    0    0    0   .6533754   1  
  4.      1       4      15.47    1    0    0    0    0   .6533754   1  
  5.      1       5       13.6    1    0    0    0    0   .6533754   1  
  6.      1       6      14.76    1    0    0    0    0   .6533754   1  
  7.      1       7       17.4    1    0    0    0    0   .6533754   1  
  8.      1       8      14.62    1    0    0    0    0   .6533754   1  
  9.      2       1      18.43    0    1    0    0    0   .6369686   1  
 10.      2       2      18.76    0    1    0    0    0   .6369686   1  
 11.      2       3      20.12    0    1    0    0    0   .6369686   1  
 12.      2       4      19.11    0    1    0    0    0   .6369686   1  
 13.      2       5      19.81    0    1    0    0    0   .6369686   1  
 14.      2       6      18.43    0    1    0    0    0   .6369686   1  
 15.      2       7      17.16    0    1    0    0    0   .6369686   1  
 16.      2       8       16.4    0    1    0    0    0   .6369686   1  
 17.      3       1      16.95    0    0    1    0    0   .1617233   1  
 18.      3       2      12.28    0    0    1    0    0   .1617233   1  
 19.      3       3         12    0    0    1    0    0   .1617233   1  
 20.      3       4      13.18    0    0    1    0    0   .1617233   1  
 21.      3       5      14.99    0    0    1    0    0   .1617233   1  
 22.      3       6      15.76    0    0    1    0    0   .1617233   1  
 23.      3       7      19.35    0    0    1    0    0   .1617233   1  
 24.      3       8      15.52    0    0    1    0    0   .1617233   1  
 25.      4       1       8.59    0    0    0    1    0   1.499608   1  
 26.      4       2       10.9    0    0    0    1    0   1.499608   1  
 27.      4       3        8.6    0    0    0    1    0   1.499608   1  
 28.      4       4      10.13    0    0    0    1    0   1.499608   1  
 29.      4       5      10.28    0    0    0    1    0   1.499608   1  
 30.      4       6       9.98    0    0    0    1    0   1.499608   1  
 31.      4       7       9.41    0    0    0    1    0   1.499608   1  
 32.      4       8      10.04    0    0    0    1    0   1.499608   1  
 33.      5       1      11.55    0    0    0    0    1   1.689189   1  
 34.      5       2      13.36    0    0    0    0    1   1.689189   1  
 35.      5       3      13.64    0    0    0    0    1   1.689189   1  
 36.      5       4      12.16    0    0    0    0    1   1.689189   1  
 37.      5       5      11.62    0    0    0    0    1   1.689189   1  
 38.      5       6      12.39    0    0    0    0    1   1.689189   1  
 39.      5       7      12.05    0    0    0    0    1   1.689189   1  
 40.      5       8      11.95    0    0    0    0    1   1.689189   1 

Fig. 18.7a, p. 771, using the same data as the previous example.

regress strength x1 x2 x3 x4 x5 [aweight=weight], noconstant
(sum of wgt is   3.7127e+01)

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  5,    35) = 1295.90
       Model |   6980.9173     5  1396.18346           Prob > F      =  0.0000
    Residual |   37.708488    35  1.07738537           R-squared     =  0.9946
-------------+------------------------------           Adj R-squared =  0.9939
       Total |  7018.62579    40  175.465645           Root MSE      =   1.038

------------------------------------------------------------------------------
    strength |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |      15.42   .4373948    35.25   0.000     14.53204    16.30796
          x2 |    18.5275   .4429921    41.82   0.000     17.62818    19.42682
          x3 |   15.00375   .8791615    17.07   0.000     13.21896    16.78854
          x4 |    9.74125   .2887128    33.74   0.000     9.155132    10.32737
          x5 |      12.34   .2720294    45.36   0.000     11.78775    12.89225
------------------------------------------------------------------------------
Fig. 18.7b, p. 771.
regress strength x [aweight=weight], noconstant
(sum of wgt is   3.7127e+01)

      Source |       SS       df       MS              Number of obs =      40
-------------+------------------------------           F(  1,    39) =  668.28
       Model |  6631.61482     1  6631.61482           Prob > F      =  0.0000
    Residual |  387.010975    39  9.92335833           R-squared     =  0.9449
-------------+------------------------------           Adj R-squared =  0.9434
       Total |  7018.62579    40  175.465645           Root MSE      =  3.1501

------------------------------------------------------------------------------
    strength |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   12.87596   .4980803    25.85   0.000      11.8685    13.88342
------------------------------------------------------------------------------
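The two weighted fits above are the full model (one mean per flux type) and the reduced model (a single common mean). A minimal sketch of the general linear test that compares them follows; the scalar names sse_f, df_f, sse_r, df_r and fstar are introduced here for illustration only. Stata rescales aweights so that they sum to the number of observations, but both residual sums of squares are rescaled by the same factor, so the ratio is unaffected. From the output shown above, F* should come out at about 81.05 on 4 and 35 df.

```stata
* Full model: a separate mean for each flux type
quietly regress strength x1 x2 x3 x4 x5 [aweight=weight], noconstant
scalar sse_f = e(rss)
scalar df_f  = e(df_r)

* Reduced model: one common mean for all flux types
quietly regress strength x [aweight=weight], noconstant
scalar sse_r = e(rss)
scalar df_r  = e(df_r)

* General linear test statistic and its p-value
scalar fstar = ((sse_r - sse_f)/(df_r - df_f)) / (sse_f/df_f)
display "F* = " fstar "   p = " Ftail(df_r - df_f, df_f, fstar)
```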

Inputting the Servo data and obtaining the mean and variance of time by location, table 18.5, p. 774.
clear
input time location interval
    4.41  1  1  
  100.65  1  2  
   14.45  1  3  
   47.13  1  4 
   85.21  1  5  
    8.24  2  1 
   81.16  2  2  
    7.35  2  3 
   12.29  2  4 
    1.61  2  5 
  106.19  3  1  
   33.83  3  2 
   78.88  3  3 
  342.81  3  4 
   44.33  3  5 
end
egen rank = rank(time)
table interval location , contents(mean time mean rank)
sort location
tabstat time , statistics(mean var ) by(location)
tabstat rank, statistics(mean var ) by(location)
----------------------------------
          |        location       
 interval |      1       2       3
----------+-----------------------
        1 |   4.41    8.24  106.19
          |      2       4      14
          | 
        2 | 100.65   81.16   33.83
          |     13      11       7
          | 
        3 |  14.45    7.35   78.88
          |      6       3      10
          | 
        4 |  47.13   12.29  342.81
          |      9       5      15
          | 
        5 |  85.21    1.61   44.33
          |     12       1       8
----------------------------------
Summary for variables: time
     by categories of: location 

location |      mean  variance
---------+--------------------
       1 |     50.37  1788.742
       2 |     22.13  1103.454
       3 |   121.208  16167.45
---------+--------------------
   Total |  64.56933  7306.561
------------------------------
Summary for variables: rank
     by categories of: location 

location |      mean  variance
---------+--------------------
       1 |       8.4      20.3
       2 |       4.8      14.2
       3 |      10.8      12.7
---------+--------------------
   Total |         8        20
------------------------------
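Diagnostic statistics for determining the appropriate transformation of time, bottom of p. 773.
Note: the variable names below are only labels. The ratio that is most nearly constant across locations suggests the transformation: sd^2/mean points to the square root, sd/mean to the logarithm, and sd/mean^2 to the reciprocal.
sort location
by location: egen sd = sd(time)
by location: egen mean = mean(time)
gen sqrt = (sd^2)/mean
gen inv = sd/mean
gen arcsinsqrt = sd/(mean^2)
table location, contents(mean sqrt mean inv mean arcsinsqrt)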
----------------------------------------------------------
 location |     mean(sqrt)       mean(inv)  mean(arcsin~t)
----------+-----------------------------------------------
        1 |       35.51206        .8396571        .0166698
        2 |       49.86238        1.501052        .0678288
        3 |        133.386        1.049034        .0086548
----------------------------------------------------------

Table 18.6, p. 775.

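* k2 is the geometric mean of time
means time
scalar k2 = r(mean_g)

capture drop myw
gen myw = .

* For lambda = -1, -.9, ..., 1 display lambda and SSE/1000
foreach n of numlist 0/20 {
  local lambda = (`n'-10)/10
  if (`lambda' == 0) {
    * lambda equal to 0: W = K2*ln(Y)
    quietly replace myw = k2*ln(time)
  }
  else {
    * any other lambda: W = K1*(Y^lambda - 1), where K1 = K2^(1-lambda)/lambda
    scalar k1 = k2^(1-`lambda')/`lambda'
    quietly replace myw = k1*(time^`lambda' - 1)
  }
  quietly xi: reg myw i.location
  local rss_1000 = e(rss)/1000
  display in yellow "`lambda'" _col(10) %8.1f `rss_1000'
}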
-1          203.5
-.9         137.7
-.8          95.1
-.7          67.1
-.6          48.7
-.5          36.5
-.4          28.3
-.3          22.8
-.2          19.2
-.1          17.0
0            15.7
.1           15.3
.2           15.6
.3           16.7
.4           18.7
.5           21.8
.6           26.4
.7           33.0
.8           42.6
.9           56.4
1            76.2
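The scaling constant k2 used in the loop is the geometric mean of time, taken from r(mean_g) after means. As a quick sanity check, it can be recomputed by hand as the exponential of the mean of ln(time). The variable name lnt_chk is a throwaway name chosen here so that it does not clash with variables created elsewhere on this page.

```stata
* Geometric mean by hand: exp of the arithmetic mean of ln(time)
gen lnt_chk = ln(time)
quietly summarize lnt_chk
display "geometric mean = " exp(r(mean))

* Compare with the scalar saved from -means-
display "k2             = " k2

* Remove the temporary variable
drop lnt_chk
```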
Figure 18.8a, p. 775.
anova time location
predict rloc, residuals
qnorm rloc

Figure 18.8b, p. 775.

gen lntime = ln(time)
anova lntime location
predict rtrans, residuals
qnorm rtrans

Kruskal-Wallis test of the Servo data, pp. 778-779.
Note: Use equation (18.29) on page 779 to get the F statistic for the rank test.

kwallis time, by(location)
Kruskal-Wallis equality-of-populations rank test

  +---------------------------+
  | location | Obs | Rank Sum |
  |----------+-----+----------|
  |        1 |   5 |    42.00 |
  |        2 |   5 |    24.00 |
  |        3 |   5 |    54.00 |
  +---------------------------+

chi-squared =     4.560 with 2 d.f.
probability =     0.1023

chi-squared with ties =     4.560 with 2 d.f.
probability =     0.1023
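The note above points to the textbook for converting the Kruskal-Wallis statistic into the F statistic of the rank test. A minimal sketch follows, assuming the equation takes the usual form F = (n_T - r)*X2 / ((r - 1)*(n_T - 1 - X2)), where n_T = 15 is the total sample size and r = 3 is the number of locations. The scalar name fr is introduced here for illustration. With the chi-squared of 4.560 shown above, this works out to roughly 2.90 on 2 and 12 df.

```stata
* kwallis leaves the unadjusted chi-squared in r(chi2)
quietly kwallis time, by(location)

* n_T = 15 observations, r = 3 locations
scalar fr = ((15 - 3)*r(chi2)) / ((3 - 1)*(15 - 1 - r(chi2)))
display "F*_R = " fr "   p = " Ftail(3 - 1, 15 - 3, fr)

* Cross-check: with no ties, a one-way ANOVA on the ranks gives the same F
anova rank location
```

The cross-check uses the rank variable created earlier with egen rank = rank(time).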

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.