UCLA Academic Technology Services

SAS Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 25: Analysis of Covariance

Inputting the Cracker Promotion data, p. 1020.
data cracker;
   input y x treat store;
cards;
  38  21  1  1
  39  26  1  2
  36  22  1  3
  45  28  1  4
  33  19  1  5
  43  34  2  1
  38  26  2  2
  38  29  2  3
  27  18  2  4
  34  25  2  5
  24  23  3  1
  32  29  3  2
  31  30  3  3
  21  16  3  4
  28  29  3  5
;
run;
Fig. 25.5, p. 1021.
Note: To plot all three treatments on one graph, we create three variables, treat1, treat2 and treat3. Each one equals y for its own treatment and is missing otherwise.
data cplot;
  set cracker;
  if treat=1 then treat1 = y;
  else if treat=2 then treat2=y;
  else treat3=y;
run;
goptions reset=all;
symbol1 c=blue v=dot h=.8;
symbol2 c=red v=dot h=.8;
symbol3 c=green v=dot h=.8;
axis1 order=(10 to 50 by 10) label=(a=90 'Sales in Promotion Period');
axis2 order=(15 to 35 by 5) label=('Sales in Preceding Period');
legend1 label=none value=(height=1 font=swiss 'Treatment 1' 'Treatment 2' 'Treatment 3' ) 
        position=(bottom right inside) mode=share cborder=black;
proc gplot data=cplot;
  plot (treat1 treat2 treat3)*x/overlay legend=legend1 vaxis=axis1 haxis=axis2;
run;
quit;
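Creating the indicator and interaction variables for the Cracker data set, table 25.2, p. 1021. First we compute the overall mean of X and use it to create the centered covariate littlex = X - mean(X), which is the textbook's lowercase x. The treatment indicators use effect coding:
- I1 is 1 for treatment 1, -1 for treatment 3 and 0 otherwise.
- I2 is 1 for treatment 2, -1 for treatment 3 and 0 otherwise.
I1x and I2x are the products of these indicators with littlex.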
proc sql;
  create table cdummy as
  select *, x-mean(x) as littlex
  from cracker;
quit;
data cdummy;
   set cdummy;
   I1 = 0;
   if treat=1 then I1=1;
     else if treat=3 then I1=-1;
   I2=0;
   if treat=2 then I2=1;
     else if treat=3 then I2=-1;
   I1x = I1*littlex;
   I2x = I2*littlex;
run;
proc print data=cdummy;
run; 
Obs     y     x    treat    store    littlex    I1    I2    I1x    I2x

  1    38    21      1        1         -4       1     0     -4      0
  2    39    26      1        2          1       1     0      1      0
  3    36    22      1        3         -3       1     0     -3      0
  4    45    28      1        4          3       1     0      3      0
  5    33    19      1        5         -6       1     0     -6      0
  6    43    34      2        1          9       0     1      0      9
  7    38    26      2        2          1       0     1      0      1
  8    38    29      2        3          4       0     1      0      4
  9    27    18      2        4         -7       0     1      0     -7
 10    34    25      2        5          0       0     1      0      0
 11    24    23      3        1         -2      -1    -1      2      2
 12    32    29      3        2          4      -1    -1     -4     -4
 13    31    30      3        3          5      -1    -1     -5     -5
 14    21    16      3        4         -9      -1    -1      9      9
 15    28    29      3        5          4      -1    -1     -4     -4
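As a cross-check outside SAS, here is a minimal Python sketch (standard library only) that builds the same centered covariate and effect codes. The data are copied by hand from the data step above. The helper name effect_codes is our own and does not come from SAS.

```python
# Cracker Promotion data (p. 1020): y, x and treatment, in data-step order
y = [38, 39, 36, 45, 33, 43, 38, 38, 27, 34, 24, 32, 31, 21, 28]
x = [21, 26, 22, 28, 19, 34, 26, 29, 18, 25, 23, 29, 30, 16, 29]
treat = [1] * 5 + [2] * 5 + [3] * 5

def effect_codes(t):
    """Return (I1, I2): +1 for the indicator's own level, -1 for treatment 3, else 0."""
    i1 = 1 if t == 1 else (-1 if t == 3 else 0)
    i2 = 1 if t == 2 else (-1 if t == 3 else 0)
    return i1, i2

xbar = sum(x) / len(x)            # overall mean of X
rows = []
for xi, ti in zip(x, treat):
    littlex = xi - xbar           # centered covariate
    i1, i2 = effect_codes(ti)
    rows.append((littlex, i1, i2, i1 * littlex, i2 * littlex))

for r in rows:
    print(r)                      # (littlex, I1, I2, I1x, I2x)
```

Each printed tuple should match the corresponding row of the PROC PRINT listing.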
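Regressing Y on littlex, I1 and I2, table 25.3, p. 1022, and testing for a treatment effect, pp. 1023-1024. The outest= and covout options save the estimated variance-covariance matrix of the coefficients. That matrix is printed after the regression output.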
proc reg data=cdummy outest=outregc covout;
  model y = littlex I1 I2;
  treatment_effect: test I1=I2=0;
  output out=residualc r=resid;
run;
quit;
proc print data=outregc;
  where _type_ = 'COV';
  var intercept littlex I1 I2;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: y

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3      607.82869      202.60956      57.78    <.0001
Error                    11       38.57131        3.50648
Corrected Total          14      646.40000

Root MSE              1.87256    R-Square     0.9403
Dependent Mean       33.80000    Adj R-Sq     0.9241
Coeff Var             5.54012

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       33.80000        0.48349      69.91      <.0001
littlex       1        0.89856        0.10258       8.76      <.0001
I1            1        6.01741        0.70826       8.50      <.0001
I2            1        0.94202        0.69868       1.35      0.2047

The REG Procedure
Model: MODEL1

 Test treatment_effect Results for Dependent Variable y

                                Mean
Source             DF         Square    F Value    Pr > F
Numerator           2      208.57546      59.48    <.0001
Denominator        11        3.50648
Obs    Intercept     littlex        I1          I2

 2      0.23377      0.000000     0.00000     0.00000
 3      0.00000      0.010524     0.01894    -0.01473
 4      0.00000      0.018943     0.50163    -0.26029
 5      0.00000     -0.014733    -0.26029     0.48816
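PROC REG finds the least squares estimates by solving the normal equations (X'X)b = X'Y. The minimal Python sketch below performs that computation with the cdummy coding and should reproduce the estimates and SSE in table 25.3 up to rounding. It uses the standard library only, and solve is a hypothetical hand-written Gaussian-elimination helper, not a SAS or library routine.

```python
# Cracker data (p. 1020)
y = [38, 39, 36, 45, 33, 43, 38, 38, 27, 34, 24, 32, 31, 21, 28]
x = [21, 26, 22, 28, 19, 34, 26, 29, 18, 25, 23, 29, 30, 16, 29]
treat = [1] * 5 + [2] * 5 + [3] * 5

xbar = sum(x) / len(x)
code = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}          # (I1, I2) effect codes
# Design matrix columns: intercept, littlex, I1, I2
X = [[1.0, xi - xbar, *code[t]] for xi, t in zip(x, treat)]

def solve(a, b):
    """Solve a @ v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (m[r][n] - s) / m[r][r]
    return v

p = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
beta = solve(XtX, Xty)

fitted = [sum(bk * v for bk, v in zip(beta, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
mse = sse / (len(y) - p)
print("estimates:", [round(bk, 5) for bk in beta])
print("SSE:", round(sse, 5), " MSE:", round(mse, 5))
```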
Fig. 25.6b, p. 1023.
goptions reset=all;
symbol c=blue v=dot h=.8;
proc capability data=residualc noprint;
  qqplot resid;
run;
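A normal probability plot compares each ordered residual with its approximate expected value under normality, sqrt(MSE) z[(k - .375)/(n + .25)]. Here k is the residual's rank and n = 15.

The minimal Python sketch below computes these coordinates and their correlation. It assumes the residuals come from the within-treatment form of the covariance model: each fitted value is the treatment mean of Y plus the pooled slope times the deviation of X from its treatment mean. That form gives the same fitted values as the regression above.

```python
from math import sqrt
from statistics import NormalDist

y = [38, 39, 36, 45, 33, 43, 38, 38, 27, 34, 24, 32, 31, 21, 28]
x = [21, 26, 22, 28, 19, 34, 26, 29, 18, 25, 23, 29, 30, 16, 29]
treat = [1] * 5 + [2] * 5 + [3] * 5

groups = sorted(set(treat))
xm = {g: sum(xi for xi, t in zip(x, treat) if t == g) / treat.count(g) for g in groups}
ym = {g: sum(yi for yi, t in zip(y, treat) if t == g) / treat.count(g) for g in groups}

# Pooled within-treatment slope (the littlex coefficient of table 25.3)
sxy = sum((xi - xm[t]) * (yi - ym[t]) for xi, yi, t in zip(x, y, treat))
sxx = sum((xi - xm[t]) ** 2 for xi, t in zip(x, treat))
b = sxy / sxx

# Residuals: y - [treatment mean of Y + b * (x - treatment mean of X)]
resid = [yi - (ym[t] + b * (xi - xm[t])) for xi, yi, t in zip(x, y, treat)]

n, p = len(y), len(groups) + 1      # 4 parameters: mean, two effects, slope
mse = sum(e * e for e in resid) / (n - p)

# Expected value of the k-th smallest residual under normality
z = NormalDist()
ordered = sorted(resid)
expected = [sqrt(mse) * z.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    return suv / sqrt(suu * svv)

for q, e in zip(expected, ordered):
    print(f"expected {q:7.3f}   residual {e:7.3f}")
print("correlation:", round(corr(ordered, expected), 4))
```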
Reduced model (without I1 and I2), table 25.4, p. 1023.
proc reg data=cdummy ;
  model y = littlex ;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     1      190.67778      190.67778       5.44    0.0364
Error                    13      455.72222       35.05556
Corrected Total          14      646.40000

Root MSE              5.92077    R-Square     0.2950
Dependent Mean       33.80000    Adj R-Sq     0.2408
Coeff Var            17.51708
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       33.80000        1.52874      22.11      <.0001
littlex       1        0.72778        0.31205       2.33      0.0364
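The treatment_effect test above is the general linear test comparing this reduced model with the full model of table 25.3:

F* = [(SSE(R) - SSE(F))/(dfR - dfF)] / [SSE(F)/dfF].

Here is a small Python sketch of that formula, with the error sums of squares and degrees of freedom copied from the two PROC REG runs. The function name general_linear_f is our own.

```python
def general_linear_f(sse_r, df_r, sse_f, df_f):
    """General linear test: returns (numerator MS, denominator MS, F*)."""
    num = (sse_r - sse_f) / (df_r - df_f)
    den = sse_f / df_f
    return num, den, num / den

# Reduced model (table 25.4) versus full model (table 25.3)
num, den, f = general_linear_f(455.72222, 13, 38.57131, 11)
print(round(num, 5), round(den, 5), round(f, 2))
```

The result should agree with the Numerator, Denominator and F Value lines of the treatment_effect test.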
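Estimating the treatment effects: pairwise comparisons with Scheffé simultaneous confidence intervals using proc glm, p. 1024. The multiplier is S = sqrt((r - 1) F(.95; r - 1, n - r - 1)). Because the model has a single covariate, the model degrees of freedom minus 1 equals r - 1 = 2, and the error degrees of freedom are n - r - 1 = 11.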
ods output Estimates=temp  OverallANOVA=anova;
proc glm data=cracker;
  class treat;
  model y = x treat;
  estimate 'treat1 v treat2' treat 1 -1 0;
  estimate 'treat1 v treat3' treat 1 0 -1;
  estimate 'treat2 v treat3' treat 0 1 -1;
run;
quit;
data _null_;
  set anova;
  if source='Model' then call symput('dfmodel', DF);
  if source='Error' then call symput('dferr', DF);
run;
%put check macro variables in log: &dfmodel and &dferr;
data temp;
  set temp;
  drop dependent tvalue probt;
  S2 =  (&dfmodel - 1)*finv(.95, (&dfmodel - 1), &dferr);
  S = sqrt(S2);
  lower = estimate - S*stderr;
  upper = estimate + S*stderr;
run;
proc print data=temp;
run;
The GLM Procedure

   Class Level Information

Class         Levels    Values
treat              3    1 2 3

Number of observations    15
The GLM Procedure
Dependent Variable: y

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        3     607.8286915     202.6095638      57.78    <.0001
Error                       11      38.5713085       3.5064826
Corrected Total             14     646.4000000

R-Square     Coeff Var      Root MSE        y Mean
0.940329      5.540120      1.872560      33.80000

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
x                            1     190.6777778     190.6777778      54.38    <.0001
treat                        2     417.1509137     208.5754568      59.48    <.0001
Source                      DF     Type III SS     Mean Square    F Value    Pr > F
x                            1     269.0286915     269.0286915      76.72    <.0001
treat                        2     417.1509137     208.5754568      59.48    <.0001

                                            Standard
Parameter                   Estimate           Error    t Value    Pr > |t|
treat1 v treat2            5.0753902      1.22896513       4.13      0.0017
treat1 v treat3           12.9768307      1.20562330      10.76      <.0001
treat2 v treat3            7.9014406      1.18874585       6.65      <.0001

Obs      Parameter          Estimate         StdErr      S2        S       lower     upper

 1    treat1 v treat2      5.0753902     1.22896513   7.96460   2.82216   1.60705    8.5437
 2    treat1 v treat3     12.9768307     1.20562330   7.96460   2.82216   9.57437   16.3793
 3    treat2 v treat3      7.9014406     1.18874585   7.96460   2.82216   4.54661   11.2563
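With effect coding, tau3 = -tau1 - tau2. Every pairwise difference is therefore a linear combination of the I1 and I2 coefficients, and its variance follows from the covariance matrix printed after table 25.3.

The Python sketch below should reproduce the GLM estimates, standard errors and Scheffé limits up to rounding. S is copied from the output because the Python standard library has no F quantile function.

```python
from math import sqrt

# I1 and I2 estimates and their (co)variances, from PROC REG (table 25.3)
b1, b2 = 6.01741, 0.94202
v11, v22, v12 = 0.50163, 0.48816, -0.26029

# tau3 = -(tau1 + tau2), so each difference is c1*b1 + c2*b2
contrasts = {
    "treat1 v treat2": (1, -1),   # tau1 - tau2
    "treat1 v treat3": (2, 1),    # tau1 - tau3 = 2*b1 + b2
    "treat2 v treat3": (1, 2),    # tau2 - tau3 = b1 + 2*b2
}

S = 2.82216   # Scheffe multiplier sqrt(2 * F(.95; 2, 11)), copied from the output above

results = {}
for name, (c1, c2) in contrasts.items():
    est = c1 * b1 + c2 * b2
    se = sqrt(c1 * c1 * v11 + c2 * c2 * v22 + 2 * c1 * c2 * v12)
    results[name] = (est, se, est - S * se, est + S * se)
    print(f"{name}: {est:.4f} (SE {se:.4f}), [{est - S * se:.4f}, {est + S * se:.4f}]")
```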
Estimating the mean response for each treatment group when X is at its mean (X=25), p. 1026.
Note: SAS reports each estimate together with its standard error, which is the square root of the estimated variance of the estimate.
proc means data=cracker mean;
  var x;
run;
proc glm data=cracker;
  class treat;
  model y = x treat;
  estimate 'treat1 at X=25' intercept 1 treat 1 0 0 x 25;
  estimate 'treat2 at X=25' intercept 1 treat 0 1 0 x 25;
  estimate 'treat3 at X=25' intercept 1 treat 0 0 1 x 25;
run;
quit;
The MEANS Procedure
Analysis Variable : x

        Mean
------------
  25.0000000
------------

The GLM Procedure

   Class Level Information

Class         Levels    Values
treat              3    1 2 3

Number of observations    15
The GLM Procedure
Dependent Variable: y
                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        3     607.8286915     202.6095638      57.78    <.0001
Error                       11      38.5713085       3.5064826
Corrected Total             14     646.4000000

R-Square     Coeff Var      Root MSE        y Mean
0.940329      5.540120      1.872560      33.80000

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
x                            1     190.6777778     190.6777778      54.38    <.0001
treat                        2     417.1509137     208.5754568      59.48    <.0001
Source                      DF     Type III SS     Mean Square    F Value    Pr > F
x                            1     269.0286915     269.0286915      76.72    <.0001
treat                        2     417.1509137     208.5754568      59.48    <.0001

                                            Standard
Parameter                   Estimate           Error    t Value    Pr > |t|
treat1 at X=25            39.8174070      0.85755068      46.43      <.0001
treat2 at X=25            34.7420168      0.84966045      40.89      <.0001
treat3 at X=25            26.8405762      0.83843921      32.01      <.0001
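When X = 25, littlex = 0, so the mean response for treatment i is the intercept plus tau_i. Its variance therefore involves only the intercept and effect-code entries of the covariance matrix printed after table 25.3.

Here is a short Python check, with all values copied from that regression output.

```python
from math import sqrt

# Intercept, I1 and I2 estimates from table 25.3
beta = [33.80000, 6.01741, 0.94202]
# Their variance-covariance matrix (littlex row and column omitted)
V = [[0.23377, 0.00000, 0.00000],
     [0.00000, 0.50163, -0.26029],
     [0.00000, -0.26029, 0.48816]]

# Coefficient vectors for intercept + tau_i; tau3 = -tau1 - tau2
coefs = {"treat1 at X=25": [1, 1, 0],
         "treat2 at X=25": [1, 0, 1],
         "treat3 at X=25": [1, -1, -1]}

results = {}
for name, c in coefs.items():
    est = sum(ci * bi for ci, bi in zip(c, beta))
    var = sum(c[i] * c[j] * V[i][j] for i in range(3) for j in range(3))
    results[name] = (est, sqrt(var))
    print(f"{name}: {est:.4f} (SE {sqrt(var):.5f})")
```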
Table 25.5 and the test for parallel slopes, p. 1027. This tests whether the treatment-by-covariate interaction terms I1x and I2x are significant.
proc reg data=cdummy;
  model y = littlex I1 I2 I1x I2x;
  interaction: test I1x=I2x=0;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     5      614.87916      122.97583      35.11    <.0001
Error                     9       31.52084        3.50232
Corrected Total          14      646.40000

Root MSE              1.87145    R-Square     0.9512
Dependent Mean       33.80000    Adj R-Sq     0.9241
Coeff Var             5.53683

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       33.89433        0.51234      66.16      <.0001
littlex       1        0.93874        0.11267       8.33      <.0001
I1            1        6.26990        0.75167       8.34      <.0001
I2            1        0.71791        0.71600       1.00      0.3422
I1x           1        0.15251        0.18438       0.83      0.4296
I2x           1        0.05252        0.14561       0.36      0.7267

The REG Procedure
Model: MODEL1

    Test interaction Results for Dependent Variable y

                                Mean
Source             DF         Square    F Value    Pr > F
Numerator           2        3.52524       1.01    0.4032
Denominator         9        3.50232
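The parallel-slopes test compares two models:
- a full model with a separate regression line for each treatment;
- a reduced model whose lines share the pooled within-treatment slope.

The minimal Python sketch below fits both from within-treatment sums of squares. It should reproduce:
- the treatment slopes implied by table 25.5 (littlex plus the corresponding interaction coefficient; for treatment 3, littlex minus both);
- the two error sums of squares;
- the F statistic above.

The helper name within_ss is our own.

```python
y = [38, 39, 36, 45, 33, 43, 38, 38, 27, 34, 24, 32, 31, 21, 28]
x = [21, 26, 22, 28, 19, 34, 26, 29, 18, 25, 23, 29, 30, 16, 29]
treat = [1] * 5 + [2] * 5 + [3] * 5

def within_ss(xs, ys):
    """Return (Sxx, Sxy, Syy) about the group means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (c - my) for a, c in zip(xs, ys))
    syy = sum((c - my) ** 2 for c in ys)
    return sxx, sxy, syy

ss = {}
for g in sorted(set(treat)):
    xs = [xi for xi, t in zip(x, treat) if t == g]
    ys = [yi for yi, t in zip(y, treat) if t == g]
    ss[g] = within_ss(xs, ys)

# Full model: separate slope per treatment; SSE is the sum of within-treatment SSEs
slopes = {g: sxy / sxx for g, (sxx, sxy, _) in ss.items()}
sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in ss.values())

# Reduced model: one slope pooled across treatments
Sxx = sum(v[0] for v in ss.values())
Sxy = sum(v[1] for v in ss.values())
Syy = sum(v[2] for v in ss.values())
b_common = Sxy / Sxx
sse_reduced = Syy - Sxy ** 2 / Sxx

n, r = len(y), len(ss)
df_full, df_reduced = n - 2 * r, n - r - 1      # 9 and 11
f = ((sse_reduced - sse_full) / (df_reduced - df_full)) / (sse_full / df_full)

print("slopes:", {g: round(s, 5) for g, s in slopes.items()})
print("common slope:", round(b_common, 5))
print("SSE(F) =", round(sse_full, 5), " SSE(R) =", round(sse_reduced, 5), " F* =", round(f, 2))
```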
Inputting the Salable Flowers data set, table 25.6, p. 1029.
data flowers;
  input y x a b rep;
  label y = 'yield'
        x = 'plot size'
        a = 'variety'
        b = 'moisture';
cards;
  98  15  1  1  1
  60   4  1  1  2
  77   7  1  1  3
  80   9  1  1  4
  95  14  1  1  5
  64   5  1  1  6
  55   4  2  1  1
  60   5  2  1  2
  75   8  2  1  3
  65   7  2  1  4
  87  13  2  1  5
  78  11  2  1  6
  71  10  1  2  1
  80  12  1  2  2
  86  14  1  2  3
  82  13  1  2  4
  46   2  1  2  5
  55   3  1  2  6
  76  11  2  2  1
  68  10  2  2  2
  43   2  2  2  3
  47   3  2  2  4
  62   7  2  2  5
  70   9  2  2  6
;
run;
Fig. 25.7, p. 1030.
data fplot;
  set flowers;
  if a=1 and b=1 then a1b1 = y;
  if a=1 and b=2 then a1b2 = y;
  if a=2 and b=1 then a2b1 = y;
  if a=2 and b=2 then a2b2 = y;
run;
 
symbol1 v=dot c=blue h=.8;
symbol2 v=circle c=red h=.8;
symbol3 v=square c=green h=.8;
symbol4 v=plus c=purple h=.8;
proc gplot data=fplot;
  plot (a1b1 a2b2 a1b2 a2b1)*x/overlay;
run;
quit;
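Generating X centered at its mean (littlex), the effect-coded indicator variables and their interaction I12, p. 1028. I1 is 1 for variety 1 and -1 for variety 2; I2 is 1 for moisture level 1 and -1 for moisture level 2.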
proc sql;
  create table fdummy as
  select *, x - mean(x) as littlex
  from  flowers;
quit;
data fdummy;
  set fdummy;
  I1 = 1;
  if a=2 then I1=-1;
  I2 = 1;
  if b=2 then I2=-1;
  I12 = I1*I2;
run;
Table 25.7, regression output and sums of squares, p. 1030, and the test of the interaction, p. 1031.
proc reg data=fdummy;
  model y = littlex I1 I2 I12/ ss2;
  interaction: test I12=0;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: y yield

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     4     4966.51882     1241.62970     197.45    <.0001
Error                    19      119.48118        6.28848
Corrected Total          23     5086.00000

Root MSE              2.50768    R-Square     0.9765
Dependent Mean       70.00000    Adj R-Sq     0.9716
Coeff Var             3.58241

                                      Parameter Estimates

                                  Parameter       Standard
Variable     Label        DF       Estimate          Error    t Value    Pr > |t|     Type II SS
Intercept    Intercept     1       70.00000        0.51188     136.75      <.0001         117600
littlex                    1        3.27688        0.13002      25.20      <.0001     3994.51882
I1                         1        2.04234        0.52108       3.92      0.0009       96.60183
I2                         1        3.68078        0.51291       7.18      <.0001      323.84947
I12                        1        0.81922        0.51291       1.60      0.1267       16.04224

The REG Procedure
Model: MODEL1

    Test interaction Results for Dependent Variable y

                                Mean
Source             DF         Square    F Value    Pr > F
Numerator           1       16.04224       2.55    0.1267
Denominator        19        6.28848
Fig. 25.8, plot of the estimated treatment means at the mean plot size (centered x = 0).
ods listing close;
ods output  LSMeans=means;
proc glm data=flowers;
  class a b;
  model y = x a b a*b;
  estimate 'Factor A effect' a 1 -1;
  estimate 'Factor B effect' b 1 -1;
  lsmeans a*b;
run;
quit;
ods output close;
ods listing;
data means;
  set means;
  a1=a+0;
  if b=1 then b1=yLSMean;
  if b=2 then b2=yLSMean;
run;
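/* Save the graph as a GIF file. The path is specific to the original site: change it, or omit these two statements. */
filename outfile 'c:\sas2htm\alsm25_4.gif';
goptions gsfname=outfile dev=gif373;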
symbol1 v=dot c=blue i=join;
symbol2 v=circle c=red i=join;
axis1 offset = (2, 2) label=('Variety') ;
axis2 order=(40 to 100 by 20) label=(a=90 'Number of Flowers');
proc gplot data=means;
  plot (b1 b2)*a / overlay haxis=axis1 vaxis=axis2;
run;
quit;
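The LSMEANS statement evaluates each cell mean at the mean plot size, where littlex = 0. With the +1/-1 coding of the regression in table 25.7, each cell mean is therefore the intercept plus the signed coefficients.

The short Python sketch below computes the four cell means plotted in Fig. 25.8 from the table 25.7 coefficients. These should match the LSMeans values. Averaging over moisture, the variety difference should equal the GLM 'Factor A effect' estimate.

```python
# Estimates from table 25.7: intercept, I1, I2, I12 (littlex = 0 at the mean plot size)
b0, b_a, b_b, b_ab = 70.00000, 2.04234, 3.68078, 0.81922
code = {1: 1, 2: -1}   # +1 for level 1, -1 for level 2, as in the fdummy data step

cell_means = {}
for a in (1, 2):
    for b in (1, 2):
        i1, i2 = code[a], code[b]
        cell_means[(a, b)] = b0 + b_a * i1 + b_b * i2 + b_ab * i1 * i2
        print(f"variety {a}, moisture {b}: {cell_means[(a, b)]:.3f}")

# Variety 1 minus variety 2, averaged over the two moisture levels
a_diff = (cell_means[(1, 1)] + cell_means[(1, 2)]
          - cell_means[(2, 1)] - cell_means[(2, 2)]) / 2
print("variety difference:", round(a_diff, 5))
```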
Estimating the factor effects with Bonferroni simultaneous 95 percent confidence intervals for the two comparisons using proc glm, pp. 1031-1032.
ods output Estimates=temp  OverallANOVA=anova;
proc glm data=flowers;
  class a b;
  model y = x a b a*b;
  estimate 'Factor A effect' a 1 -1;
  estimate 'Factor B effect' b 1 -1;
run;
quit;
data _null_;
  set anova;
  if source='Error' then call symput('dferr', DF);
run;
%put check macro variables in log: &dferr;
data temp;
  set temp;
  drop dependent tvalue probt;
  t = tinv( (1 - .05/(2*2) ), &dferr);
  lower = estimate - t*stderr;
  upper = estimate + t*stderr;
run;
proc print data=temp;
run;
The GLM Procedure

   Class Level Information

Class         Levels    Values
a                  2    1 2
b                  2    1 2


Number of observations    24
The GLM Procedure
Dependent Variable: y   yield
                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        4     4966.518817     1241.629704     197.45    <.0001
Error                       19      119.481183        6.288483
Corrected Total             23     5086.000000

R-Square     Coeff Var      Root MSE        y Mean
0.976508      3.582407      2.507685      70.00000

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
x                            1     4532.635779     4532.635779     720.78    <.0001
a                            1       93.406888       93.406888      14.85    0.0011
b                            1      324.433906      324.433906      51.59    <.0001
a*b                          1       16.042244       16.042244       2.55    0.1267

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
x                            1     3994.518817     3994.518817     635.21    <.0001
a                            1       96.601826       96.601826      15.36    0.0009
b                            1      323.849473      323.849473      51.50    <.0001
a*b                          1       16.042244       16.042244       2.55    0.1267

                                            Standard
Parameter                   Estimate           Error    t Value    Pr > |t|
Factor A effect           4.08467742      1.04216876       3.92      0.0009
Factor B effect           7.36155914      1.02582000       7.18      <.0001
Obs       Parameter           Estimate          StdErr       t        lower      upper

 1     Factor A effect      4.08467742      1.04216876    2.43344    1.54862    6.62073
 2     Factor B effect      7.36155914      1.02582000    2.43344    4.86529    9.85783
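With +1/-1 coding, alpha1 - alpha2 is twice the I1 coefficient in table 25.7, so the GLM estimates and standard errors are the REG values doubled.

The Python sketch below computes the Bonferroni limits. The multiplier t(1 - .05/(2*2); 19) = 2.43344 is copied from the output above.

```python
# I1 and I2 estimates and standard errors from table 25.7
coef = {"Factor A effect": (2.04234, 0.52108),
        "Factor B effect": (3.68078, 0.51291)}
t = 2.43344   # Bonferroni multiplier for g = 2 comparisons, 19 error df

results = {}
for name, (b, se_b) in coef.items():
    effect, se = 2 * b, 2 * se_b      # level 1 minus level 2 = b - (-b) = 2b
    results[name] = (effect, se, effect - t * se, effect + t * se)
    print(f"{name}: {effect:.5f} (SE {se:.5f}), "
          f"[{effect - t * se:.5f}, {effect + t * se:.5f}]")
```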
Using Y - X as the response variable in an ANOVA for the Cracker data set, p. 1033.
Note: The MSEs of the two models are very close: 3.50000 for the ANOVA on Y - X versus 3.50648 for the covariance model. In the covariance model (y = littlex I1 I2, equivalent to y = x treat), the slope of x is 0.89856, close to the slope of 1 that the analysis of Y - X implicitly assumes.
data difference;
  set cracker;
  diff = y - x;
run;
proc glm data=difference;
  class treat;
  model diff = treat;
run;
quit;
proc reg data=cdummy;
  model y = littlex I1 I2;
run;
quit;
The GLM Procedure

   Class Level Information

Class         Levels    Values
treat              3    1 2 3

Number of observations    15
The GLM Procedure
Dependent Variable: diff

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        2     440.4000000     220.2000000      62.91    <.0001
Error                       12      42.0000000       3.5000000
Corrected Total             14     482.4000000

R-Square     Coeff Var      Root MSE     diff Mean
0.912935      21.25942      1.870829      8.800000

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
treat                        2     440.4000000     220.2000000      62.91    <.0001

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
treat                        2     440.4000000     220.2000000      62.91    <.0001

The REG Procedure
Model: MODEL1
Dependent Variable: y

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3      607.82869      202.60956      57.78    <.0001
Error                    11       38.57131        3.50648
Corrected Total          14      646.40000

Root MSE              1.87256    R-Square     0.9403
Dependent Mean       33.80000    Adj R-Sq     0.9241
Coeff Var             5.54012

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1       33.80000        0.48349      69.91      <.0001
littlex       1        0.89856        0.10258       8.76      <.0001
I1            1        6.01741        0.70826       8.50      <.0001
I2            1        0.94202        0.69868       1.35      0.2047
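An ANOVA on Y - X is equivalent to a covariance model in which the slope is fixed at 1 rather than estimated. The short Python sketch below runs the one-way ANOVA on the differences and should reproduce the GLM sums of squares and F value.

```python
y = [38, 39, 36, 45, 33, 43, 38, 38, 27, 34, 24, 32, 31, 21, 28]
x = [21, 26, 22, 28, 19, 34, 26, 29, 18, 25, 23, 29, 30, 16, 29]
treat = [1] * 5 + [2] * 5 + [3] * 5

diff = [yi - xi for yi, xi in zip(y, x)]            # response Y - X
groups = sorted(set(treat))
n, r = len(diff), len(groups)

grand = sum(diff) / n
means = {g: sum(d for d, t in zip(diff, treat) if t == g) / treat.count(g)
         for g in groups}

sstr = sum(treat.count(g) * (means[g] - grand) ** 2 for g in groups)   # treatment SS
sse = sum((d - means[t]) ** 2 for d, t in zip(diff, treat))            # error SS
mse = sse / (n - r)
f = (sstr / (r - 1)) / mse
print("SSTR:", round(sstr, 4), " SSE:", round(sse, 4),
      " MSE:", round(mse, 4), " F:", round(f, 2))
```

Comparing mse here (3.5) with the covariance-model MSE of 3.50648 shows how little is lost by fixing the slope at 1 for these data.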

These pages are Copyrighted (c) by UCLA Academic Technology Services