SAS Textbook Examples
Applied Linear Statistical Models by Neter, Kutner, et al.
Chapter 29: Repeated Measures and Related Designs

NOTE: This page has been delinked.  It is no longer being maintained, and information on this page may be out of date.

Inputting the Wine Judging Data, table 29.2, p. 1169.
data wine;
  input rating judge wine;
cards;
  20  1  1
  24  1  2
  28  1  3
  28  1  4
  15  2  1
  18  2  2
  23  2  3
  24  2  4
  18  3  1
  19  3  2
  24  3  3
  23  3  4
  26  4  1
  26  4  2
  30  4  3
  30  4  4
  22  5  1
  24  5  2
  28  5  3
  26  5  4
  19  6  1
  21  6  2
  27  6  3
  25  6  4
;
run;
ANOVA table of the wine data, table 29.3, p. 1171, including a test of the main effect of wine, p. 1170.
The means statement provides the factor level means; the grand mean (rating Mean) is part of the standard output of proc glm, table 29.2, p. 1169.
proc glm data=wine;
  class wine judge;
  model rating = wine judge;
  means judge wine;
run;
quit;
The GLM Procedure

      Class Level Information

Class         Levels    Values
wine               4    1 2 3 4
judge              6    1 2 3 4 5 6

Number of observations    24
The GLM Procedure
Dependent Variable: rating
                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        8     357.3333333      44.6666667      41.87    <.0001
Error                       15      16.0000000       1.0666667
Corrected Total             23     373.3333333

R-Square     Coeff Var      Root MSE    rating Mean
0.957143      4.363925      1.032796       23.66667

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
wine                         3     184.0000000      61.3333333      57.50    <.0001
judge                        5     173.3333333      34.6666667      32.50    <.0001

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
wine                         3     184.0000000      61.3333333      57.50    <.0001
judge                        5     173.3333333      34.6666667      32.50    <.0001

The GLM Procedure

Level of           ------------rating-----------
judge        N             Mean          Std Dev
1            4       25.0000000       3.82970843
2            4       20.0000000       4.24264069
3            4       21.0000000       2.94392029
4            4       28.0000000       2.30940108
5            4       25.0000000       2.58198890
6            4       23.0000000       3.65148372
Level of           ------------rating-----------
wine         N             Mean          Std Dev
1            6       20.0000000       3.74165739
2            6       22.0000000       3.16227766
3            6       26.6666667       2.65832027
4            6       26.0000000       2.60768096
Diagnostic residual plots for the wine data set, fig. 29.3, p. 1173.
Note: In the normal probability plot, proc capability shows a dot for each observation rather than printing the number of observations as the book does.
proc glm data=wine noprint;
  class wine judge;
  model rating = wine judge;
  output out=resid r=resid;
run;
quit;
symbol1 c=blue v=dot h=.8;
proc capability data=resid noprint;
  qqplot resid;
run;
data resid;
  set resid;
  if judge=1 then resid1=resid;
  if judge=2 then resid2=resid;
  if judge=3 then resid3=resid;
  if judge=4 then resid4=resid;
  if judge=5 then resid5=resid;
  if judge=6 then resid6=resid;
run;
axis1 order=(-2 to 2 by 1);
axis2 order=(3 2 1 4);
axis3 order=(1 3 2 4);
axis4 order=(3 2 4 1);
axis5 order=(2 3 1 4);
proc gplot data=resid;
  plot resid1*wine / vref=0 vaxis=axis1 haxis=axis2; 
  plot resid2*wine / vref=0 vaxis=axis1 haxis=axis3;
  plot resid3*wine / vref=0 vaxis=axis1 haxis=axis4;
  plot resid4*wine / vref=0 vaxis=axis1 haxis=axis5;
  plot resid5*wine / vref=0 vaxis=axis1 haxis=axis4;
  plot resid6*wine / vref=0 vaxis=axis1 haxis=axis3;
run;
quit;
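proc capability belongs to SAS/QC, which is not licensed at every site. As a hedged alternative, the qqplot statement of proc univariate in Base SAS should draw a comparable normal Q-Q plot. The sketch below assumes the resid data set (variable resid) created by the proc glm step above; the normal(mu=est sigma=est) reference line is an optional addition, not something taken from the book.

```sas
/* Sketch: normal Q-Q plot of the residuals without SAS/QC.            */
/* Assumes data set resid, variable resid, from the proc glm step.     */
proc univariate data=resid noprint;
  /* reference line based on the estimated mean and std dev (optional) */
  qqplot resid / normal(mu=est sigma=est);
run;
```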
The lsmeans statement with the pdiff option provides all pair-wise comparisons of the mean ratings of the wines; adjust=tukey requests the Tukey adjustment and cl adds the confidence limits, p. 1174.
proc glm data=wine ;
  class wine judge;
  model rating = wine judge ;
  lsmeans wine / pdiff adjust=tukey cl;
run;
quit;
<output omitted>

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey

              rating      LSMEAN
wine          LSMEAN      Number
1         20.0000000           1
2         22.0000000           2
3         26.6666667           3
4         26.0000000           4

            Least Squares Means for effect wine
            Pr > |t| for H0: LSMean(i)=LSMean(j)

                 Dependent Variable: rating

i/j              1             2             3             4
   1                      0.0202        <.0001        <.0001
   2        0.0202                      <.0001        <.0001
   3        <.0001        <.0001                      0.6844
   4        <.0001        <.0001        0.6844
   
              rating
wine          LSMEAN      95% Confidence Limits
1          20.000000       19.101302    20.898698
2          22.000000       21.101302    22.898698
3          26.666667       25.767969    27.565365
4          26.000000       25.101302    26.898698

        Least Squares Means for Effect wine

            Difference         Simultaneous 95%
               Between      Confidence Limits for
i    j           Means       LSMean(i)-LSMean(j)
1    2       -2.000000       -3.718582    -0.281418
1    3       -6.666667       -8.385248    -4.948085
1    4       -6.000000       -7.718582    -4.281418
2    3       -4.666667       -6.385248    -2.948085
2    4       -4.000000       -5.718582    -2.281418
3    4        0.666667       -1.051915     2.385248
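Every simultaneous interval above has the same half-width because the design is balanced. The data step below is a minimal sketch of how that half-width can be reproduced by hand from the error mean square, using the probmc function for the studentized range quantile. The data set name tukey_check and the variable names are made up for illustration; the numbers are copied from the proc glm output.

```sas
/* Sketch: reproduce the Tukey half-width for the wine comparisons.  */
/* MSE = 1.0666667 on 15 df, 4 wine means, 6 judges per wine mean.   */
data tukey_check;
  mse = 1.0666667;                        /* error mean square          */
  df  = 15;                               /* error degrees of freedom   */
  r   = 4;                                /* number of wine means       */
  n   = 6;                                /* observations per mean      */
  q   = probmc('RANGE', ., 0.95, df, r);  /* studentized range quantile */
  /* same as (q/sqrt(2)) * sqrt(2*mse/n) */
  halfwidth = q * sqrt(mse / n);
run;

proc print data=tukey_check noobs;
run;
```

The printed halfwidth should be close to 1.7186, the distance from each difference to either of its limits (for example, -2 - (-3.718582) = 1.718582).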
Inputting the Coffee Sweeteners data, table 29.5, p. 1175.
data sweet; 
  input rank subject sweet;
cards;
  5  1  1
  1  1  2
  2  1  3
  4  1  4
  3  1  5
  4  2  1
  2  2  2
  1  2  3
  5  2  4
  3  2  5
  3  3  1
  2  3  2
  1  3  3
  4  3  4
  5  3  5
  5  4  1
  2  4  2
  3  4  3
  4  4  4
  1  4  5
  4  5  1
  1  5  2
  2  5  3
  3  5  4
  5  5  5
  4  6  1
  1  6  2
  3  6  3
  5  6  4
  2  6  5
;
run;
Calculating the mean score for each sweetener, table 29.5, p. 1175.
proc means data=sweet mean;
  class sweet;
  var rank;
run;
The MEANS Procedure

     Analysis Variable : rank

                  N
       sweet    Obs            Mean
-----------------------------------
           1      6       4.1666667
           2      6       1.5000000
           3      6       2.0000000
           4      6       4.1666667
           5      6       3.1666667
-----------------------------------
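Because every subject ranks all five sweeteners from 1 to 5, each subject's ranks sum to 15 and average 3. That is why the subject sum of squares in the proc glm output for these data is zero. A quick way to confirm it, sketched here with the same sweet data set:

```sas
/* Sketch: within-subject ranks should sum to 15 and average 3 */
/* for every subject.                                          */
proc means data=sweet sum mean;
  class subject;
  var rank;
run;
```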
Nonparametric F-test, p. 1175. The lsmeans statement with the pdiff option provides all pair-wise comparisons of the sweetener means. The cl option is needed to display the differences between the means and their confidence limits, p. 1176.
proc glm data=sweet;
  class sweet subject;
  model rank = sweet subject;
  lsmeans sweet / pdiff adjust=bon alpha=.2 cl;
run;
quit;
The GLM Procedure

      Class Level Information

Class         Levels    Values
sweet              5    1 2 3 4 5
subject            6    1 2 3 4 5 6

Number of observations    30

The GLM Procedure
Dependent Variable: rank
                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        9     36.00000000      4.00000000       3.33    0.0119
Error                       20     24.00000000      1.20000000
Corrected Total             29     60.00000000

R-Square     Coeff Var      Root MSE     rank Mean
0.600000      36.51484      1.095445      3.000000

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
sweet                        4     36.00000000      9.00000000       7.50    0.0007
subject                      5      0.00000000      0.00000000       0.00    1.0000
Source                      DF     Type III SS     Mean Square    F Value    Pr > F
sweet                        4     36.00000000      9.00000000       7.50    0.0007
subject                      5      0.00000000      0.00000000       0.00    1.0000

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Bonferroni

                           LSMEAN
sweet     rank LSMEAN      Number
1          4.16666667           1
2          1.50000000           2
3          2.00000000           3
4          4.16666667           4
5          3.16666667           5

                   Least Squares Means for effect sweet
                   Pr > |t| for H0: LSMean(i)=LSMean(j)

                         Dependent Variable: rank

i/j              1             2             3             4             5
   1                      0.0042        0.0268        1.0000        1.0000
   2        0.0042                      1.0000        0.0042        0.1587
   3        0.0268        1.0000                      0.0268        0.7995
   4        1.0000        0.0042        0.0268                      1.0000
   5        1.0000        0.1587        0.7995        1.0000

sweet     rank LSMEAN      80% Confidence Limits
1            4.166667        3.573956     4.759377
2            1.500000        0.907290     2.092710
3            2.000000        1.407290     2.592710
4            4.166667        3.573956     4.759377
5            3.166667        2.573956     3.759377

        Least Squares Means for Effect sweet

            Difference         Simultaneous 80%
               Between      Confidence Limits for
i    j           Means       LSMean(i)-LSMean(j)
1    2        2.666667        1.067834     4.265500
1    3        2.166667        0.567834     3.765500
1    4               0       -1.598833     1.598833
1    5        1.000000       -0.598833     2.598833
2    3       -0.500000       -2.098833     1.098833
2    4       -2.666667       -4.265500    -1.067834
2    5       -1.666667       -3.265500    -0.067834
3    4       -2.166667       -3.765500    -0.567834
3    5       -1.166667       -2.765500     0.432166
4    5        1.000000       -0.598833     2.598833
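The common half-width of the simultaneous 80% intervals can be checked by hand. With five sweeteners there are 10 pair-wise comparisons, so the Bonferroni multiple is the t quantile at 1 - .2/(2*10) with 20 error degrees of freedom. The following data step is a sketch under those assumptions; bon_check and its variable names are illustrative only.

```sas
/* Sketch: reproduce the Bonferroni half-width for the sweetener comparisons. */
data bon_check;
  mse   = 1.2;     /* error mean square from the ANOVA table       */
  df    = 20;      /* error degrees of freedom                     */
  n     = 6;       /* subjects per sweetener mean                  */
  g     = 10;      /* pair-wise comparisons among 5 sweeteners     */
  alpha = 0.2;     /* family level set by alpha=.2                 */
  b = tinv(1 - alpha / (2 * g), df);   /* Bonferroni multiple      */
  halfwidth = b * sqrt(2 * mse / n);   /* multiple times std error */
run;

proc print data=bon_check noobs;
run;
```

The result should agree with the output above, where each limit lies about 1.5988 from its difference.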
Inputting the Blood Flow data, table 29.7, p. 1181.
data flow;
  input score subject a b;
cards;
   2   1  1  1
  -1   2  1  1
   0   3  1  1
   3   4  1  1
   1   5  1  1
   2   6  1  1
  -2   7  1  1
   4   8  1  1
  -2   9  1  1
  -2  10  1  1
   2  11  1  1
  -1  12  1  1
  10   1  1  2
   8   2  1  2
  11   3  1  2
  15   4  1  2
   5   5  1  2
  12   6  1  2
  10   7  1  2
  16   8  1  2
   7   9  1  2
  10  10  1  2
   8  11  1  2
   8  12  1  2
   9   1  2  1
   6   2  2  1
   8   3  2  1
  11   4  2  1
   6   5  2  1
   9   6  2  1
   8   7  2  1
  12   8  2  1
   7   9  2  1
  10  10  2  1
  10  11  2  1
   6  12  2  1
  25   1  2  2
  21   2  2  2
  24   3  2  2
  31   4  2  2
  20   5  2  2
  27   6  2  2
  22   7  2  2
  30   8  2  2
  24   9  2  2
  28  10  2  2
  25  11  2  2
  23  12  2  2
;
run;
ANOVA table for blood flow data, fig. 29.5, p. 1182.
The lsmeans statement with the pdiff and adjust=bon options provides all the pair-wise differences using the Bonferroni adjustment, p. 1184.
Note: SAS computes each difference in the opposite direction from the book, so the differences and their confidence limits are the additive inverses of those in the book. Furthermore, SAS by default outputs all the pair-wise differences, not just those shown in the book.
proc glm data=flow;
  class a b subject;
  model score = subject a b a*b / ss3;
  lsmeans a*b /pdiff adjust=bon cl;
run;
quit;
The GLM Procedure

              Class Level Information

Class         Levels    Values
a                  2    1 2
b                  2    1 2
subject           12    1 2 3 4 5 6 7 8 9 10 11 12

Number of observations    48
The GLM Procedure
Dependent Variable: score
                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                       14     4020.500000      287.178571     122.28    <.0001
Error                       33       77.500000        2.348485
Corrected Total             47     4098.000000

R-Square     Coeff Var      Root MSE    score Mean
0.981088      13.93161      1.532477      11.00000

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
subject                     11      258.500000       23.500000      10.01    <.0001
a                            1     1587.000000     1587.000000     675.75    <.0001
b                            1     2028.000000     2028.000000     863.54    <.0001
a*b                          1      147.000000      147.000000      62.59    <.0001

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Bonferroni

                            LSMEAN
a    b    score LSMEAN      Number
1    1       0.5000000           1
1    2      10.0000000           2
2    1       8.5000000           3
2    2      25.0000000           4

             Least Squares Means for effect a*b
            Pr > |t| for H0: LSMean(i)=LSMean(j)

                 Dependent Variable: score

i/j              1             2             3             4
   1                      <.0001        <.0001        <.0001
   2        <.0001                      0.1339        <.0001
   3        <.0001        0.1339                      <.0001
   4        <.0001        <.0001        <.0001
   
a    b    score LSMEAN      95% Confidence Limits
1    1        0.500000       -0.400045     1.400045
1    2       10.000000        9.099955    10.900045
2    1        8.500000        7.599955     9.400045
2    2       25.000000       24.099955    25.900045

         Least Squares Means for Effect a*b

            Difference         Simultaneous 95%
               Between      Confidence Limits for
i    j           Means       LSMean(i)-LSMean(j)
1    2       -9.500000      -11.255997    -7.744003
1    3       -8.000000       -9.755997    -6.244003
1    4      -24.500000      -26.255997   -22.744003
2    3        1.500000       -0.255997     3.255997
2    4      -15.000000      -16.755997   -13.244003
3    4      -16.500000      -18.255997   -14.744003
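To put the SAS results in the book's orientation, each difference is negated; negating an interval also swaps its end points. A small sketch, using two rows copied from the table above purely for illustration (book_orient and the book_ variable names are made up here):

```sas
/* Sketch: convert LSMean(i)-LSMean(j) to LSMean(j)-LSMean(i). */
data book_orient;
  input i j diff lower upper;
  book_diff  = -diff;
  book_lower = -upper;   /* old upper limit becomes new lower limit */
  book_upper = -lower;   /* old lower limit becomes new upper limit */
cards;
1 2  -9.500000 -11.255997  -7.744003
3 4 -16.500000 -18.255997 -14.744003
;
run;

proc print data=book_orient noobs;
run;
```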
Fig. 29.6, p. 1183.
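data flow;
  set flow;
  /* c indexes the four a*b treatment combinations */
  if a=1 and b=1 then c=1;
  if a=1 and b=2 then c=2;
  if a=2 and b=1 then c=3;
  if a=2 and b=2 then c=4;
run;
/* attach the treatment mean to every observation */
proc sql;
  create table temp as 
  select *, mean(score) as mean
  from flow
  group by c;
quit;
/* separate columns for each level of b so they can be overlaid */
data plot;
  set temp;
  if b=1 then do;
    b1=score;
    mean1=mean;
  end;
  if b=2 then do;
    b2=score;
    mean2=mean;
  end;
run;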
goptions reset=all;
 
symbol1 c=red v=circle;
symbol2 c=blue v=dot;
symbol3 c=red i=join v=circle;
symbol4 c=blue i=join v=dot;
axis1 label=(a=90 'Blood Flow') order=(-5 to 30 by 5);
axis2 value=('A1' 'A2') order=(1 2) offset=(3, 3) label=('');
legend1 label=none value=(height=.8 font=swiss 'B1' 'B2' 'Mean' 'Mean' ) 
        position=(bottom right inside) mode=share cborder=black across=2;
proc gplot data=plot;
 plot (b1 b2 mean1 mean2)*a/ overlay vaxis=axis1 haxis=axis2 legend=legend1;
run;
quit;
Inputting the Athletic Shoes Sales data, table 29.10, p. 1190.
data shoes;
  input sales subject a b;
  label subject = 'Test Market'
        a       = 'Campaign'
        b       = 'Time';
cards;
    958  1  1  1
   1005  2  1  1
    351  3  1  1
    549  4  1  1
    730  5  1  1
   1047  1  1  2
   1122  2  1  2
    436  3  1  2
    632  4  1  2
    784  5  1  2
    933  1  1  3
    986  2  1  3
    339  3  1  3
    512  4  1  3
    707  5  1  3
    780  1  2  1
    229  2  2  1
    883  3  2  1
    624  4  2  1
    375  5  2  1
    897  1  2  2
    275  2  2  2
    964  3  2  2
    695  4  2  2
    436  5  2  2
    718  1  2  3
    202  2  2  3
    817  3  2  3
    599  4  2  3
    351  5  2  3
;
run;
Fig. 29.8, p. 1191.
data plot;
  set shoes;
  if subject=1 then s1=sales;
  if subject=2 then s2=sales;
  if subject=3 then s3=sales;
  if subject=4 then s4=sales;
  if subject=5 then s5=sales;
run;
symbol1 c=blue v=dot i=join;
symbol2 c=blue v=dot i=join;
symbol3 c=blue v=dot i=join;
symbol4 c=blue v=dot i=join;
symbol5 c=blue v=dot i=join;
axis1 label=(a=90 'Sales') offset=(1, 2) order=(300 to 1200 by 300);
proc gplot data=plot;
  by a;
  plot (s1 s2 s3 s4 s5)*b / overlay vaxis=axis1;
run;
quit;
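Where proc sgplot is available, a similar profile plot can probably be drawn directly from the shoes data set, without first creating s1 through s5. This is only a sketch of an alternative, not the approach used for the book's figure; it relies on shoes already being in order of a, as entered above.

```sas
/* Sketch: one sales profile per test market, one panel per campaign. */
proc sgplot data=shoes;
  by a;                                        /* shoes is already sorted by a */
  series x=b y=sales / group=subject markers;  /* one line per test market     */
  xaxis values=(1 2 3);                        /* the three time periods       */
  yaxis label='Sales';
run;
```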
Fig. 29.9, p. 1192, which includes the test of the interaction, p. 1191, and the test of the main effect of time periods (factor b). The test statement supplies the test of the main effect of campaign (factor a); here we have to specify that the error term is subject nested within campaign, subject(a), so that its mean square is the denominator of the F statistic. The first lsmeans statement provides the mean sales for each level of factor a, table 29.9. The second lsmeans statement, with the pdiff, cl and adjust=tukey options, provides not only the mean sales for each level of b but also all the pair-wise differences and their confidence intervals using the Tukey procedure with alpha=.01, p. 1193.
proc glm data=shoes;
  class a b subject;
  model sales = a subject(a) b a*b;
  lsmeans a;
  lsmeans b / pdiff cl adjust=tukey alpha=.01;
  test h=a e=subject(a);
run;
quit;
The GLM Procedure

     Class Level Information

Class         Levels    Values
a                  2    1 2
b                  3    1 2 3
subject            5    1 2 3 4 5

Number of observations    30
The GLM Procedure
Dependent Variable: sales

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                       13     2069296.000      159176.615     444.67    <.0001
Error                       16        5727.467         357.967
Corrected Total             29     2075023.467

R-Square     Coeff Var      Root MSE    sales Mean
0.997240      2.847112      18.92001      664.5333

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
a                            1      168150.533      168150.533     469.74    <.0001
subject(a)                   8     1833680.933      229210.117     640.31    <.0001
b                            2       67073.067       33536.533      93.69    <.0001
a*b                          2         391.467         195.733       0.55    0.5892

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
a                            1      168150.533      168150.533     469.74    <.0001
subject(a)                   8     1833680.933      229210.117     640.31    <.0001
b                            2       67073.067       33536.533      93.69    <.0001
a*b                          2         391.467         195.733       0.55    0.5892

The GLM Procedure
Least Squares Means

a    sales LSMEAN
1      739.400000
2      589.666667

The GLM Procedure
Least Squares Means
Adjustment for Multiple Comparisons: Tukey

                       LSMEAN
b    sales LSMEAN      Number
1      648.400000           1
2      728.800000           2
3      616.400000           3

       Least Squares Means for effect b
     Pr > |t| for H0: LSMean(i)=LSMean(j)

          Dependent Variable: sales

i/j              1             2             3
   1                      <.0001        0.0044
   2        <.0001                      <.0001
   3        0.0044        <.0001
   
b    sales LSMEAN      99% Confidence Limits
1      648.400000      630.924871   665.875129
2      728.800000      711.324871   746.275129
3      616.400000      598.924871   633.875129

          Least Squares Means for Effect b

            Difference         Simultaneous 99%
               Between      Confidence Limits for
i    j           Means       LSMean(i)-LSMean(j)
1    2      -80.400000     -109.031863   -51.768137
1    3       32.000000        3.368137    60.631863
2    3      112.400000       83.768137   141.031863

The GLM Procedure

Dependent Variable: sales

     Tests of Hypotheses Using the Type III MS for subject(a) as an Error Term

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
a                            1     168150.5333     168150.5333       0.73    0.4166
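
The F value for a in the Type I and Type III tables (469.74) divides by the residual mean square, whereas the test statement divides by the mean square for subject(a). The arithmetic behind the last table can be sketched in a data step; a_test and its variable names are illustrative, and the mean squares are copied from the output above.

```sas
/* Sketch: F test for campaign (a) using subject(a) as the error term. */
data a_test;
  ms_a  = 168150.5333;      /* mean square for a, 1 df          */
  ms_sa = 229210.117;       /* mean square for subject(a), 8 df */
  f = ms_a / ms_sa;         /* about 0.73                       */
  p = 1 - probf(f, 1, 8);   /* upper-tail F probability         */
run;

proc print data=a_test noobs;
run;
```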
