UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

SAS Textbook Examples
An Introduction to Categorical Analysis by Alan Agresti
Chapter 4 - Generalized Linear Models

Inputting Snoring and Heart Disease data, table 4.1, p. 75.
data glm;
input snoring disease total;
cards;
0 24 1379 
2 35 638 
4 21 213 
5 30 254
; 
run;
Three models are used to estimate the probability of heart disease using an identity, logit and probit link function. The results of the model using the identity link function is at the bottom of p. 75; the results of the model using a logit link function is on p. 78; the results of the model using a probit link function is on p. 79.
data glm;
  set glm;
  id = _n_;
run;
proc genmod data=glm; 
 model disease/total = snoring / dist=bin link=identity ; 
 output out=temp1 p=pid;
run;
proc genmod data=glm; 
 model disease/total = snoring / dist=bin link=logit ; 
 output out=temp2 p=plogit;
run;
proc genmod data=glm; 
 model disease/total = snoring / dist=bin link=probit;
  output out=temp3 p=pprobit;
run;
The GENMOD Procedure

            Model Information
Data Set                          WORK.GLM
Distribution                      Binomial
Link Function                     Identity
Response Variable (Events)         disease
Response Variable (Trials)           total
Observations Used                        4
Number Of Events                       110
Number Of Trials                      2484
           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                   2          0.0692          0.0346
Scaled Deviance            2          0.0692          0.0346
Pearson Chi-Square         2          0.0688          0.0344
Scaled Pearson X2          2          0.0688          0.0344
Log Likelihood                     -417.4960

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1      0.0172      0.0034      0.0105      0.0240      25.18        <.0001
snoring       1      0.0198      0.0028      0.0143      0.0253      49.97        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000
NOTE: The scale parameter was held fixed.

The GENMOD Procedure

            Model Information
Data Set                          WORK.GLM
Distribution                      Binomial
Link Function                        Logit
Response Variable (Events)         disease
Response Variable (Trials)           total
Observations Used                        4
Number Of Events                       110
Number Of Trials                      2484
           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                   2          2.8089          1.4045
Scaled Deviance            2          2.8089          1.4045
Pearson Chi-Square         2          2.8743          1.4372
Scaled Pearson X2          2          2.8743          1.4372
Log Likelihood                     -418.8658

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1     -3.8662      0.1662     -4.1920     -3.5405     541.06        <.0001
snoring       1      0.3973      0.0500      0.2993      0.4954      63.12        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000
NOTE: The scale parameter was held fixed.

The GENMOD Procedure

            Model Information
Data Set                          WORK.GLM
Distribution                      Binomial
Link Function                       Probit
Response Variable (Events)         disease
Response Variable (Trials)           total
Observations Used                        4
Number Of Events                       110
Number Of Trials                      2484
           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                   2          1.8716          0.9358
Scaled Deviance            2          1.8716          0.9358
Pearson Chi-Square         2          1.9101          0.9550
Scaled Pearson X2          2          1.9101          0.9550
Log Likelihood                     -418.3971

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1     -2.0606      0.0704     -2.1986     -1.9225     855.49        <.0001
snoring       1      0.1878      0.0236      0.1415      0.2341      63.14        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000
NOTE: The scale parameter was held fixed.
Table 4.1, p. 75.
data combo;
  merge temp1 temp2 temp3;
  by id;
  prop = disease/total;
  no = total - disease;
run;
proc print data = combo;
 var snoring disease no prop pid plogit pprobit;
run;
Obs    snoring    disease     no       prop       pid       plogit    pprobit

 1        0          24      1355    0.01740    0.01725    0.02051    0.01967
 2        2          35       603    0.05486    0.05680    0.04430    0.04599
 3        4          21       192    0.09859    0.09636    0.09305    0.09519
 4        5          30       224    0.11811    0.11614    0.13244    0.13100
Figure 4.1, p. 76.
symbol1 v=dot c=blue h=.4 i=join;
symbol2 v=dot c=red h=.4 i=spline;
symbol3 v=dot c=green h=.4 i=spline;
axis1 label=(angle=90 'Predicted Probabilities') order=(0 to .2 by .05);
legend label=none value=(h=1 font=swiss 'Linear' 'Logit' 'Probit')
       position=(bottom right inside) mode=share cborder=black;
proc gplot data = combo;
  plot (pid plogit pprobit)*snoring/overlay vaxis=axis1 legend=legend1;
run;
quit;
Inputting the Horseshoe Crab data, p. 82-83.
data crab;
 input color spine width satell weight;
 if satell>0 then y=1; 
 if satell=0 then y=0; 
 n=1; 
 weight = weight/1000; 
 color = color - 1;
 if color=4 then dark=0; 
 if color < 4 then dark=1;
cards;
3  3  28.3  8  3050
4  3  22.5  0  1550
2  1  26.0  9  2300
4  3  24.8  0  2100
4  3  26.0  4  2600
3  3  23.8  0  2100
2  1  26.5  0  2350
4  2  24.7  0  1900
3  1  23.7  0  1950
4  3  25.6  0  2150
4  3  24.3  0  2150
3  3  25.8  0  2650
3  3  28.2  11 3050
5  2  21.0  0  1850
3  1  26.0  14 2300
2  1  27.1  8  2950
3  3  25.2  1  2000
3  3  29.0  1  3000
5  3  24.7  0  2200
3  3  27.4  5  2700
3  2  23.2  4  1950
2  2  25.0  3  2300
3  1  22.5  1  1600
4  3  26.7  2  2600
5  3  25.8  3  2000
5  3  26.2  0  1300
3  3  28.7  3  3150
3  1  26.8  5  2700
5  3  27.5  0  2600
3  3  24.9  0  2100
2  1  29.3  4  3200
2  3  25.8  0  2600
3  2  25.7  0  2000
3  1  25.7  8  2000
3  1  26.7  5  2700
5  3  23.7  0  1850
3  3  26.8  0  2650
3  3  27.5  6  3150
5  3  23.4  0  1900
3  3  27.9  6  2800
4  3  27.5  3  3100
2  1  26.1  5  2800
2  1  27.7  6  2500
3  1  30.0  5  3300
4  1  28.5  9  3250
4  3  28.9  4  2800
3  3  28.2  6  2600
3  3  25.0  4  2100
3  3  28.5  3  3000
3  1  30.3  3  3600
5  3  24.7  5  2100
3  3  27.7  5  2900
2  1  27.4  6  2700
3  3  22.9  4  1600
3  1  25.7  5  2000
3  3  28.3  15 3000
3  3  27.2  3  2700
4  3  26.2  3  2300
3  1  27.8  0  2750
5  3  25.5  0  2250
4  3  27.1  0  2550
4  3  24.5  5  2050
4  1  27.0  3  2450
3  3  26.0  5  2150
3  3  28.0  1  2800
3  3  30.0  8  3050
3  3  29.0  10 3200
3  3  26.2  0  2400
3  1  26.5  0  1300
3  3  26.2  3  2400
4  3  25.6  7  2800
4  3  23.0  1  1650
4  3  23.0  0  1800
3  3  25.4  6  2250
4  3  24.2  0  1900
3  2  22.9  0  1600
4  2  26.0  3  2200
3  3  25.4  4  2250
4  3  25.7  0  1200
3  3  25.1  5  2100
4  2  24.5  0  2250
5  3  27.5  0  2900
4  3  23.1  0  1650
4  1  25.9  4  2550
3  3  25.8  0  2300
5  3  27.0  3  2250
3  3  28.5  0  3050
5  1  25.5  0  2750
5  3  23.5  0  1900
3  2  24.0  0  1700
3  1  29.7  5  3850
3  1  26.8  0  2550
5  3  26.7  0  2450
3  1  28.7  0  3200
4  3  23.1  0  1550
3  1  29.0  1  2800
4  3  25.5  0  2250
4  3  26.5  1  1967
4  3  24.5  1  2200
4  3  28.5  1  3000
3  3  28.2  1  2867
3  3  24.5  1  1600
3  3  27.5  1  2550
3  2  24.7  4  2550
3  1  25.2  1  2000
4  3  27.3  1  2900
3  3  26.3  1  2400
3  3  29.0  1  3100
3  3  25.3  2  1900
3  3  26.5  4  2300
3  3  27.8  3  3250
3  3  27.0  6  2500
4  3  25.7  0  2100
3  3  25.0  2  2100
3  3  31.9  2  3325
5  3  23.7  0  1800
5  3  29.3  12 3225
4  3  22.0  0  1400
3  3  25.0  5  2400
4  3  27.0  6  2500
4  3  23.8  6  1800
2  1  30.2  2  3275
4  3  26.2  0  2225
3  3  24.2  2  1650
3  3  27.4  3  2900
3  2  25.4  0  2300
4  3  28.4  3  3200
5  3  22.5  4  1475
3  3  26.2  2  2025
3  1  24.9  6  2300
2  2  24.5  6  1950
3  3  25.1  0  1800
3  1  28.0  4  2900
5  3  25.8  10 2250
3  3  27.9  7  3050
3  3  24.9  0  2200
3  1  28.4  5  3100
4  3  27.2  5  2400
3  2  25.0  6  2250
3  3  27.5  6  2625
3  1  33.5  7  5200
3  3  30.5  3  3325
4  3  29.0  3  2925
3  1  24.3  0  2000
3  3  25.8  0  2400
5  3  25.0  8  2100
3  1  31.7  4  3725
3  3  29.5  4  3025
4  3  24.0  10 1900
3  3  30.0  9  3000
3  3  27.6  4  2850
3  3  26.2  0  2300
3  1  23.1  0  2000
3  1  22.9  0  1600
5  3  24.5  0  1900
3  3  24.7  4  1950
3  3  28.3  0  3200
3  3  23.9  2  1850
4  3  23.8  0  1800
4  2  29.8  4  3500
3  3  26.5  4  2350
3  3  26.0  3  2275
3  3  28.2  8  3050
5  3  25.7  0  2150
3  3  26.5  7  2750
3  3  25.8  0  2200
4  3  24.1  0  1800
4  3  26.2  2  2175
4  3  26.1  3  2750
4  3  29.0  4  3275
2  1  28.0  0  2625
5  3  27.0  0  2625
3  2  24.5  0  2000
;
run;
Fig. 4.4, p. 85.
data crab1;
  set crab;
  wcat=0;
  if width<=23.25 then wcat=1;
  if 23.25< width<=24.25 then wcat=2;
  if 24.25< width<=25.25 then wcat=3;
  if 25.25< width<=26.25 then wcat=4;
  if 26.25< width<=27.25 then wcat=5;
  if 27.25< width<=28.25 then wcat=6;
  if 28.25< width<=29.25 then wcat=7;
  if 29.25< width then wcat=8;
run;  
proc sql; 
  create table midpt as
  select *, mean(satell) as smean, mean(width) as wmean, std(satell)*std(satell) as varsatell,
            count(wcat) as number
  from crab1
  group by wcat;
quit;
proc sort data=midpt;
  by wcat;
run;
data midpt;
  set midpt;
  by wcat;
  if first.wcat;
run; 
goptions reset=all; 
symbol v=dot c=blue h=.7;
axis1 order=(0 to 6 by 1);
proc gplot data=midpt;
  plot smean*wmean/ vaxis=axis1;
run;
quit;
goptions reset=all;
The Poisson Regression for the Horseshoe Crab data, p. 81. Obtaining the fitted value at the mean width of x=26.3 by outputting the predicted values from the genmod procedure, p. 84.
proc genmod data=crab; 
  model satell = width / dist=poi link=log;
  output out=temp1 p=pred1;
run;
proc print data = temp1;
  where width = 26.3;
  var width pred1;
run;
The GENMOD Procedure

       Model Information
Data Set              WORK.CRAB
Distribution            Poisson
Link Function               Log
Dependent Variable       satell
Observations Used           173

           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                 171        567.8786          3.3209
Scaled Deviance          171        567.8786          3.3209
Pearson Chi-Square       171        544.1570          3.1822
Scaled Pearson X2        171        544.1570          3.1822
Log Likelihood                       68.4463

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1     -3.3048      0.5422     -4.3675     -2.2420      37.14        <.0001
width         1      0.1640      0.0200      0.1249      0.2032      67.51        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000
NOTE: The scale parameter was held fixed.
Obs    width     pred1

107     26.3    2.74458
Fig. 4.3, p. 84.
goptions reset=all;
symbol v=dot c=blue h=.8;
axis1 label=(angle=90 'Number of Satellites') order=(0 to 15 by 1);
axis2 order=(20 to 34 by 2);  
proc gplot data=crab;
 plot satell*width/vaxis=axis1 haxis=axis2;
run;
quit;
Fig. 4.4, p. 85 was not reproduced.  *Or should it be produced as it was above???

The Poisson regression using the identity link function, p. 85.
proc genmod data=crab; 
  model satell = width / dist=poi link=identity;  
  output out=temp2 p=pred2;
run;
proc print data = temp2;
  where width = 26.3 or width=27.3;
  var width satell pred2;
run;
The GENMOD Procedure

       Model Information
Data Set              WORK.CRAB
Distribution            Poisson
Link Function          Identity
Dependent Variable       satell
Observations Used           173
           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                 171        557.7083          3.2615
Scaled Deviance          171        557.7083          3.2615
Pearson Chi-Square       171        542.4854          3.1724
Scaled Pearson X2        171        542.4854          3.1724
Log Likelihood                       73.5314

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1    -11.5321      1.5104    -14.4924     -8.5717      58.29        <.0001
width         1      0.5495      0.0593      0.4333      0.6657      85.89        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000
NOTE: The scale parameter was held fixed.
Obs    width    satell     pred2

106     27.3       1      3.46921
107     26.3       1      2.91971
Fig. 4.5, p. 86.
data temp1;
  set temp1;
  id = _n_;
run;
data temp2;
  set temp2;
  id = _n_;
run;
data combo;
  merge temp1 temp2;
  by id;
run;
proc sort data = combo;
  by width;
run;
goptions reset=all;
symbol1 v=dot c=blue h=.6 i=join;
symbol2 v=dot c=red h=.6 i=join;
legend1 label=none value=(height=1 font=swiss 'Log Link' 'Identity Link' ) 
        position=(bottom right inside) mode=share cborder=black;
proc gplot data = combo;
  where width=24 or width=25 or width=26 or width=27 or width=28 or width=29 or 
        width=30 or width=31 ;
  plot (pred1 pred2)*width/ overlay legend=legend1;
run;
quit;
goptions reset=all;
Table 4.3, p. 90.

Note: A new variable observation was added to the dataset in order to be able to merge it with the output from the genmod procedure.

data midpt1;
 input observation width cases satell;
 log_case = log(cases);
cards;
1 22.69  14  14
2 23.84  14  20
3 24.77  28  67
4 25.84  39 105
5 26.79  22  63
6 27.74  24  93
7 28.67  18  71
8 30.41  14  72
;
run;
proc genmod data = midpt1; 
 model satell = width / dist=poi link=log offset=log_case obstats ;
 ods output ObStats=temp; 
run;
data combo;
  merge temp midpt1;
  by observation;
run;
proc format;
  value width 22.69='<=23.25' 23.84='23.25-24.25' 24.77='24.25-25.25' 25.84='25.25-26.25'
             26.79='26.25-27.25' 27.74='27.25-28.25' 28.67='28.25-29.25' 30.41='>29.25';
run;
proc print data=combo;
  format width width.;
  var width cases pred reschi streschi;
run;
The GENMOD Procedure

        Model Information
Data Set              WORK.MIDPT1
Distribution              Poisson
Link Function                 Log
Dependent Variable         satell
Offset Variable          log_case
Observations Used               8
           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                   6          6.5168          1.0861
Scaled Deviance            6          6.5168          1.0861
Pearson Chi-Square         6          6.2470          1.0412
Scaled Pearson X2          6          6.2470          1.0412
Log Likelihood                     1652.1028

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1     -3.5355      0.5760     -4.6645     -2.4065      37.67        <.0001
width         1      0.1727      0.0212      0.1311      0.2143      66.18        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000
NOTE: The scale parameter was held fixed.
                                  Observation Statistics

Observation       satell     log_case        width         Pred        Xbeta          Std
                              HessWgt        Lower        Upper       Resraw       Reschi
                               Resdev     StResdev     StReschi       Reslik
          1           14    2.6390573        22.69    20.544353    3.0225861    0.1027001
                            20.544353    16.798637     25.12528    -6.544353    -1.443845
                            -1.532938    -1.732037    -1.631372    -1.710727
          2           20    2.6390573        23.84    25.058503    3.2212132    0.0813848
                            25.058503    21.363886    29.392059    -5.058503    -1.010519
                            -1.047745     -1.14727    -1.106509    -1.140606
          3           67    3.3322045        24.77    58.849849    4.0749893    0.0657446
                            58.849849    51.734881    66.943322    8.1501507     1.062412
                             1.039205    1.2034817    1.2303572    1.2103746
                             
The GENMOD Procedure

                                  Observation Statistics

Observation       satell     log_case        width         Pred        Xbeta          Std
                              HessWgt        Lower        Upper       Resraw       Reschi
                               Resdev     StResdev     StReschi       Reslik
          4          105    3.6635616        25.84    98.608352     4.591156    0.0513763
                            98.608352    89.162468    109.05493    6.3916476    0.6436592
                             0.636887     0.740506      0.74838    0.7425635
          5           63    3.0910425        26.79    65.543892      4.18272    0.0448389
                            65.543892    60.029581    71.564747    -2.543892    -0.314219
                            -0.316285     -0.33944    -0.337223    -0.339149
          6           93    3.1780538        27.74    84.252198    4.4338147    0.0468532
                            84.252198    76.859888    92.355494    8.7478019    0.9530338
                            0.9372168    1.0381222    1.0556422    1.0413848
          7           71    2.8903718        28.67      74.1998    4.3067615    0.0562513
                              74.1998    66.454059    82.848368      -3.1998    -0.371468
                            -0.374187    -0.427757    -0.424648    -0.427029
          8           72    2.6390573        30.41    77.943052    4.3559785    0.0840923
                            77.943052    66.099465    91.908752    -5.943052    -0.673164
                            -0.682002    -1.017999    -1.004806    -1.010749
                            
Obs    width          cases         Pred       Reschi     Streschi

 1     <=23.25          14     20.544353    -1.443845    -1.631372
 2     23.25-24.25      14     25.058503    -1.010519    -1.106509
 3     24.25-25.25      28     58.849849     1.062412    1.2303572
 4     25.25-26.25      39     98.608352    0.6436592      0.74838
 5     26.25-27.25      22     65.543892    -0.314219    -0.337223
 6     27.25-28.25      24     84.252198    0.9530338    1.0556422
 7     28.25-29.25      18       74.1998    -0.371468    -0.424648
 8     >29.25           14     77.943052    -0.673164    -1.004806
Table 4.4, p. 92. Using the dataset that was created for Fig. 4.4.
proc format;
  value wcat 1='<=23.25' 2='23.25-24.25' 3='24.25-25.25' 4='25.25-26.25'
             5='26.25-27.25' 6='27.25-28.25' 7='28.25-29.25' 8='>29.25';
run;
proc print data=midpt;
  format wcat wcat.;
  var  wcat number  smean varsatell;
run;
Obs    wcat           number     smean     varsatell

 1     <=23.25          14      1.00000      2.7692
 2     23.25-24.25      14      1.42857      8.8791
 3     24.25-25.25      28      2.39286      6.5437
 4     25.25-26.25      39      2.69231     11.3765
 5     26.25-27.25      22      2.86364      6.8853
 6     27.25-28.25      24      3.87500      8.8098
 7     28.25-29.25      18      3.94444     16.8791
 8     >29.25           14      5.14286      8.2857

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California