UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

SAS Textbook Examples
An Introduction to Generalized Linear Models by Annette J. Dobson
Chapter 7:  Binary Variables and Logistic Regression

Table 7.2 and Figure 7.3 on page 119.
data table7_2;
  input dose number killed;
cards;
1.6907	59	6
1.7242	60	13
1.7552	62	18
1.7842	56	28
1.8113	63	52
1.8369	59	53
1.861	62	61
1.8839	60	60
;
run;
data table7_2a;
  set table7_2;
  proportion = killed/number;
  one = 1;
run;
goptions reset = all;
axis1 order = (1.6 to 1.9 by .1) minor = none;
axis2 order = (0 to 1 by .2) minor = none label=('Proportion killed');
symbol v = dot h = 1;
proc gplot data = table7_2a;
  plot proportion*dose /haxis=axis1 vaxis=axis2;
run;
quit;
Table 7.3 on page 121 based on the algorithm described in the text.
proc iml;
  use table7_2a;
  read all;
  x = one || dose;
  b0 = j(2, 1, 0);
  y = killed;
  nobs = nrow(y);
  n = number;
  fitted = j(nobs, 6, 0); /*for fitted values*/
  betas = j(3, 6); /*for parameter estimates*/
  j = i(2);  /*intermediate information matrix*/
  u = j(2, 1, 0); /*intermediate vector u*/
  count = 1; 
  do while(count<=6); 
  pi = exp(x*b0)/(1+exp(x*b0));
  ll = t(y)*(x*b0) - t(n)*log(one + exp(x*b0)) ;
  j[1,1] = sum(n#pi#(1-pi));
  j[1,2] = sum(n#dose#pi#(1-pi));
  j[2,1] = j[1,2];
  j[2,2] = sum(n#dose#dose#pi#(1-pi));
  u[1,] = sum(y - n#pi);
  u[2,] = sum(dose#(y-n#pi));
  b = b0 + inv(j)*u;
  betas[1, count] = b0[1];
  betas[2, count] = b0[2];
  betas[3, count] = ll;
  b0 = b;
  fitted[, count] = n#pi; 
   count = count + 1;
end;
rname = {'beta1' 'beta2' 'll'};
cname = {'initial' 'first' 'second' 'third' 'fourth' 'fifth'};
print 'Table 7.3 Fitting a linear logistic model';
print '-----------------------------------------------------------';
print betas[rowname=rname colname=cname label='' format = 8.4] ;
print '-----------------------------------------------------------';
print 'Observations                     Fitted values';
rn = {y1 y2 y3 y4 y5 y6 y7 y8};
print y[rowname=rn label=''] fitted[label='' format=10.3];
jinv = inv(j);
D = 2*sum(y#log(y/fitted[,6])+(y=0)) + 2*sum((n-y)#log((n-y)/(n-fitted[,6]) + (n-y=0))) ;
print jinv[format=6.3 label='V(b)'] D[format=5.3 label='Deviance'];
quit;
Table 7.3 Fitting a linear logistic model

-----------------------------------------------------------

       initial    first   second    third   fourth    fifth
beta1   0.0000 -37.8564 -53.8532 -59.9652 -60.7078 -60.7175
beta2   0.0000  21.3374  30.3835  33.8442  34.2648  34.2703
ll    -333.404 -200.010 -187.274 -186.247 -186.235 -186.235

-----------------------------------------------------------

Observations                     Fitted values

Y1         6     29.500      8.505      4.543      3.562      3.459      3.457
Y2        13     30.000     15.366     11.254      9.987      9.844      9.842
Y3        18     31.000     24.808     23.058     22.513     22.452     22.451
Y4        28     28.000     30.983     32.947     33.790     33.896     33.898
Y5        52     31.500     43.362     48.197     49.893     50.093     50.096
Y6        53     29.500     46.741     51.705     53.132     53.289     53.291
Y7        61     31.000     53.595     58.061     59.112     59.221     59.222
Y8        60     30.000     54.734     58.036     58.679     58.742     58.743

  V(b)        Deviance
26.840 -15.08    11.23
-15.08  8.481
Table 7.4 on page 122. SAS proc probit allows different distributions, such as probit and extreme value distribution. We use both proc logistic and proc probit for the three models in the table. Then we output the predicted probabilities from each model. In a proc sql step, we created expected count for each observation based on each of the three models. In the proc logistic, we used the option scale = none aggregate  to the deviance and in proc probit we used the option lackfit to get the deviance.
proc logistic data = table7_2;
  model killed/number = dose /scale=none aggregate;
  output out = logit p = p_logit;
run;
 Deviance and Pearson Goodness-of-Fit Statistics
Criterion        DF          Value     Value/DF     Pr > ChiSq
Deviance          6        11.2322       1.8720         0.0815
Pearson           6        10.0253       1.6709         0.1236
proc probit data = table7_2 ;
  model killed/number = dose /lackfit;
  output out = probit p=p_probit;
run;
                    Goodness-of-Fit Tests
Statistic                         Value       DF    Pr > ChiSq
Pearson Chi-Square               9.5134        6        0.1467
L.R.    Chi-Square              10.1198        6        0.1197
proc probit data = table7_2 ;
  model killed/number = dose /d = extreme lackfit;
  output out = extreme p = p_extreme;
run;
                    Goodness-of-Fit Tests
Statistic                         Value       DF    Pr > ChiSq
Pearson Chi-Square               3.2947        6        0.7711
L.R.    Chi-Square               3.4464        6        0.7511
proc sql;
  create table table7_4 as
  select logit.killed as killed, logit.number as number,
         logit.dose as dose, logit.p_logit*logit.number as n_logit,
		 probit.p_probit*probit.number as n_probit,
		 extreme.p_extreme*extreme.number as n_extreme
  from logit, probit, extreme
  where logit.killed = probit.killed and logit.number = probit.number and 
	probit.killed = extreme.killed and probit.number = extreme.number;
quit;
proc print data = table7_4;
  var killed number dose n_logit n_probit n_extreme;
run;
Obs    killed    number     dose     n_logit    n_probit    n_extreme
 1        6        59      1.6907     3.4583      3.3578      5.5894
 2       13        60      1.7242     9.8428     10.7216     11.2807
 3       18        62      1.7552    22.4518     23.4819     20.9542
 4       28        56      1.7842    33.8967     33.8155     30.3694
 5       52        63      1.8113    50.0942     49.6156     47.7764
 6       53        59      1.8369    53.2896     53.3189     54.1427
 7       61        62      1.8610    59.2213     59.6647     61.1133
 8       60        60      1.8839    58.7425     59.2280     59.9472
Table 7.5 on page 123.
data table7_5;
  input y n storage centrifuge;
cards;
55	102	1	40
52	99	1	150
57	108	1	350
55	76	2	40
50	81	2	150
50      90      2       350
;
run;
Figure 7.4 on page 124.
data table7_5a;
  set table7_5;
  proportion = y/n;
  logc = log(centrifuge);
run;
goptions reset =all;
axis1 order = (.5 to .8 by .1) minor = none label=('proportion');
axis2 label = ('Log(centrifuging force)') minor = none;
symbol1 v = P font=marker c = blue ;
symbol2 v = dot c=black;

proc gplot data = table7_5a;
  plot proportion*logc = storage /vaxis = axis1 haxis = axis2;
run;
quit;

Table 7.6 and 7.7 on page 125.
Model 1:
proc logistic data = table7_5a;
  class storage(ref='1') /param = ref;
  model y/n = logc storage logc*storage /scale=none aggregate;
  output out = m1 p = pm1;
run;
       Deviance and Pearson Goodness-of-Fit Statistics
Criterion        DF          Value     Value/DF     Pr > ChiSq
Deviance          2         0.0277       0.0139         0.9862
Pearson           2         0.0277       0.0139         0.9862

                Analysis of Maximum Likelihood Estimates
                                    Standard          Wald
Parameter         DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept          1      0.2339      0.6284        0.1385        0.7097
logc               1     -0.0227      0.1268        0.0321        0.8577
storage      2     1      1.9759      0.9980        3.9201        0.0477
logc*storage 2     1     -0.3184      0.1989        2.5635        0.1094
Model 2:
proc logistic data = table7_5a;
  class storage(ref='1') /param = ref;
  model y/n = logc storage  /scale=none aggregate;
  output out = m2 p = pm2;
run;
  Deviance and Pearson Goodness-of-Fit Statistics
Criterion        DF          Value     Value/DF     Pr > ChiSq
Deviance          3         2.6188       0.8729         0.4542
Pearson           3         2.5980       0.8660         0.4578
              Analysis of Maximum Likelihood Estimates<
                                 Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept       1      0.8767      0.4870        3.2406        0.0718
logc            1     -0.1546      0.0970        2.5386        0.1111
storage   2     1      0.4068      0.1746        5.4278        0.0198
Model 3:
proc logistic data = table7_5a;
  model y/n = logc  /scale=none aggregate;
  output out = m3 p = pm3;
run;
       Deviance and Pearson Goodness-of-Fit Statistics
Criterion        DF          Value     Value/DF     Pr > ChiSq
Deviance          1         0.0099       0.0099         0.9207
Pearson           1         0.0099       0.0099         0.9206

             Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1      1.0213      0.4813        4.5025        0.0338
logc          1     -0.1478      0.0965        2.3474        0.1255
proc sql;
  create table table7_7 as
  select m1.*, m1.pm1*m1.n as n_m1,
		       m2.pm2*m2.n as n_m2,
		       m3.pm3*m3.n as n_m3
  from m1, m2, m3
  where m1.y = m2.y and m1.n = m2.n and 
		m1.storage = m2.storage and m1.logc = m2.logc and
        m1.y = m3.y and m1.n = m3.n and m1.logc = m3.logc;
quit;
proc print data = table7_7;
  var storage y n_m1 n_m2 n_m3;
run;
Obs    storage     y      n_m1       n_m2       n_m3
 1        1       55    54.8182    58.7539    62.9125
 2        1       52    52.4654    52.0253    56.3979
 3        1       57    56.7164    53.2208    58.1836
 4        2       55    54.8257    51.0056    46.8759
 5        2       50    50.4279    50.5895    46.1437
 6        2       50    49.7389    53.4045    48.4863
Table 7.8 and logistic regression result on page 129.
data table7_8;
  input x	s @@;
datalines;
9	1 13	1   6	1   8	1
10	1  4	1  14	1   8	1
11	1  7	1   9	1   7	1
5	1  14	1  13	0  16	0
10	0  12	0  11	0  14	0
15	0  18	0   7	0  16	0
9	0   9	0  11	0  13	0
15	0  13	0  10	0  11	0
6	0  17	0  14	0  19	0
9	0  11	0  14	0  10	0
16	0  10	0  16	0  14	0
13	0  13	0   9	0  15	0
10	0  11	0  12	0   4	0
14	0  20	0
;
run;

proc logistic data = table7_8;
  model s(event='1') = x /scale = none aggregate rsq;
run;
       Deviance and Pearson Goodness-of-Fit Statistics
Criterion        DF          Value     Value/DF     Pr > ChiSq
Deviance         15         9.4190       0.6279         0.8546
Pearson          15         8.0830       0.5389         0.9204
Number of unique profiles: 17

         Model Fit Statistics
                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC               63.806         55.017
SC                65.795         58.995
-2 Log L          61.806         51.017

R-Square    0.1811    Max-rescaled R-Square    0.2657

        Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        10.7889        1         0.0010
Score                    9.7954        1         0.0017
Wald                     8.0570        1         0.0045

             Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1      2.4040      1.1918        4.0687        0.0437
x             1     -0.3235      0.1140        8.0570        0.0045
Figure 7.5 on page 130. We first created the grouped data set based on the covariate patterns of variable x. Then we refit the data to the logistic model and created predicted probabilities.
proc sql;
  create table grp7_8 as
  select distinct x as x, sum(s) as y, count(s) as total, sum(s)/count(s) as proportion
  from table7_8
  group by x;
quit;
proc logistic data = grp7_8 noprint;
  model y/total = x /scale = none aggregate rsq;
  output out = m p=p;
run;
goptions reset = all;
symbol1 v = dot c=black;
symbol2 v = P font=marker c=blue;
axis1 order = (0 to 1 by .5) minor = none label=('Proportion');
axis2 order = (0 to 20 by 5) minor = none label=('WAIS score');
proc gplot data = m;
  plot p*x proportion*x /overlay vaxis = axis1 haxis=axis2;
run;
quit;

Table 7.9 on page 130.
proc logistic data = grp7_8 noprint;
  model y/total = x ;
  output out = m1 p=p reschi = reschi resdev = d ;
run;
proc print data = m1;
 var x y total p reschi d;
run;
Obs     x    y    total       p        reschi         d
  1     4    1      2      0.75211    -0.82574    -0.76598
  2     5    1      1      0.68706     0.67490     0.86642
  3     6    1      2      0.61369    -0.33022    -0.32585
  4     7    2      3      0.53478     0.45799     0.46370
  5     8    2      2      0.45408     1.55065     1.77706
  6     9    2      6      0.37573    -0.21441    -0.21619
  7    10    1      6      0.30338    -0.72844    -0.77068
  8    11    1      6      0.23962    -0.41862    -0.43591
  9    12    0      2      0.18568    -0.67531    -0.90643
 10    13    1      6      0.14163     0.17592     0.17190
 11    14    2      7      0.10665     1.53479     1.30564
 12    15    0      3      0.07952    -0.50908    -0.70509
 13    16    0      4      0.05883    -0.50004    -0.69647
 14    17    0      1      0.04327    -0.21268    -0.29745
 15    18    0      1      0.03169    -0.18091    -0.25379
 16    19    0      1      0.02313    -0.15389    -0.21636
 17    20    0      1      0.01685    -0.13091    -0.18434

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California