SAS Textbook Examples
Applied Logistic Regression, Second Edition by Hosmer and Lemeshow
Chapter 4: Model-Building strategies and methods for logistic regression

4.2 Variable selection

page 105 Table 4.1 Simple logistic regression models for the UIS (n = 575).

NOTE: We have bolded the relevant output.
data uis41;
  set 'd:\hosmerdata\uis';
run;
proc genmod data=uis41 descending;
  model dfree = age / dist=bin link=logit waldci;
  estimate '10 year increase in age' age 10 /exp ;
run;

The GENMOD Procedure

          Model Information

Data Set                    WORK.UIS41
Distribution                  Binomial
Link Function                    Logit
Dependent Variable               DFREE
Observations Used                  575
Probability Modeled    Pr( DFREE = 1 )

      Response Profile

Ordered    Ordered
  Level    Value        Count

      1    0              428
      2    1              147

  Parameter Information

Parameter       Effect

Prm1            Intercept
Prm2            AGE

           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                 573        652.3309          1.1384
Scaled Deviance          573        652.3309          1.1384
Pearson Chi-Square       573        575.1709          1.0038
Scaled Pearson X2        573        575.1709          1.0038
Log Likelihood                     -326.1654


Algorithm converged.

                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept     1     -1.6602      0.5111     -2.6619     -0.6585      10.55        0.0012
AGE           1      0.0182      0.0153     -0.0119      0.0482       1.40        0.2363
Scale         0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

                                   Contrast Estimate Results

                                        Standard                                Chi-
Label                         Estimate     Error   Alpha   Confidence Limits  Square  Pr > ChiSq

10 year increase in age         0.1817    0.1534    0.05   -0.1190    0.4825    1.40      0.2363
Exp(10 year increase in age)    1.1993    0.1840    0.05    0.8878    1.6201
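
The ESTIMATE statement above multiplies the AGE coefficient by 10 and, with the EXP option, exponentiates the result. As a rough hand check, the same quantities can be recomputed from the rounded values in the parameter table. The data set name chk_age and its variable names are made up for this illustration, and small differences from the printed values are expected because the inputs are rounded.

```sas
/* Hand check of the ESTIMATE results, using rounded values from the output above. */
/* chk_age and its variable names are illustrative, not part of the original page.  */
data chk_age;
  b  = 0.0182;                    /* AGE coefficient */
  se = 0.0153;                    /* its standard error */
  z  = probit(0.975);             /* normal quantile for a 95% interval */
  est   = 10*b;                   /* log odds ratio for a 10 year increase */
  or10  = exp(est);               /* odds ratio for a 10 year increase */
  lower = exp(est - z*10*se);     /* Wald limits on the odds ratio scale */
  upper = exp(est + z*10*se);
run;
proc print data=chk_age noobs;
  var est or10 lower upper;
run;
```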
 

proc genmod data=uis41 descending;
  model dfree = beck / dist=bin link=logit waldci;
  estimate '5 point increase in beck' beck 5 /exp ;
run;

The GENMOD Procedure

          Model Information

Data Set                    WORK.UIS41
Distribution                  Binomial
Link Function                    Logit
Dependent Variable               DFREE
Observations Used                  575
Probability Modeled    Pr( DFREE = 1 )

       Response Profile

Ordered    Ordered
  Level    Value        Count

      1    0              428
      2    1              147

  Parameter Information

Parameter       Effect

Prm1            Intercept
Prm2            BECK

          Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF

Deviance                 573        653.0924          1.1398
Scaled Deviance          573        653.0924          1.1398
Pearson Chi-Square       573        575.1216          1.0037
Scaled Pearson X2        573        575.1216          1.0037
Log Likelihood                     -326.5462


Algorithm converged.

                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept     1     -0.9273      0.2003     -1.3199     -0.5347      21.43        <.0001
BECK          1     -0.0082      0.0103     -0.0285      0.0120       0.63        0.4265
Scale         0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

                                 Contrast Estimate Results

                                           Standard                                    Chi-
Label                           Estimate      Error    Alpha    Confidence Limits    Square    Pr > ChiSq

5 point increase in beck         -0.0411     0.0517     0.05    -0.1425     0.0602     0.63        0.4265
Exp(5 point increase in beck)     0.9597     0.0496     0.05     0.8672     1.0621
 

proc logistic data=uis41 desc;
  model dfree = ndrugtx;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        645.890
SC               660.083        654.598
-2 Log L         653.729        641.890

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        11.8392        1         0.0006
Score                    9.7585        1         0.0018
Wald                     9.2203        1         0.0024

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -0.7678      0.1303       34.7133        <.0001
NDRUGTX       1     -0.0749      0.0247        9.2203        0.0024

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

NDRUGTX       0.928       0.884       0.974

Association of Predicted Probabilities and Observed Responses

Percent Concordant     54.6    Somers' D    0.203
Percent Discordant     34.3    Gamma        0.228
Percent Tied           11.1    Tau-a        0.077
Pairs                 62916    c            0.602
 

proc logistic data=uis41 desc;
  model dfree = ivhx2 ivhx3;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                   Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        646.376
SC               660.083        659.440
-2 Log L         653.729        640.376

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        13.3525        2         0.0013
Score                   13.4161        2         0.0012
Wald                    13.1585        2         0.0014

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -0.6797      0.1417       22.9977        <.0001
IVHX2         1     -0.4810      0.2657        3.2773        0.0702
IVHX3         1     -0.7748      0.2166       12.7997        0.0003

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

IVHX2        0.618       0.367       1.041
IVHX3        0.461       0.301       0.704

Association of Predicted Probabilities and Observed Responses

Percent Concordant     41.5    Somers' D    0.185
Percent Discordant     23.0    Gamma        0.287
Percent Tied           35.5    Tau-a        0.071
Pairs                 62916    c            0.593
 

proc logistic data=uis41 desc;
  model dfree = race;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                   Model Convergence Status

        Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        653.105
SC               660.083        661.814
-2 Log L         653.729        649.105

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio         4.6235        1         0.0315
Score                    4.7791        1         0.0288
Wald                     4.7378        1         0.0295

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.1939      0.1142      109.3946        <.0001
RACE          1      0.4592      0.2110        4.7378        0.0295

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

RACE         1.583       1.047       2.393

Association of Predicted Probabilities and Observed Responses

Percent Concordant     24.7    Somers' D    0.091
Percent Discordant     15.6    Gamma        0.226
Percent Tied           59.8    Tau-a        0.035
Pairs                 62916    c            0.545
 
proc logistic data=uis41 desc;
  model dfree = treat;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                   Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        652.551
SC               660.083        661.259
-2 Log L         653.729        648.551

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio         5.1782        1         0.0229
Score                    5.1626        1         0.0231
Wald                     5.1266        1         0.0236

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.2978      0.1433       82.0211        <.0001
TREAT         1      0.4371      0.1931        5.1266        0.0236

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

TREAT        1.548       1.060       2.260

Association of Predicted Probabilities and Observed Responses

Percent Concordant     30.7    Somers' D    0.109
Percent Discordant     19.8    Gamma        0.215
Percent Tied           49.5    Tau-a        0.041
Pairs                 62916    c            0.554
 

proc logistic data=uis41 desc;
  model dfree = site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

         Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        656.063
SC               660.083        664.772
-2 Log L         653.729        652.063

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio         1.6659        1         0.1968
Score                    1.6921        1         0.1933
Wald                     1.6874        1         0.1939

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.1527      0.1171       96.9397        <.0001
SITE          1      0.2642      0.2034        1.6874        0.1939

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

SITE         1.302       0.874       1.940

Association of Predicted Probabilities and Observed Responses

Percent Concordant     24.6    Somers' D    0.057
Percent Discordant     18.9    Gamma        0.131
Percent Tied           56.4    Tau-a        0.022
Pairs                 62916    c            0.529

page 106 Table 4.2 Results of fitting a multivariable model containing the covariates significant at the 0.25 level in Table 4.1.
proc logistic data=uis41 desc;
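  /* covariates with p < 0.25 in Table 4.1; ALPHA= only sets the level for
     CLPARM=/CLODDS= confidence limits and does not select covariates, so it is dropped */
  model dfree = age ndrugtx ivhx2 ivhx3 race treat site;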
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                   Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        635.248
SC               660.083        670.083
-2 Log L         653.729        619.248

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        34.4806        7         <.0001
Score                   32.6795        7         <.0001
Wald                    30.6395        7         <.0001

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates
                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -2.4054      0.5548       18.7975        <.0001
AGE           1      0.0504      0.0173        8.4550        0.0036
NDRUGTX       1     -0.0615      0.0256        5.7559        0.0164
IVHX2         1     -0.6033      0.2872        4.4118        0.0357
IVHX3         1     -0.7327      0.2523        8.4328        0.0037
RACE          1      0.2261      0.2233        1.0251        0.3113
TREAT         1      0.4425      0.1993        4.9302        0.0264
SITE          1      0.1486      0.2172        0.4681        0.4939

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

AGE           1.052       1.017       1.088
NDRUGTX       0.940       0.894       0.989
IVHX2         0.547       0.312       0.960
IVHX3         0.481       0.293       0.788
RACE          1.254       0.809       1.942
TREAT         1.557       1.053       2.300
SITE          1.160       0.758       1.776

Association of Predicted Probabilities and Observed Responses

Percent Concordant     66.6    Somers' D    0.336
Percent Discordant     33.0    Gamma        0.337
Percent Tied            0.4    Tau-a        0.128
Pairs                 62916    c            0.668
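
The covariates in the model above were picked by comparing the univariable p-values from Table 4.1 with 0.25. That screening step can be sketched as a small data step. The p-values are copied from the output earlier on this page (the Wald test for the single-parameter models, the 2 df likelihood ratio test for the two IVHX design variables). The data set name table4_1_p and the variable keep25 are invented for this illustration.

```sas
/* Screening sketch: flag covariates whose univariable p-value is below 0.25. */
/* table4_1_p and keep25 are illustrative names.                              */
data table4_1_p;
  input covariate $ pvalue;
  keep25 = (pvalue < 0.25);   /* 1 = candidate for the multivariable model */
cards;
age     0.2363
beck    0.4265
ndrugtx 0.0024
ivhx    0.0013
race    0.0295
treat   0.0236
site    0.1939
;
run;
proc print data=table4_1_p noobs;
run;
```

Only beck fails the screen, which agrees with the covariate list in the MODEL statement above.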

page 107 Figure 4.2 Univariable lowess smoothed logit versus AGE.

The code below mimics Stata's lowess command with the logit option: it smooths DFREE against AGE with PROC LOESS and then transforms the smoothed values to the logit scale. The resulting plot differs somewhat from the Stata plot because Stata and SAS use different algorithms for loess smoothing.

proc loess data = uis41;
  model dfree = age / smooth=.6;
  ods output OutputStatistics=a;  /* save the smoothed predictions */
run;
proc sql; /* compute the total number of obs */
  select count(dfree) into :total
  from uis41;
quit;
data b1;
  set a;
  adjust = 1/&total;
  small = .0001;
  /* keep the smoothed probability inside (0, 1) before taking the logit */
  if pred < small then pred = adjust;
  else if pred > 1 - small then pred = 1 - adjust;
  pred = log(pred/(1-pred));
run;
proc sort data = b1;
  by age;
run;
goptions reset = all;
symbol i = join v=star;
axis1 order = (20 to 56 by 9) minor=none;
axis2 order = (-1.5 to .5 by .5) minor = none label=(a=90 'Smoothed Logit');
proc gplot data = b1;
  format age 3.0 pred 5.1;
  plot pred*age /vaxis=axis2 haxis=axis1 ;
run;
quit;


page 107 Table 4.3 Results of the quartile analyses of AGE from the multivariable model containing the variables shown in the model in Table 4.2.

data table4_3;
input quartile midpt number age coeff;
cards;
1 24 148 24 0
2 30.5 144 30.5 -.165864
3 35.5 166 35.5 .4693399
4 47.5 117 47.5 .595771
;
run;
proc print data=table4_3;
run;

Obs    quartile    midpt    number     age      coeff

 1         1        24.0      148     24.0     0.00000
 2         2        30.5      144     30.5    -0.16586
 3         3        35.5      166     35.5     0.46934
 4         4        47.5      117     47.5     0.59577


proc sort data=uis41;
  by age;
run;
data uis41a;
  set uis41;
  age1 = (_n_ <= 148);
  age2 = (_n_ >= 149) & (_n_ <= 292);
  age3 = (_n_ >= 293) & (_n_ <= 458) ;
  age4 = (_n_ >= 459) ;
run;
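
The indicator variables above split the age-sorted file by row number. A quick way to inspect the resulting group sizes and age ranges is sketched below. The data set chk_agegrp and the variable agegrp are names invented here, and the step assumes uis41 is still sorted by age and has 575 observations.

```sas
/* Check of the row-number cut points; assumes uis41 is sorted by age. */
/* chk_agegrp and agegrp are illustrative names.                        */
data chk_agegrp;
  set uis41;
  if _n_ <= 148 then agegrp = 1;
  else if _n_ <= 292 then agegrp = 2;
  else if _n_ <= 458 then agegrp = 3;
  else agegrp = 4;
run;
proc means data=chk_agegrp n min max;
  class agegrp;
  var age;
run;
```

The N column should show 148, 144, 166 and 117, the counts entered in table4_3. If the maximum age in one group equals the minimum age in the next, subjects of the same age were split across two groups by the row-number cut.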
proc logistic data=uis41a desc;
  model dfree = age2 age3 age4 ndrugtx ivhx2 ivhx3  race treat site / CLPARM=both;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS41A
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

         Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        639.042
SC               660.083        682.586
-2 Log L         653.729        619.042

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        34.6869        9         <.0001
Score                   32.7145        9         0.0001
Wald                    30.6492        9         0.0003

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.0549      0.2706       15.1988        <.0001
age2          1     -0.1659      0.2909        0.3250        0.5686
age3          1      0.4693      0.2707        3.0067        0.0829
age4          1      0.5957      0.3125        3.6344        0.0566
NDRUGTX       1     -0.0587      0.0255        5.3185        0.0211
IVHX2         1     -0.5545      0.2854        3.7764        0.0520
IVHX3         1     -0.6726      0.2519        7.1312        0.0076
RACE          1      0.2787      0.2238        1.5502        0.2131
TREAT         1      0.4431      0.2000        4.9054        0.0268
SITE          1      0.1582      0.2188        0.5228        0.4696

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

age2          0.847       0.479       1.498
age3          1.599       0.941       2.718
age4          1.814       0.983       3.348
NDRUGTX       0.943       0.897       0.991
IVHX2         0.574       0.328       1.005
IVHX3         0.510       0.312       0.836
RACE          1.321       0.852       2.049
TREAT         1.557       1.052       2.305
SITE          1.171       0.763       1.799

Association of Predicted Probabilities and Observed Responses

Percent Concordant     66.2    Somers' D    0.330
Percent Discordant     33.2    Gamma        0.332
Percent Tied            0.7    Tau-a        0.126
Pairs                 62916    c            0.665

         Profile Likelihood Confidence
            Interval for Parameters

Parameter     Estimate     95% Confidence Limits

Intercept      -1.0549      -1.5955      -0.5327
age2           -0.1659      -0.7410       0.4027
age3            0.4693      -0.0577       1.0054
age4            0.5957      -0.0161       1.2118
NDRUGTX        -0.0587      -0.1122      -0.0121
IVHX2          -0.5545      -1.1266     -0.00495
IVHX3          -0.6726      -1.1721      -0.1830
RACE            0.2787      -0.1647       0.7142
TREAT           0.4431       0.0528       0.8380
SITE            0.1582      -0.2747       0.5844

    Wald Confidence Interval for Parameters

Parameter     Estimate     95% Confidence Limits

Intercept      -1.0549      -1.5852      -0.5246
age2           -0.1659      -0.7360       0.4043
age3            0.4693      -0.0612       0.9998
age4            0.5957      -0.0167       1.2082
NDRUGTX        -0.0587      -0.1086     -0.00882
IVHX2          -0.5545      -1.1138      0.00476
IVHX3          -0.6726      -1.1662      -0.1789
RACE            0.2787      -0.1600       0.7174
TREAT           0.4431       0.0510       0.8351
SITE            0.1582      -0.2707       0.5871

page 108 Figure 4.3 Plot of estimated logistic regression coefficients versus approximate quartile midpoints of AGE.
symbol1 i=join ;
proc gplot data=table4_3;
  plot coeff*age / vref=0;
run;
quit;


page 109 Table 4.4 Summary of the use of the method of fractional polynomials for AGE.

NOTE: The values in the Deviance column of Table 4.4 are the "-2 Log L" values listed under "Intercept and Covariates" in the Model Fit Statistics table of the SAS output.
data uistbl44;
  set uis41;
  agethree=age**3;
  age_2 = age**(-2);
run;
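
The data step above creates only the two transformed terms that are used in the models below. For anyone who wants to try other powers, here is a sketch that builds the full set of candidate terms, assuming the usual fractional polynomial power set -2, -1, -0.5, 0, 0.5, 1, 2, 3 (with power 0 read as the natural log). This page does not list that set, and the data set name uis_fp and the new variable names are invented for the illustration.

```sas
/* Sketch: candidate fractional polynomial terms for AGE.                */
/* The power set is an assumption; uis_fp and the variable names are     */
/* illustrative. Power 1 is AGE itself, so no new variable is needed.    */
data uis_fp;
  set uis41;
  age_m2  = age**(-2);     /* same as age_2 above */
  age_m1  = age**(-1);
  age_m05 = age**(-0.5);
  age_ln  = log(age);      /* power 0 is taken to mean log */
  age_p05 = age**0.5;
  age_p2  = age**2;
  age_p3  = age**3;        /* same as agethree above */
run;
```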
NOTE: Line 1 of Table 4.4: AGE not in the model
proc logistic data=uistbl44 desc;
  model dfree = ndrugtx ivhx2 ivhx3 race treat site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UISTBL44
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        641.801
SC               660.083        672.281
-2 Log L         653.729        627.801

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        25.9282        6         0.0002
Score                   24.7124        6         0.0004
Wald                    23.3984        6         0.0007

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -0.9462      0.2264       17.4734        <.0001
NDRUGTX       1     -0.0523      0.0246        4.5227        0.0334
IVHX2         1     -0.3853      0.2731        1.9903        0.1583
IVHX3         1     -0.4994      0.2354        4.4990        0.0339
RACE          1      0.2973      0.2205        1.8179        0.1776
TREAT         1      0.4117      0.1974        4.3494        0.0370
SITE          1      0.1784      0.2151        0.6883        0.4067

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

NDRUGTX       0.949       0.904       0.996
IVHX2         0.680       0.398       1.162
IVHX3         0.607       0.383       0.963
RACE          1.346       0.874       2.074
TREAT         1.509       1.025       2.222
SITE          1.195       0.784       1.822

Association of Predicted Probabilities and Observed Responses

Percent Concordant     63.9    Somers' D    0.288
Percent Discordant     35.0    Gamma        0.292
Percent Tied            1.1    Tau-a        0.110
Pairs                 62916    c            0.644

NOTE: Line 2 of Table 4.4: AGE linear
proc logistic data=uistbl44 desc;
  model dfree = age ndrugtx ivhx2 ivhx3 race treat site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UISTBL44
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        635.248
SC               660.083        670.083
-2 Log L         653.729        619.248

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        34.4806        7         <.0001
Score                   32.6795        7         <.0001
Wald                    30.6395        7         <.0001

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -2.4054      0.5548       18.7975        <.0001
AGE           1      0.0504      0.0173        8.4550        0.0036
NDRUGTX       1     -0.0615      0.0256        5.7559        0.0164
IVHX2         1     -0.6033      0.2872        4.4118        0.0357
IVHX3         1     -0.7327      0.2523        8.4328        0.0037
RACE          1      0.2261      0.2233        1.0251        0.3113
TREAT         1      0.4425      0.1993        4.9302        0.0264
SITE          1      0.1486      0.2172        0.4681        0.4939

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

AGE           1.052       1.017       1.088
NDRUGTX       0.940       0.894       0.989
IVHX2         0.547       0.312       0.960
IVHX3         0.481       0.293       0.788
RACE          1.254       0.809       1.942
TREAT         1.557       1.053       2.300
SITE          1.160       0.758       1.776

Association of Predicted Probabilities and Observed Responses

Percent Concordant     66.6    Somers' D    0.336
Percent Discordant     33.0    Gamma        0.337
Percent Tied            0.4    Tau-a        0.128
Pairs                 62916    c            0.668

NOTE: Line 3 of Table 4.4: J = 1 (AGE cubed)
proc logistic data=uistbl44 desc;
  model dfree = agethree ndrugtx ivhx2 ivhx3 race treat site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UISTBL44
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                   Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        634.882
SC               660.083        669.717
-2 Log L         653.729        618.882

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        34.8466        7         <.0001
Score                   33.0920        7         <.0001
Wald                    30.8612        7         <.0001

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.3032      0.2583       25.4622        <.0001
agethree      1    0.000014    4.648E-6        8.9327        0.0028
NDRUGTX       1     -0.0620      0.0257        5.8134        0.0159
IVHX2         1     -0.5961      0.2869        4.3184        0.0377
IVHX3         1     -0.7142      0.2500        8.1632        0.0043
RACE          1      0.2355      0.2230        1.1152        0.2909
TREAT         1      0.4349      0.1992        4.7634        0.0291
SITE          1      0.1437      0.2174        0.4370        0.5086

            Odds Ratio Estimates

               Point          95% Wald
Effect      Estimate      Confidence Limits

agethree       1.000       1.000       1.000
NDRUGTX        0.940       0.894       0.988
IVHX2          0.551       0.314       0.967
IVHX3          0.490       0.300       0.799
RACE           1.266       0.817       1.959
TREAT          1.545       1.045       2.283
SITE           1.155       0.754       1.768

Association of Predicted Probabilities and Observed Responses

Percent Concordant     66.5    Somers' D    0.335
Percent Discordant     33.0    Gamma        0.337
Percent Tied            0.5    Tau-a        0.128
Pairs                 62916    c            0.667

G for J = 1 versus linear: G = 619.248 - 618.882 = 0.366
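
The deviance differences can also be computed in a data step, together with chi-square p-values. The sketch below uses the -2 Log L values printed in the three outputs above. The data set name chk_g and its variable names are made up for this illustration, and each comparison is assumed to use 1 degree of freedom.

```sas
/* Deviance differences from the -2 Log L values printed above. */
/* chk_g and its variable names are illustrative.                */
data chk_g;
  dev_none   = 627.801;   /* Line 1: AGE not in model */
  dev_linear = 619.248;   /* Line 2: AGE linear */
  dev_j1     = 618.882;   /* Line 3: J = 1, AGE cubed */
  g_linear = dev_none - dev_linear;        /* linear term versus no AGE */
  p_linear = 1 - probchi(g_linear, 1);     /* 1 df assumed */
  g_j1 = dev_linear - dev_j1;              /* J = 1 versus linear */
  p_j1 = 1 - probchi(g_j1, 1);             /* 1 df assumed */
run;
proc print data=chk_g noobs;
  var g_linear p_linear g_j1 p_j1;
  format p_linear p_j1 6.4;
run;
```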
NOTE: Line 4: J = 2
proc logistic data=uistbl44 desc;
  model dfree = ndrugtx agethree age_2 ivhx2 ivhx3 race treat site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UISTBL44
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        636.769
SC               660.083        675.958
-2 Log L         653.729        618.769

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        34.9602        8         <.0001
Score                   33.1864        8         <.0001
Wald                    31.0132        8         0.0001

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.0496      0.7957        1.7401        0.1871
NDRUGTX       1     -0.0620      0.0257        5.8171        0.0159
agethree      1    0.000012    8.098E-6        2.0724        0.1500
age_2         1      -153.9       457.6        0.1131        0.7367
IVHX2         1     -0.6058      0.2882        4.4192        0.0355
IVHX3         1     -0.7264      0.2526        8.2703        0.0040
RACE          1      0.2282      0.2241        1.0371        0.3085
TREAT         1      0.4393      0.1997        4.8384        0.0278
SITE          1      0.1459      0.2175        0.4502        0.5022

            Odds Ratio Estimates

               Point          95% Wald
Effect      Estimate      Confidence Limits

NDRUGTX        0.940       0.894       0.988
agethree       1.000       1.000       1.000
age_2         <0.001      <0.001    >999.999
IVHX2          0.546       0.310       0.960
IVHX3          0.484       0.295       0.793
RACE           1.256       0.810       1.949
TREAT          1.552       1.049       2.295
SITE           1.157       0.756       1.772

Association of Predicted Probabilities and Observed Responses

Percent Concordant     66.6    Somers' D    0.337
Percent Discordant     32.9    Gamma        0.339
Percent Tied            0.5    Tau-a        0.128
Pairs                 62916    c            0.668

G = 619.248 - 618.769 = .479
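
NOTE: The G values can also be computed in a data step instead of by hand. The following is a minimal sketch added for illustration, not part of the original example. It assumes each G is the -2 Log L of the model with age entered linearly (619.248) minus the -2 Log L of the fractional polynomial model, and the variable names are made up.

```sas
data _null_;
  dev_linear = 619.248;  /* -2 Log L, age entered linearly */
  dev_j1     = 618.882;  /* -2 Log L, J = 1 model (agethree) */
  dev_j2     = 618.769;  /* -2 Log L, J = 2 model (agethree and age_2) */
  g_j1 = dev_linear - dev_j1;  /* G for J = 1 versus linear */
  g_j2 = dev_linear - dev_j2;  /* G for J = 2 versus linear */
  put g_j1= 8.3 g_j2= 8.3;     /* written to the SAS log */
run;
```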

page 110 Figure 4.4 Univariable lowess smoothed logit versus number of previous drug treatments (NDRGTX).

The code below follows the approach of Stata's lowess program with the logit option. The resulting SAS plot still differs from the Stata plot because the two packages use different algorithms for loess smoothing.

proc loess data = uis;
  model dfree = ndrugtx /smooth=.5;
  ods output OutputStatistics=a;
run;
proc means data = a;
  var pred;
run;
proc sql; /*compute the total number of obs*/
  select count(dfree) into :total
  from uis;
  quit;
data b1;
  set a;
  adjust = 1/&total;
  small = .0001;
  if pred < small then pred = adjust;
  else if pred > 1 - small then pred = 1 - adjust;
  pred = log(pred/(1-pred));
run;
proc sort data = b1;
  by ndrugtx;
run;
goptions  ftext = swiss htitle = 5 htext = 3 gunit = pct
border cback = white hsize = 5in vsize = 4in;
filename outgraph 'd:\temp\alr2.gif';
goptions gsfname = outgraph dev = gif570;
symbol i = join v=star;
axis1 order = (0 to 40 by 5) minor=none;
axis2 order = (-2 to -.5 by .5) minor = none;
proc gplot data = b1;
  format ndrugtx 3.0 ;
  plot pred*ndrugtx /vaxis=axis2 haxis=axis1 ;
run;
quit;



page 110 Table 4.5 Results of the design variable analysis of number of previous drug treatments (NDRGTX) from the multivariable model containing the variables shown in the model in Table 4.2.

data uis42;
  set uis41;
  grp = .;
  if ndrugtx=0 then grp = 1;
  if ndrugtx=1 or ndrugtx=2 then grp = 2;
  if 3<=ndrugtx<16 then grp = 3;
  if ndrugtx>15 then grp = 4;
  if grp = 2 then grp2 = 1; else grp2 = 0;
  if grp = 3 then grp3 = 1; else grp3 = 0;
  if grp = 4 then grp4 = 1; else grp4 = 0;
run;
proc logistic data=uis42 desc;
  model dfree = age grp2 grp3 grp4 ivhx2 ivhx3 race treat site / CLPARM=both;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS42
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        638.638
SC               660.083        682.182
-2 Log L         653.729        618.638

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        35.0906        9         <.0001
Score                   34.5976        9         <.0001
Wald                    32.5146        9         0.0002

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -2.6601      0.6060       19.2711        <.0001
AGE           1      0.0506      0.0173        8.5540        0.0034
grp2          1      0.4060      0.3090        1.7262        0.1889
grp3          1     -0.1537      0.3117        0.2432        0.6219
grp4          1     -0.5852      0.6206        0.8894        0.3457
IVHX2         1     -0.6478      0.2898        4.9958        0.0254
IVHX3         1     -0.7955      0.2542        9.7909        0.0018
RACE          1      0.2412      0.2244        1.1551        0.2825
TREAT         1      0.4199      0.1997        4.4230        0.0355
SITE          1      0.1619      0.2206        0.5385        0.4630

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

AGE          1.052       1.017       1.088
grp2         1.501       0.819       2.750
grp3         0.858       0.466       1.580
grp4         0.557       0.165       1.880
IVHX2        0.523       0.296       0.923
IVHX3        0.451       0.274       0.743
RACE         1.273       0.820       1.976
TREAT        1.522       1.029       2.251
SITE         1.176       0.763       1.812

Association of Predicted Probabilities and Observed Responses

Percent Concordant     66.2    Somers' D    0.330
Percent Discordant     33.2    Gamma        0.332
Percent Tied            0.6    Tau-a        0.126
Pairs                 62916    c            0.665

         Profile Likelihood Confidence
            Interval for Parameters

Parameter     Estimate     95% Confidence Limits

Intercept      -2.6601      -3.8671      -1.4871
AGE             0.0506       0.0168       0.0848
grp2            0.4060      -0.1906       1.0244
grp3           -0.1537      -0.7559       0.4696
grp4           -0.5852      -1.9302       0.5550
IVHX2          -0.6478      -1.2289      -0.0898

The LOGISTIC Procedure

         Profile Likelihood Confidence
            Interval for Parameters

Parameter     Estimate     95% Confidence Limits

IVHX3          -0.7955      -1.2996      -0.3012
RACE            0.2412      -0.2037       0.6775
TREAT           0.4199       0.0302       0.8140
SITE            0.1619      -0.2745       0.5916

    Wald Confidence Interval for Parameters

Parameter     Estimate     95% Confidence Limits

Intercept      -2.6601      -3.8477      -1.4724
AGE             0.0506       0.0167       0.0845
grp2            0.4060      -0.1997       1.0117
grp3           -0.1537      -0.7646       0.4572
grp4           -0.5852      -1.8015       0.6311
IVHX2          -0.6478      -1.2158      -0.0797
IVHX3          -0.7955      -1.2938      -0.2972
RACE            0.2412      -0.1987       0.6810
TREAT           0.4199       0.0286       0.8113
SITE            0.1619      -0.2705       0.5943
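
NOTE: As an alternative to creating grp2, grp3 and grp4 by hand, the CLASS statement can build the design variables. The following is an untested sketch added for illustration. It assumes a release of SAS in which PROC LOGISTIC accepts the PARAM= and REF= class options, and it uses reference coding with grp = 1 as the baseline so that the three grp coefficients should correspond to grp2, grp3 and grp4 above.

```sas
proc logistic data=uis42 desc;
  /* reference (dummy) coding, with grp = 1 as the baseline level */
  class grp (ref='1') / param=ref;
  model dfree = age grp ivhx2 ivhx3 race treat site / clparm=both;
run;
```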

data table4_4;
input group midpoint number coeff;
cards;
1 0 79 0
2 1.5 173 .406
3 9 294 -.154
4 28 29 -.585
;
run;
proc print data=table4_4;
run;

Obs    group    midpoint    number     coeff

 1       1         0.0         79      0.000
 2       2         1.5        173      0.406
 3       3         9.0        294     -0.154
 4       4        28.0         29     -0.585

page 111 Figure 4.5 Plot of estimated logistic regression coefficients from Table 4.4 versus the midpoints of number of previous drug treatment groups.
symbol1 i=join value=circle;
proc gplot data=table4_4;
  plot coeff*midpoint / vref=0;
run;
quit;

page 112 Figure 4.6 Plot of the univariable lowess smoothed logit (o) and the multivariable adjusted logit (+) from the J = 2 fractional polynomial model versus number of previous drug treatments (NDRGTX).

NOTE: We were unable to reproduce this graph.

page 113 Table 4.7 Results of fitting the multivariable model with the two term fractional polynomial transformation of NDRGTX.

NOTE: The results for the constant (the Intercept row) in this output differ from those shown in the book, and we do not know why.
data uis43;
  set uis41;
  ndrgfp1 = ((ndrugtx+1)/10)**(-1);
  ndrgfp2 = ndrgfp1*log((ndrugtx+1)/10);
run;
proc logistic data=uis43 desc;
  model dfree = age ndrgfp1 ndrgfp2 ivhx2 ivhx3 race treat site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        631.451
SC               660.083        670.640
-2 Log L         653.729        613.451

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        40.2777        8         <.0001
Score                   38.7032        8         <.0001
Wald                    36.1456        8         <.0001

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -4.3137      0.7925       29.6321        <.0001
AGE           1      0.0544      0.0175        9.6928        0.0018
ndrgfp1       1      0.9814      0.2888       11.5446        0.0007
ndrgfp2       1      0.3611      0.1099       10.8050        0.0010
IVHX2         1     -0.6088      0.2911        4.3740        0.0365
IVHX3         1     -0.7238      0.2556        8.0213        0.0046
RACE          1      0.2477      0.2242        1.2205        0.2693
TREAT         1      0.4224      0.2004        4.4435        0.0350
SITE          1      0.1732      0.2210        0.6144        0.4331

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

AGE           1.056       1.020       1.093
ndrgfp1       2.668       1.515       4.700
ndrgfp2       1.435       1.157       1.780
IVHX2         0.544       0.307       0.962
IVHX3         0.485       0.294       0.800
RACE          1.281       0.826       1.988
TREAT         1.526       1.030       2.259
SITE          1.189       0.771       1.834

Association of Predicted Probabilities and Observed Responses

Percent Concordant     67.2    Somers' D    0.348
Percent Discordant     32.4    Gamma        0.349
Percent Tied            0.5    Tau-a        0.133
Pairs                 62916    c            0.674

page 115 Table 4.9 Preliminary final model containing significant main effects and interactions.
proc logistic data=uis43 desc;
  model dfree = age ndrgfp1 ndrgfp2 ivhx2 ivhx3 race treat site age*ndrgfp1 race*site;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        619.963
SC               660.083        667.861
-2 Log L         653.729        597.963

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        55.7660       10         <.0001
Score                   52.0723       10         <.0001
Wald                    47.2784       10         <.0001

The LOGISTIC Procedure

              Analysis of Maximum Likelihood Estimates

                                 Standard
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1     -6.8429      1.2193       31.4989        <.0001
AGE             1      0.1166      0.0289       16.3137        <.0001
ndrgfp1         1      1.6687      0.4071       16.8000        <.0001
ndrgfp2         1      0.4336      0.1169       13.7585        0.0002
IVHX2           1     -0.6346      0.2987        4.5134        0.0336
IVHX3           1     -0.7049      0.2616        7.2623        0.0070
RACE            1      0.6841      0.2641        6.7074        0.0096
TREAT           1      0.4349      0.2038        4.5559        0.0328
SITE            1      0.5162      0.2549        4.1013        0.0429
AGE*ndrgfp1     1     -0.0153     0.00603        6.4177        0.0113
RACE*SITE       1     -1.4294      0.5298        7.2799        0.0070

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

ndrgfp2       1.543       1.227       1.940
IVHX2         0.530       0.295       0.952
IVHX3         0.494       0.296       0.825
TREAT         1.545       1.036       2.303

Association of Predicted Probabilities and Observed Responses

Percent Concordant     69.7    Somers' D    0.398
Percent Discordant     29.9    Gamma        0.399
Percent Tied            0.4    Tau-a        0.152
Pairs                 62916    c            0.699

4.3 Stepwise logistic regression

page 123 Table 4.11 Log-likelihood for the model at each step and likelihood ratio test statistics (G), degrees-of-freedom (df), and p-values for two methods of selecting variables for a final model from a summary table.

NOTE: The following code gives the log likelihood and the values for method 1.
proc logistic data=uis43 desc;
  model dfree = ;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

-2 Log L = 653.7289

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -1.0687      0.0956      124.9675        <.0001


proc logistic data=uis43 desc;
  model dfree = ndrugtx;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        645.890
SC               660.083        654.598
-2 Log L         653.729        641.890

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        11.8392        1         0.0006
Score                    9.7585        1         0.0018
Wald                     9.2203        1         0.0024

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -0.7678      0.1303       34.7133        <.0001
NDRUGTX       1     -0.0749      0.0247        9.2203        0.0024

The LOGISTIC Procedure

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

NDRUGTX       0.928       0.884       0.974

Association of Predicted Probabilities and Observed Responses

Percent Concordant     54.6    Somers' D    0.203
Percent Discordant     34.3    Gamma        0.228
Percent Tied           11.1    Tau-a        0.077
Pairs                 62916    c            0.602

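NOTE: SAS reports -2 Log L, so the log likelihood of each model is that value divided by -2 (for example, 653.729/(-2) = -326.864). To get the value of G, compare the log likelihoods of the two models by hand: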
-2*(-326.864-(-320.945)) = 11.84
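
NOTE: The hand calculation can be checked in a data step. This sketch is an addition to the original example. It takes the two -2 Log L values from the Model Fit Statistics table above, assumes 1 degree of freedom because the two models differ only by NDRUGTX, and uses the PROBCHI function for the chi-square p-value. The result should agree with the Likelihood Ratio line of the Testing Global Null Hypothesis table.

```sas
data _null_;
  m2ll_null  = 653.729;   /* -2 Log L, intercept only */
  m2ll_model = 641.890;   /* -2 Log L, intercept and ndrugtx */
  g  = m2ll_null - m2ll_model;   /* likelihood ratio statistic G */
  df = 1;                        /* one covariate added (assumption stated above) */
  p  = 1 - probchi(g, df);       /* upper-tail chi-square probability */
  put g= 8.2 p= 8.4;             /* written to the SAS log */
run;
```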
proc logistic data=uis43 desc;
  model dfree = ndrugtx treat;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        642.860
SC               660.083        655.923
-2 Log L         653.729        636.860

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        16.8690        2         0.0002
Score                   14.8924        2         0.0006
Wald                    14.2225        2         0.0008

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -0.9991      0.1691       34.9214        <.0001
NDRUGTX       1     -0.0739      0.0245        9.1221        0.0025
TREAT         1      0.4348      0.1948        4.9830        0.0256

          Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

NDRUGTX       0.929       0.885       0.974
TREAT         1.545       1.054       2.263

Association of Predicted Probabilities and Observed Responses

Percent Concordant     58.8    Somers' D    0.232
Percent Discordant     35.5    Gamma        0.246
Percent Tied            5.7    Tau-a        0.089
Pairs                 62916    c            0.616

-2*(-320.945-(-318.430)) = 5.03
proc logistic data=uis43 desc;
  model dfree = ndrugtx treat ivhx2 ivhx3;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        640.050
SC               660.083        661.822
-2 Log L         653.729        630.050

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        23.6784        4         <.0001
Score                   22.3908        4         0.0002
Wald                    21.3059        4         0.0003

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -0.7714      0.1878       16.8787        <.0001
NDRUGTX       1     -0.0542      0.0246        4.8559        0.0276
TREAT         1      0.4215      0.1965        4.6009        0.0320
IVHX2         1     -0.4024      0.2711        2.2040        0.1377
IVHX3         1     -0.5804      0.2289        6.4281        0.0112

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

NDRUGTX       0.947       0.903       0.994
TREAT         1.524       1.037       2.241
IVHX2         0.669       0.393       1.138
IVHX3         0.560       0.357       0.877

Association of Predicted Probabilities and Observed Responses

Percent Concordant     62.2    Somers' D    0.269
Percent Discordant     35.3    Gamma        0.276
Percent Tied            2.5    Tau-a        0.103
Pairs                 62916    c            0.635

-2*(-318.430-(-315.025)) = 6.81
proc logistic data=uis43 desc;
  model dfree = ndrugtx treat ivhx2 ivhx3 age ;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates

AIC              655.729        632.587
SC               660.083        658.713
-2 Log L         653.729        620.587

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        33.1420        5         <.0001
Score                   31.1565        5         <.0001
Wald                    29.3324        5         <.0001

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -2.3327      0.5484       18.0956        <.0001
NDRUGTX       1     -0.0637      0.0256        6.1858        0.0129
TREAT         1      0.4513      0.1986        5.1649        0.0230
IVHX2         1     -0.6237      0.2847        4.7989        0.0285
IVHX3         1     -0.8056      0.2445       10.8542        0.0010
AGE           1      0.0526      0.0172        9.3378        0.0022

           Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits

NDRUGTX       0.938       0.892       0.987
TREAT         1.570       1.064       2.318
IVHX2         0.536       0.307       0.936
IVHX3         0.447       0.277       0.722
AGE           1.054       1.019       1.090

Association of Predicted Probabilities and Observed Responses

Percent Concordant     65.5    Somers' D    0.315
Percent Discordant     34.0    Gamma        0.317
Percent Tied            0.5    Tau-a        0.120
Pairs                 62916    c            0.658

-2*(-315.025-(-310.293)) = 9.46

NOTE: The following calculations give the values for method 2, which compares the model at each step with the final model (log likelihood -310.293).

-2*(-326.864-(-310.293)) = 33.14

-2*(-320.945-(-310.293)) = 21.30

-2*(-318.430-(-310.293)) = 16.27

-2*(-315.025-(-310.293)) = 9.46
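
NOTE: The four method 2 comparisons can be done in one data step. This is a sketch added for illustration, with made-up data set and variable names. It reads the -2 Log L and the number of covariates for each step from the outputs above, takes the five-covariate model (-2 Log L = 620.587) as the final model, and assumes the degrees of freedom equal the difference in the number of covariates.

```sas
data method2;
  input step m2ll nparm;     /* -2 Log L and number of covariates at each step */
  m2ll_final  = 620.587;     /* final model: ndrugtx treat ivhx2 ivhx3 age */
  nparm_final = 5;
  g  = m2ll - m2ll_final;    /* G for the step model versus the final model */
  df = nparm_final - nparm;  /* covariates added between the two models */
  p  = 1 - probchi(g, df);   /* upper-tail chi-square probability */
cards;
0 653.729 0
1 641.890 1
2 636.860 2
3 630.050 4
;
run;
proc print data=method2 noobs;
  var step g df p;
run;
```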


page 126 Table 4.12 Results of applying stepwise variable selection using the score test to select and maximum likelihood test to remove covariates at each step to the UIS data. Results are presented at each step in terms of the p-values to enter (below the horizontal line), and the p-value to remove (above the horizontal line) in each column. The asterisk denotes the maximum p-value to remove at each step.

proc logistic data=uis43 desc;
  class ivhx;
  model dfree = ivhx age ndrugtx treat race site beck / selection=stepwise slentry=0.15 slstay=0.20 details;
run;

The LOGISTIC Procedure

              Model Information

Data Set                      WORK.UIS43
Response Variable             DFREE
Number of Response Levels     2
Number of Observations        575
Link Function                 Logit
Optimization Technique        Fisher's scoring

          Response Profile

 Ordered                      Total
   Value        DFREE     Frequency

       1            1           147
       2            0           428

Stepwise Selection Procedure

   Class Level Information

                      Design
                    Variables

Class     Value      1      2

IVHX      1          1      0
          2          0      1
          3         -1     -1

Step  0. Intercept entered:

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

             Analysis of Maximum Likelihood Estimates

                                 Standard
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1     -1.0687      0.0956      124.9675        <.0001

The LOGISTIC Procedure

     Residual Chi-Square Test

Chi-Square       DF     Pr > ChiSq
  32.6798        8         <.0001

   Analysis of Effects Not in the Model

                        Score
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       13.4161       0.0012
AGE           1        1.4063       0.2357
NDRUGTX       1        9.7585       0.0018
TREAT         1        5.1626       0.0231
RACE          1        4.7791       0.0288
SITE          1        1.6921       0.1933
BECK          1        0.6331       0.4262

Step  1. Effect IVHX entered:

                   Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC              655.729        646.376
SC               660.083        659.440
-2 Log L         653.729        640.376

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        13.3525        2         0.0013
Score                   13.4161        2         0.0012
Wald                    13.1585        2         0.0014

The LOGISTIC Procedure

       Type III Analysis of Effects

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       13.1585        0.0014

              Analysis of Maximum Likelihood Estimates

                                 Standard
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept       1     -1.0983      0.1040      111.4532        <.0001
IVHX      1     1      0.4186      0.1324       10.0021        0.0016
IVHX      2     1     -0.0624      0.1663        0.1408        0.7075

               Odds Ratio Estimates

                     Point          95% Wald
Effect            Estimate      Confidence Limits
IVHX    1 vs 3       2.170       1.420       3.318
IVHX    2 vs 3       1.342       0.778       2.314

Association of Predicted Probabilities and Observed Responses

Percent Concordant     41.5    Somers' D    0.185
Percent Discordant     23.0    Gamma        0.287
Percent Tied           35.5    Tau-a        0.071
Pairs                 62916    c            0.593

     Residual Chi-Square Test

Chi-Square       DF     Pr > ChiSq
  20.1460        6         0.0026

       Analysis of Effects in Model

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       13.1585       0.0014

The LOGISTIC Procedure

   Analysis of Effects Not in the Model

                        Score
Effect       DF    Chi-Square    Pr > ChiSq
AGE           1        7.3328       0.0068
NDRUGTX       1        4.9318       0.0264
TREAT         1        4.5504       0.0329
RACE          1        2.1112       0.1462
SITE          1        0.5585       0.4549
BECK          1        0.0824       0.7741

Step  2. Effect AGE entered:

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC              655.729        641.096
SC               660.083        658.514
-2 Log L         653.729        633.096

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        20.6325        3         0.0001
Score                   20.4581        3         0.0001
Wald                    19.7426        3         0.0002

       Type III Analysis of Effects

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       18.6217        <.0001
AGE           1        7.2173        0.0072

The LOGISTIC Procedure

              Analysis of Maximum Likelihood Estimates

                                 Standard
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept       1     -2.5942      0.5727       20.5193        <.0001
IVHX      1     1      0.5610      0.1446       15.0424        0.0001
IVHX      2     1     -0.1200      0.1691        0.5037        0.4779
AGE             1      0.0454      0.0169        7.2173        0.0072

              Odds Ratio Estimates

                     Point          95% Wald
Effect            Estimate      Confidence Limits
IVHX    1 vs 3       2.724       1.716       4.322
IVHX    2 vs 3       1.378       0.796       2.388
AGE                  1.046       1.012       1.082

Association of Predicted Probabilities and Observed Responses

Percent Concordant     60.7    Somers' D    0.239
Percent Discordant     36.8    Gamma        0.245
Percent Tied            2.5    Tau-a        0.091
Pairs                 62916    c            0.620

     Residual Chi-Square Test

Chi-Square       DF     Pr > ChiSq
  12.8529        5         0.0248

       Analysis of Effects in Model

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       18.6217       <.0001
AGE           1        7.2173       0.0072

   Analysis of Effects Not in the Model

                        Score
Effect       DF    Chi-Square    Pr > ChiSq
NDRUGTX       1        6.2094        0.0127
TREAT         1        5.0083        0.0252
RACE          1        1.4228        0.2330
SITE          1        0.5078        0.4761
BECK          1        0.0021        0.9636
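NOTE: In stepwise selection the effect with the smallest score test p-value is the candidate for entry, and it enters only if that p-value is below the SLENTRY= value in the MODEL statement (that value is not shown in this part of the output). The sketch below recomputes the p-values from the printed score chi-squares and sorts them; the data set name candidates is ours.

```sas
/* Score test p-values for effects not in the model, smallest first. */
data candidates;
  length effect $ 8;
  input effect $ df score_chisq;
  p = 1 - probchi(score_chisq, df);
  datalines;
NDRUGTX 1 6.2094
TREAT   1 5.0083
RACE    1 1.4228
SITE    1 0.5078
BECK    1 0.0021
;
run;

proc sort data=candidates;
  by p;
run;

proc print data=candidates noobs;
  format p pvalue6.4;
run;
```

The first row after sorting is the effect considered for entry at the next step.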
 
Step  3. Effect NDRUGTX entered:

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
               Intercept         and
Criterion        Only        Covariates
AIC              655.729        635.805
SC               660.083        657.577
-2 Log L         653.729        625.805
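NOTE: The contribution of NDRUGTX can also be judged by a likelihood ratio test comparing this model with the Step 2 model. The sketch below uses the two printed -2 Log L values (633.096 at Step 2 and 625.805 here); it is our own check, not output from the stepwise procedure.

```sas
/* Likelihood ratio test for adding NDRUGTX: Step 2 model versus Step 3 model. */
data _null_;
  lr = 633.096 - 625.805;    /* drop in -2 Log L when NDRUGTX enters */
  df = 1;                    /* one added parameter                   */
  p  = 1 - probchi(lr, df);
  put lr= 8.3 p= pvalue6.4;
run;
```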

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        27.9241        4         <.0001
Score                   26.1214        4         <.0001
Wald                    24.7400        4         <.0001

       Type III Analysis of Effects

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       11.8349        0.0027
AGE           1        8.7808        0.0030
NDRUGTX       1        6.0226        0.0141

The LOGISTIC Procedure

              Analysis of Maximum Likelihood Estimates

                                 Standard
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept       1     -2.5107      0.5759       19.0072        <.0001
IVHX      1     1      0.4699      0.1484       10.0285        0.0015
IVHX      2     1     -0.1201      0.1705        0.4958        0.4813
AGE             1      0.0508      0.0171        8.7808        0.0030
NDRUGTX         1     -0.0632      0.0258        6.0226        0.0141

              Odds Ratio Estimates

                     Point          95% Wald
Effect            Estimate      Confidence Limits
IVHX    1 vs 3       2.270       1.408       3.659
IVHX    2 vs 3       1.258       0.721       2.195
AGE                  1.052       1.017       1.088
NDRUGTX              0.939       0.893       0.987

Association of Predicted Probabilities and Observed Responses

Percent Concordant     64.2    Somers' D    0.291
Percent Discordant     35.1    Gamma        0.293
Percent Tied            0.7    Tau-a        0.111
Pairs                 62916    c            0.646

     Residual Chi-Square Test

Chi-Square       DF     Pr > ChiSq    
   6.5523        4         0.1615

       Analysis of Effects in Model

                         Wald
Effect       DF    Chi-Square    Pr > ChiSq
IVHX          2       11.8349        0.0027
AGE           1        8.7808        0.0030
NDRUGTX       1        6.0226        0.0141

The LOGISTIC Procedure

   Analysis of Effects Not in the Model

                        Score
Effect       DF    Chi-Square    Pr > ChiSq
TREAT         1        5.2017        0.0226
RACE          1        1.2039        0.2726
SITE          1        0.2416        0.6231
BECK          1        0.0011        0.9738

Step  4. Effect TREAT entered:

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

         Model Fit Statistics

                              Intercept
    &nb