Example 2: We wish to study the influence of age, gender and exercise on whether or not someone has a heart attack. Again, we have a binary response variable, whether or not a heart attack occurs.
Example 3: How do variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and the prestige of the undergraduate program affect admission into graduate school? The response variable, admit/don't admit, is a binary variable.
This hypothetical data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa, and topnotch, a binary predictor in which 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it was not.
proc means data="c:\data\logit";
  var gre gpa;
run;

The MEANS Procedure

Variable      N           Mean        Std Dev        Minimum        Maximum
---------------------------------------------------------------------------
GRE         400    587.7000000    115.5165364    220.0000000    800.0000000
GPA         400      3.3899000      0.3805668      2.2600000      4.0000000
---------------------------------------------------------------------------

proc freq data="c:\data\logit";
  tables topnotch;
run;

The FREQ Procedure

                                     Cumulative    Cumulative
TOPNOTCH    Frequency     Percent     Frequency       Percent
-------------------------------------------------------------
       0          335       83.75           335         83.75
       1           65       16.25           400        100.00
Before running the logit model, check whether any cells in the crosstab of the categorical predictor and the response variable are empty or particularly small. If any are, the model may be difficult to estimate.
proc freq data="c:\data\logit";
  tables admit*topnotch / norow nocol nopercent;
run;

The FREQ Procedure

Table of ADMIT by TOPNOTCH

ADMIT     TOPNOTCH

Frequency|       0|       1|  Total
---------+--------+--------+
        0|     238|      35|    273
---------+--------+--------+
        1|      97|      30|    127
---------+--------+--------+
Total         335       65      400
None of the cells is empty (has no cases) or too small, so we will run our logit model.
proc logistic data="c:\data\logit" descending;
  model admit = gre topnotch gpa;
run;

The LOGISTIC Procedure

Model Information

Data Set                       c:\data\logit    Written by SAS
Response Variable              ADMIT
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    400
Number of Observations Used    400

Response Profile

Ordered                  Total
  Value    ADMIT     Frequency
      1    1               127
      2    0               273

Probability modeled is ADMIT=1.
This output tells us the file being analyzed and the number of observations used. We see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values). We also see that SAS is modeling admit as a binary logit and is modeling the probability that admit is 1 (if we omitted the descending option, SAS would model admit being 0, and the sign of every coefficient would be reversed).
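The effect of the descending option can be checked by hand with an intercept-only calculation. The Python sketch below is an illustration outside SAS, not part of the original analysis; it uses only the counts from the Response Profile above (127 admitted, 273 not admitted), and the variable names are hypothetical.

```python
import math

# Counts taken from the Response Profile above.
admitted = 127
not_admitted = 273

# Log odds of the event SAS models with the descending option (admit = 1).
log_odds_admit_1 = math.log(admitted / not_admitted)

# Log odds of the event SAS would model without it (admit = 0).
log_odds_admit_0 = math.log(not_admitted / admitted)

# The two values have the same magnitude and opposite signs.
print(round(log_odds_admit_1, 4), round(log_odds_admit_0, 4))
```

The same sign flip applies to each slope coefficient, and each odds ratio becomes its reciprocal.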
Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                              Intercept
               Intercept            and
Criterion           Only     Covariates
AIC              501.977        486.130
SC               505.968        502.095
-2 Log L         499.977        478.130

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        21.8469     3        <.0001
Score                   21.5235     3        <.0001
Wald                    20.4017     3        0.0001
This output describes and tests the overall fit of the model. It is rarely the main point of interest. The -2 Log L values can be used to compare nested models: the difference between the intercept-only value (499.977) and the value for the model with covariates (478.130) is the likelihood ratio chi-square. The likelihood ratio chi-square of 21.8469 with 3 degrees of freedom and a p-value of less than 0.0001 tells us that our model as a whole fits significantly better than an empty model.
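The relationships among the fit statistics can be verified by hand. The Python sketch below is an illustration outside SAS, assuming the standard definitions AIC = -2 Log L + 2k and SC = -2 Log L + k ln(n), where k is the number of estimated parameters and n the number of observations. All inputs are the rounded values printed above, so results can differ from the SAS output in the last digit. The closed-form p-value used here is valid only for 3 degrees of freedom.

```python
import math

# Values copied from the Model Fit Statistics table above.
neg2logl_null = 499.977   # intercept only
neg2logl_full = 478.130   # intercept and covariates
n_obs = 400               # observations used
k_null, k_full = 1, 4     # parameters: intercept; intercept + 3 slopes

# Likelihood ratio chi-square: drop in -2 Log L between the nested models.
lr_chisq = neg2logl_null - neg2logl_full
df = k_full - k_null

# Information criteria for the model with covariates.
aic_full = neg2logl_full + 2 * k_full
sc_full = neg2logl_full + k_full * math.log(n_obs)

# Upper-tail chi-square probability (closed form for df = 3 only).
p_value = (math.erfc(math.sqrt(lr_chisq / 2))
           + math.sqrt(2 * lr_chisq / math.pi) * math.exp(-lr_chisq / 2))

print(round(lr_chisq, 3), df, round(aic_full, 3), round(sc_full, 3))
print(p_value < 0.0001)
```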
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -4.6008      1.0964       17.6095        <.0001
GRE           1     0.00248     0.00107        5.3560        0.0207
TOPNOTCH      1      0.4372      0.2919        2.2443        0.1341
GPA           1      0.6676      0.3253        4.2123        0.0401
The above table gets into the heart of the results. It shows the coefficients (estimates), their standard errors, the Wald chi-square statistics, and the associated p-values. Both gre and gpa are statistically significant, while topnotch is not. The interpretation of the coefficients can be awkward because they are on the log odds scale. For example, for a one unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increase by 0.668, holding the other predictors constant. For this reason, many researchers prefer to exponentiate the coefficients and interpret them as odds ratios, as shown in the next segment of output.
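To make the log odds scale more concrete, the Python sketch below (an illustration outside SAS) plugs the estimates above into the model for two applicants who differ only by one unit of GPA. The applicant values (GRE of 600, not from a top notch institution, GPA of 3.0 and 4.0) and the function names are hypothetical, chosen only for illustration.

```python
import math

# Estimates copied from the maximum likelihood table above.
b_intercept, b_gre, b_topnotch, b_gpa = -4.6008, 0.00248, 0.4372, 0.6676

def log_odds(gre, topnotch, gpa):
    """Linear predictor: the log odds of admission."""
    return b_intercept + b_gre * gre + b_topnotch * topnotch + b_gpa * gpa

def probability(logit):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical applicants differing only by one unit of GPA.
xb_gpa3 = log_odds(gre=600, topnotch=0, gpa=3.0)
xb_gpa4 = log_odds(gre=600, topnotch=0, gpa=4.0)

# The change in log odds equals the gpa coefficient.
print(round(xb_gpa4 - xb_gpa3, 4))
# The corresponding predicted probabilities of admission.
print(round(probability(xb_gpa3), 3), round(probability(xb_gpa4), 3))
```

The change in log odds is the same for any starting point, but the change in probability depends on the values of the other predictors.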
Odds Ratio Estimates

               Point           95% Wald
Effect      Estimate       Confidence Limits
GRE            1.002       1.000       1.005
TOPNOTCH       1.548       0.874       2.744
GPA            1.949       1.031       3.688

Association of Predicted Probabilities and Observed Responses

Percent Concordant     63.9    Somers' D    0.283
Percent Discordant     35.6    Gamma        0.285
Percent Tied            0.5    Tau-a        0.123
Pairs                 34671    c            0.642
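The Odds Ratio Estimates table above can be reproduced from the coefficient table by exponentiating each estimate and the endpoints of its Wald interval, estimate ± 1.96 × standard error. The Python sketch below is an illustration outside SAS; because it starts from the rounded values printed in the output, the last digit may differ slightly from what SAS reports.

```python
import math

# (estimate, standard error) copied from the maximum likelihood table.
estimates = {
    "GRE": (0.00248, 0.00107),
    "TOPNOTCH": (0.4372, 0.2919),
    "GPA": (0.6676, 0.3253),
}
Z_95 = 1.96  # normal quantile for a two-sided 95% interval

odds_ratios = {}
for name, (beta, se) in estimates.items():
    point = math.exp(beta)                 # odds ratio
    lower = math.exp(beta - Z_95 * se)     # lower Wald limit
    upper = math.exp(beta + Z_95 * se)     # upper Wald limit
    odds_ratios[name] = (point, lower, upper)
    print(f"{name:<9}{point:8.3f}{lower:8.3f}{upper:8.3f}")
```

The interval for TOPNOTCH contains 1, which matches its non-significant Wald test.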
Now we can say that for a one unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.95. Since GRE scores do not increase by a single unit (they increase only in units of 10), a one unit increase is meaningless. We can take the odds ratio and raise it to the 10th power, e.g. 1.002 ^ 10 = 1.02, and say that for a 10 unit increase in GRE score, the odds of admission to graduate school increase by a factor of 1.02. Because the displayed odds ratio of 1.002 is itself rounded, this figure is only approximate. To get a more accurate estimate, we can rerun PROC LOGISTIC with the units statement, as shown below.
proc logistic data="c:\data\logit" descending;
  model admit = gre topnotch gpa;
  units gre = 10;
run;
The following table will be included in the output, giving the more precise estimate that for a 10 unit change in GRE, the odds ratio is 1.025.
Adjusted Odds Ratios

Effect         Unit     Estimate
GRE         10.0000        1.025
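The gap between 1.02 and 1.025 comes from rounding the one unit odds ratio before raising it to the 10th power. The Python sketch below is an illustration outside SAS that compares the two routes, starting from the GRE estimate of 0.00248 printed earlier; the variable names are hypothetical.

```python
import math

b_gre = 0.00248            # GRE estimate from the coefficient table
rounded_or = 1.002         # one unit odds ratio as displayed by SAS

# Route 1: raise the rounded odds ratio to the 10th power.
approx_or_10 = rounded_or ** 10

# Route 2: exponentiate 10 times the coefficient.
exact_or_10 = math.exp(10 * b_gre)

print(round(approx_or_10, 3), round(exact_or_10, 3))
```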
Below is one way of describing these results.
A logit regression was used to predict admission to graduate school from GRE score, GPA, and whether the student was from a top notch university. GRE score and GPA were significant predictors of admission to graduate school, but being from a top notch university was not a statistically significant predictor. For every one unit increase in GPA, the odds of admission (vs. non-admission) increased by a factor of 1.95, while for every ten unit increase in GRE score, the odds increased by a factor of 1.025.
These pages are Copyrighted (c) by UCLA Academic Technology Services