Example 2: We wish to study the influence of age, gender and exercise on whether or not someone has a heart attack. Again, we have a binary response variable, whether or not a heart attack occurs.
Example 3: How do variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and the prestige of the undergraduate program affect admission into graduate school? The response variable, admit/don't admit, is binary.
use http://www.ats.ucla.edu/stat/stata/dae/logit.dta, clear
This hypothetical data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa, and topnotch. The last is a binary predictor in which 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it was not.
summarize gre gpa
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gre |       400       587.7     115.5165       220        800
         gpa |       400      3.3899     .3805668      2.26          4
tab topnotch
   topnotch |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        335       83.75       83.75
          1 |         65       16.25      100.00
------------+-----------------------------------
      Total |        400      100.00
Before running logit, check whether any cells in the crosstab of the categorical predictor and the response variable are empty or particularly small. If any are, the logit model may be difficult to estimate.
tab admit topnotch
           |       topnotch
     admit |         0          1 |     Total
-----------+----------------------+----------
         0 |       238         35 |       273
         1 |        97         30 |       127
-----------+----------------------+----------
     Total |       335         65 |       400
None of the cells is empty (has no cases) or too small, so we will run our logit model.
logit admit gre topnotch gpa
Iteration 0: log likelihood = -249.98826
Iteration 1: log likelihood = -239.17277
Iteration 2: log likelihood = -239.06484
Iteration 3: log likelihood = -239.06481
Logistic regression                               Number of obs   =        400
                                                  LR chi2(3)      =      21.85
                                                  Prob > chi2     =     0.0001
Log likelihood = -239.06481                       Pseudo R2       =     0.0437
------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |   .0024768   .0010702     2.31   0.021     .0003792    .0045744
    topnotch |   .4372236   .2918532     1.50   0.134    -.1347983    1.009245
         gpa |   .6675556   .3252593     2.05   0.040     .0300592    1.305052
       _cons |  -4.600814   1.096379    -4.20   0.000    -6.749678   -2.451949
------------------------------------------------------------------------------
In the output above, we first see the iteration log, which is usually of little interest as long as the model converges. The log likelihood (-239.06481) can be used in comparisons of nested models, but we won't show an example of that here. Also at the top of the output we see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values). The likelihood ratio chi-square of 21.85 with 3 degrees of freedom and a p-value of 0.0001 tells us that our model as a whole fits significantly better than an empty model (one with no predictors).
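The summary statistics in the header can be reproduced by hand from the two log likelihoods shown in the iteration log. The lines below are a minimal sketch, assuming that iteration 0 is the intercept-only (empty) model and that the reported pseudo R2 is McFadden's; the scalar names are made up for illustration and are not part of the original analysis.

```stata
* Log likelihoods copied from the output above
* ll_0: iteration 0, assumed to be the intercept-only model
* ll_m: final iteration, the model with gre, topnotch and gpa
scalar ll_0 = -249.98826
scalar ll_m = -239.06481

* Likelihood ratio chi-square: twice the gain in log likelihood
scalar lr = 2*(ll_m - ll_0)
display "LR chi2(3)  = " %6.2f lr

* p-value from the chi-square distribution with 3 degrees of freedom
display "Prob > chi2 = " %6.4f chi2tail(3, lr)

* McFadden's pseudo R-squared: 1 minus the ratio of the two log likelihoods
scalar r2 = 1 - ll_m/ll_0
display "Pseudo R2   = " %6.4f r2
```

The displayed values should agree with the header of the logit output up to rounding.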
In the table we see the coefficients, their standard errors, the z-statistics (sometimes called Wald z-statistics), the associated p-values, and the 95% confidence intervals of the coefficients. Both gre and gpa are statistically significant at the .05 level, while topnotch is not. The interpretation of the coefficients can be awkward. For example, for a one unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increases by .668. For this reason, many researchers prefer to exponentiate the coefficients and interpret them as odds ratios. Stata will do this computation for you with the or option: typing logit , or redisplays the results of the last model as odds ratios.
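As a check on how the columns of the coefficient table relate to one another, the z-statistic, p-value and confidence interval for gre can be recomputed from its coefficient and standard error. This is only a sketch using the rounded values printed above, so the last digit may differ slightly; the scalar names are illustrative.

```stata
* Coefficient and standard error for gre, copied from the table above
scalar b  = .0024768
scalar se = .0010702

* Wald z-statistic and its two-sided p-value from the standard normal
scalar z = b/se
display "z      = " %5.2f z
display "P>|z|  = " %5.3f 2*normal(-abs(z))

* 95% confidence interval: coefficient plus or minus about 1.96 standard errors
scalar lo = b - invnormal(.975)*se
scalar hi = b + invnormal(.975)*se
display "95% CI = " %9.7f lo "  " %9.7f hi
```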
logit , or
<redundant output omitted to save space>
------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    1.00248   .0010729     2.31   0.021     1.000379    1.004585
    topnotch |   1.548402   .4519062     1.50   0.134     .8738922     2.74353
         gpa |   1.949466    .634082     2.05   0.040     1.030516    3.687881
------------------------------------------------------------------------------
Now we can say that for a one unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.95. Since GRE scores do not increase by a single unit (they increase only in units of 10), a one unit increase is not meaningful. We can instead raise the odds ratio to the 10th power, e.g. 1.00248 ^ 10 = 1.0250786, and say that for a 10 unit increase in GRE score, the odds of admission to graduate school increase by a factor of 1.025.
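The odds ratio arithmetic can be checked directly from the coefficients. The following is a minimal sketch using the rounded coefficients from the logit output; the scalar names are illustrative.

```stata
* Coefficients copied from the logit output
scalar b_gpa = .6675556
scalar b_gre = .0024768

* Odds ratio for a one unit increase in gpa
display "OR, gpa +1  = " %8.6f exp(b_gpa)

* Odds ratio for a 10 unit increase in gre: exponentiate 10 times the
* coefficient, which equals the one unit odds ratio raised to the 10th power
display "OR, gre +10 = " %8.6f exp(10*b_gre)
display "same thing  = " %8.6f exp(b_gre)^10
```

With the fitted model still in memory, lincom 10*gre, or should report the same 10 unit odds ratio along with a confidence interval.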
Even odds ratios can be hard to interpret. You can also interpret your results with predicted probabilities, which are sometimes easier to understand than coefficients or odds ratios. This can be done with a suite of commands, called spost, written by J. Scott Long and Jeremy Freese. The commands must be downloaded before use, which can be done by typing findit spost9_ado on the Stata command line (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
We will start with prtab. This can be used with either a categorical variable or a continuous variable and shows the predicted probability of the outcome being 1 at each level of the specified predictor. Although topnotch is not statistically significant, we will use it as an example of a categorical predictor. As you can see, the predicted probability of being accepted into the graduate program is .29 if the undergraduate institution was not "top notch" and .39 if it was, with gre and gpa held constant at their means.
prtab topnotch
logit: Predicted probabilities of positive outcome for admit
----------------------
 topnotch | Prediction
----------+-----------
        0 |     0.2927
        1 |     0.3905
----------------------
         gre  topnotch       gpa
x=     587.7     .1625    3.3899
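To see where these predictions come from, the two probabilities can be rebuilt by hand from the coefficients: form the linear predictor with gre and gpa at their means, then convert the log odds to a probability. This is a sketch with the rounded coefficients from the logit output and illustrative scalar names, not the internals of prtab.

```stata
* Coefficients copied from the logit output
scalar b_gre  = .0024768
scalar b_top  = .4372236
scalar b_gpa  = .6675556
scalar b_cons = -4.600814

* Linear predictor with gre and gpa at their sample means and topnotch = 0
scalar xb0 = b_cons + b_gre*587.7 + b_gpa*3.3899

* Moving topnotch from 0 to 1 adds its coefficient to the linear predictor
scalar xb1 = xb0 + b_top

* invlogit() converts log odds to a probability: exp(xb)/(1 + exp(xb))
display "Pr(admit | topnotch = 0) = " %6.4f invlogit(xb0)
display "Pr(admit | topnotch = 1) = " %6.4f invlogit(xb1)
```

In Stata 11 or later, the official margins command offers a similar prediction without the spost download; after the logit above, margins, at(topnotch=(0 1)) atmeans should give comparable values.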
Below we can see that the predicted probability of getting accepted is only .15 if one's GRE score is 220 and increases to .43 if one's GRE score is 800 (while gpa and topnotch are held constant at their means, indicated at the end of the output).
prtab gre
logit: Predicted probabilities of positive outcome for admit
----------------------
      gre | Prediction
----------+-----------
      220 |     0.1516
      300 |     0.1789
      340 |     0.1939
      360 |     0.2018
      380 |     0.2099
      400 |     0.2182
      420 |     0.2268
      440 |     0.2356
      460 |     0.2446
      480 |     0.2539
      500 |     0.2634
      520 |     0.2731
      540 |     0.2831
      560 |     0.2932
      580 |     0.3036
      600 |     0.3142
      620 |     0.3249
      640 |     0.3359
      660 |     0.3470
      680 |     0.3583
      700 |     0.3698
      720 |     0.3814
      740 |     0.3932
      760 |     0.4051
      780 |     0.4171
      800 |     0.4291
----------------------
         gre  topnotch       gpa
x=     587.7     .1625    3.3899
We can use the prvalue command to obtain the predicted probabilities when gpa is set to specific values (2, 3 and 4) while gre and topnotch are held at their means. As you can see, when one's GPA is 2, the predicted probability of being accepted is only .149, and the probability of not being accepted is .851. When GPA is 3, the probability of being accepted increases to .255, and when GPA is 4, it is .400.
prvalue , x(gpa=2)
logit: Predictions for admit
Confidence intervals by delta method
95% Conf. Interval
Pr(y=1|x): 0.1494 [ 0.0310, 0.2679]
Pr(y=0|x): 0.8506 [ 0.7321, 0.9690]
gre topnotch gpa
x= 587.7 .1625 2
prvalue , x(gpa=3)
logit: Predictions for admit
Confidence intervals by delta method
95% Conf. Interval
Pr(y=1|x): 0.2551 [ 0.1893, 0.3209]
Pr(y=0|x): 0.7449 [ 0.6791, 0.8107]
gre topnotch gpa
x= 587.7 .1625 3
prvalue , x(gpa=4)
logit: Predictions for admit
Confidence intervals by delta method
95% Conf. Interval
Pr(y=1|x): 0.4004 [ 0.2973, 0.5034]
Pr(y=0|x): 0.5996 [ 0.4966, 0.7027]
gre topnotch gpa
x= 587.7 .1625 4
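The three point predictions reported by prvalue can be reproduced in a short loop. This sketch uses the rounded coefficients and means from the output above and does not attempt the delta-method confidence intervals; the scalar names and the local macro g are illustrative.

```stata
* Coefficients copied from the logit output
scalar b_gre  = .0024768
scalar b_top  = .4372236
scalar b_gpa  = .6675556
scalar b_cons = -4.600814

* Predicted probabilities at gpa = 2, 3 and 4,
* with gre and topnotch fixed at their means (587.7 and .1625)
forvalues g = 2/4 {
    scalar p = invlogit(b_cons + b_gre*587.7 + b_top*.1625 + b_gpa*`g')
    display "gpa = `g':  Pr(y=1|x) = " %6.4f p "   Pr(y=0|x) = " %6.4f (1 - p)
}
```

Note that topnotch is held at its mean of .1625 here, as prvalue does by default, even though no individual applicant has that value.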
We will use the estout command to create a table of the results that might be more appropriate for publication. This command is user-written, so type findit estout to download it (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
estout, eform drop(_cons) collabels(OR) varwidth(12) cells(b(star fmt(%8.4f)) ///
    se(par fmt(%8.4f))) ///
    stats(ll chi2, labels("Log Likelihood" "LR Chi Square" ) fmt(%8.2f))
                      OR
gre               1.0025*
                 (0.0011)
topnotch          1.5484
                 (0.4519)
gpa               1.9495*
                 (0.6341)
Log Likelihood   -239.06
LR Chi Square      21.85
Below is one way of describing these results.
A logistic regression was used to predict admission to graduate school from GRE score, GPA, and whether the student was from a top notch university. GRE score and GPA were significant predictors of admission to graduate school, but being from a top notch university was not (p = .134). For every one unit increase in GPA, the odds of admission (vs. non-admission) increased by a factor of 1.95, while for every ten unit increase in GRE score, the odds increased by a factor of 1.025. These findings can also be interpreted using predicted probabilities. With all other variables held constant at their means, the probability of admission was .15 for a GPA of 2.0, .26 for a GPA of 3.0, and .40 for a GPA of 4.0. Likewise, for GRE scores of 400, 500, 600 and 700, the probabilities of admission were .22, .26, .31 and .37, respectively, with the other predictors held constant at their means.
These pages are Copyrighted (c) by UCLA Academic Technology Services