UCLA Academic Technology Services

Stata Data Analysis Examples
Logit Regression

Examples

Example 1:  Suppose that we are interested in the factors that influence whether or not a political candidate wins an election.  The outcome (response) variable is binary (0/1): win or lose.  The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether or not the candidate is an incumbent.  Because the response variable is binary, we need to use a model that handles 0/1 variables correctly.

Example 2:  We wish to study the influence of age, gender and exercise on whether or not someone has a heart attack.  Again, we have a binary response variable, whether or not a heart attack occurs. 

Example 3:  How do variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and prestige of the undergraduate program affect admission into graduate school? The response variable, admit/don't admit, is a binary variable.

Description of the Data

For our data analysis below, we are going to expand on Example 3 about getting into graduate school.  We have generated hypothetical data, which can be obtained from our website:
use http://www.ats.ucla.edu/stat/stata/dae/logit.dta, clear

This hypothetical data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables: gre, gpa and topnotch. The last of these is a binary predictor in which 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it was not.

summarize gre gpa

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gre |       400       587.7    115.5165        220        800
         gpa |       400      3.3899    .3805668       2.26          4

tab topnotch

   topnotch |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        335       83.75       83.75
          1 |         65       16.25      100.00
------------+-----------------------------------
      Total |        400      100.00

Some Strategies You Might Try

Using the Logit Model

Before running logit, check to see if any cells (created by the crosstab of our categorical and response variables) are empty or particularly small.  If this occurs, there may be difficulty running the logit model. 

tab admit topnotch

           |       topnotch
     admit |         0          1 |     Total
-----------+----------------------+----------
         0 |       238         35 |       273 
         1 |        97         30 |       127 
-----------+----------------------+----------
     Total |       335         65 |       400 

None of the cells is empty (has no cases) or too small, so we will run our logit model.

logit admit gre topnotch gpa

Iteration 0:   log likelihood = -249.98826
Iteration 1:   log likelihood = -239.17277
Iteration 2:   log likelihood = -239.06484
Iteration 3:   log likelihood = -239.06481

Logistic regression                               Number of obs   =        400
                                                  LR chi2(3)      =      21.85
                                                  Prob > chi2     =     0.0001
Log likelihood = -239.06481                       Pseudo R2       =     0.0437

------------------------------------------------------------------------------
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |   .0024768   .0010702     2.31   0.021     .0003792    .0045744
    topnotch |   .4372236   .2918532     1.50   0.134    -.1347983    1.009245
         gpa |   .6675556   .3252593     2.05   0.040     .0300592    1.305052
       _cons |  -4.600814   1.096379    -4.20   0.000    -6.749678   -2.451949
------------------------------------------------------------------------------

In the output above, we first see the iteration log, which shows how quickly the model converged and is rarely of interest otherwise. The log likelihood (-239.06481) can be used in comparisons of nested models, but we won't show an example of that here.  Also at the top of the output we see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values). The likelihood ratio chi-square of 21.85 with a p-value of 0.0001 tells us that our model as a whole fits significantly better than an empty model (that is, a model with no predictors).
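The header statistics can be checked by hand from the two log likelihoods in the iteration log. The following is a minimal sketch in Python (not Stata), assuming that iteration 0 is the intercept-only model and that the pseudo R2 is McFadden's; the variable names are ours, and the closed-form p-value is valid only for 3 degrees of freedom.

```python
import math

# Log likelihoods copied from the iteration log above.
ll_null = -249.98826   # iteration 0: assumed to be the intercept-only model
ll_full = -239.06481   # final iteration: model with gre, topnotch and gpa

# Cross-check: intercept-only log likelihood from the admit counts (273 zeros, 127 ones).
n0, n1 = 273, 127
p1 = n1 / (n0 + n1)
ll_null_from_counts = n1 * math.log(p1) + n0 * math.log(1 - p1)

# Likelihood ratio chi-square: twice the gain in log likelihood.
lr_chi2 = 2 * (ll_full - ll_null)

# McFadden's pseudo R-squared.
pseudo_r2 = 1 - ll_full / ll_null

# Upper-tail chi-square probability; this closed form holds for 3 degrees of freedom only.
half = lr_chi2 / 2
p_value = math.erfc(math.sqrt(half)) + 2 / math.sqrt(math.pi) * math.sqrt(half) * math.exp(-half)

print(f"null log likelihood from counts = {ll_null_from_counts:.5f}")
print(f"LR chi2(3)  = {lr_chi2:.2f}")
print(f"Pseudo R2   = {pseudo_r2:.4f}")
print(f"Prob > chi2 = {p_value:.4f}")
```

The three printed statistics should agree with the LR chi2(3), Pseudo R2 and Prob > chi2 entries in the output header.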

In the table we see the coefficients, their standard errors, the z-statistic (sometimes called a Wald z-statistic), associated p-values, and the 95% confidence interval of the coefficients.  Both gre and gpa are statistically significant, while topnotch is not. The interpretation of the coefficients can be awkward. For example, for a one unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increases by .668. For this reason, many researchers prefer to exponentiate the coefficients and interpret them as odds ratios. Stata will do this computation for you if you use the or option, illustrated below.

logit , or
<redundant output omitted to save space>
------------------------------------------------------------------------------
       admit | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |    1.00248   .0010729     2.31   0.021     1.000379    1.004585
    topnotch |   1.548402   .4519062     1.50   0.134     .8738922     2.74353
         gpa |   1.949466    .634082     2.05   0.040     1.030516    3.687881
------------------------------------------------------------------------------
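The odds-ratio table can be rebuilt from the coefficient table by exponentiation. The sketch below, written in Python rather than Stata, assumes the usual conventions: the odds ratio is exp(coefficient), its standard error comes from the delta method (odds ratio times the coefficient's standard error), the z-statistic is unchanged, and the confidence limits are the exponentiated limits of the coefficient. The names estimates, results and Z_CRIT are ours.

```python
import math

# Coefficients and standard errors copied from the logit output.
estimates = {
    "gre":      (0.0024768, 0.0010702),
    "topnotch": (0.4372236, 0.2918532),
    "gpa":      (0.6675556, 0.3252593),
}
Z_CRIT = 1.959964  # 97.5th percentile of the standard normal distribution

results = {}
for name, (coef, se) in estimates.items():
    odds_ratio = math.exp(coef)
    results[name] = {
        "or": odds_ratio,
        "se": odds_ratio * se,                    # delta-method standard error
        "z": coef / se,                           # same z as in the coefficient table
        "lower": math.exp(coef - Z_CRIT * se),    # exponentiated lower limit
        "upper": math.exp(coef + Z_CRIT * se),    # exponentiated upper limit
    }

for name, r in results.items():
    print(f"{name:>8} | {r['or']:9.6f}  {r['se']:9.6f}  {r['z']:5.2f}  "
          f"[{r['lower']:.6f}, {r['upper']:.6f}]")
```

Because the interval is exponentiated rather than built as odds ratio plus or minus a multiple of its standard error, it is not symmetric around the odds ratio.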

Now we can say that for a one unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.95. Since GRE scores do not increase by a single unit (they increase only in units of 10), a one unit increase is meaningless. Instead, we can raise the odds ratio to the 10th power, e.g. 1.00248 ^ 10 = 1.0250786, and say that for a 10 unit increase in GRE score, the odds of admission to graduate school increase by a factor of 1.025.
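Raising the odds ratio to the 10th power is the same as multiplying the coefficient by 10 before exponentiating. A small Python sketch of that equivalence follows; the helper odds_ratio_for_change is a name of our own, not a Stata command.

```python
import math

B_GRE = 0.0024768  # logit coefficient for gre, copied from the output above

def odds_ratio_for_change(coef, delta):
    """Multiplicative change in the odds when the predictor rises by delta units."""
    return math.exp(coef * delta)

or_1 = odds_ratio_for_change(B_GRE, 1)       # one-unit odds ratio
or_10 = odds_ratio_for_change(B_GRE, 10)     # ten-unit odds ratio
or_10_by_power = or_1 ** 10                  # same quantity, computed as in the text

print(f"1-unit odds ratio:  {or_1:.5f}")
print(f"10-unit odds ratio: {or_10:.3f}")
print(f"difference between the two routes: {abs(or_10 - or_10_by_power):.2e}")
```

Working from the coefficient avoids compounding the rounding in the displayed odds ratio when the multiplier is large.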

Even odds ratios can be hard to interpret. As an alternative, you can use predicted probabilities, which are sometimes easier to understand than the coefficients or odds ratios, to interpret your results. This can be done with a suite of commands, called spost, written by J. Scott Long and Jeremy Freese. The commands must be downloaded prior to their use, and this can be done by typing findit spost9_ado on the Stata command line (see "How can I use the findit command to search for programs and get additional help?" for more information about using findit).

We will start with prtab.  This can be used with either a categorical variable or a continuous variable and shows the predicted probability of the outcome being 1 for all levels of the specified predictor.  Although topnotch is not statistically significant, we will use it as an example with a categorical predictor. As you can see, the predicted probability of being accepted into the graduate program is 0.29 if the undergraduate institution was not "top notch" and 0.39 if it was, while gre and gpa are held constant at their means.

prtab topnotch

logit: Predicted probabilities of positive outcome for admit

----------------------
 topnotch | Prediction
----------+-----------
        0 |     0.2927
        1 |     0.3905
----------------------

         gre  topnotch       gpa
x=     587.7     .1625    3.3899
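These predictions can be reproduced by hand: plug the chosen value of topnotch and the means of the other predictors into the linear predictor, then apply the inverse logit. The Python sketch below does so using the coefficients and means printed above; the function predicted_probability is our own helper and is not part of spost.

```python
import math

# Coefficients copied from the logit output.
COEF = {"_cons": -4.600814, "gre": 0.0024768, "topnotch": 0.4372236, "gpa": 0.6675556}
# Sample means copied from the x= line at the end of the prtab output.
MEANS = {"gre": 587.7, "topnotch": 0.1625, "gpa": 3.3899}

def predicted_probability(**overrides):
    """Pr(admit = 1) with predictors at their means unless overridden."""
    x = {**MEANS, **overrides}
    xb = COEF["_cons"] + sum(COEF[name] * value for name, value in x.items())
    return 1 / (1 + math.exp(-xb))   # inverse logit of the linear predictor

p_not_top = predicted_probability(topnotch=0)
p_top = predicted_probability(topnotch=1)

print(f"topnotch = 0: {p_not_top:.4f}")
print(f"topnotch = 1: {p_top:.4f}")
```

The same helper can be called with gre=... or gpa=... to check other predictions, since any predictor not named stays at its mean.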

Below we can see that the predicted probability of getting accepted is only 0.15 if one's GRE score is 220 and increases to 0.43 if one's GRE score is 800 (while gpa and topnotch are held constant at their means, indicated at the end of the output).

prtab gre
logit: Predicted probabilities of positive outcome for admit

----------------------
      gre | Prediction
----------+-----------
      220 |     0.1516
      300 |     0.1789
      340 |     0.1939
      360 |     0.2018
      380 |     0.2099
      400 |     0.2182
      420 |     0.2268
      440 |     0.2356
      460 |     0.2446
      480 |     0.2539
      500 |     0.2634
      520 |     0.2731
      540 |     0.2831
      560 |     0.2932
      580 |     0.3036
      600 |     0.3142
      620 |     0.3249
      640 |     0.3359
      660 |     0.3470
      680 |     0.3583
      700 |     0.3698
      720 |     0.3814
      740 |     0.3932
      760 |     0.4051
      780 |     0.4171
      800 |     0.4291
----------------------

         gre  topnotch       gpa
x=     587.7     .1625    3.3899

We can use the prvalue command to obtain the predicted probabilities when gpa is set to specific values: 2, 3 and 4 (with gre and topnotch held at their means).  As you can see, when one's GPA is 2, the predicted probability of being accepted is only .149, and the predicted probability of not being accepted is .851.  When GPA is 3, the probability of being accepted increases to .255, and when one's GPA is 4, the predicted probability of being accepted is .40.

prvalue , x(gpa=2)

logit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.1494   [ 0.0310,    0.2679]
  Pr(y=0|x):          0.8506   [ 0.7321,    0.9690]

         gre  topnotch       gpa
x=     587.7     .1625         2

prvalue , x(gpa=3)

logit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.2551   [ 0.1893,    0.3209]
  Pr(y=0|x):          0.7449   [ 0.6791,    0.8107]

         gre  topnotch       gpa
x=     587.7     .1625         3

prvalue , x(gpa=4)

logit: Predictions for admit

Confidence intervals by delta method

                                95% Conf. Interval
  Pr(y=1|x):          0.4004   [ 0.2973,    0.5034]
  Pr(y=0|x):          0.5996   [ 0.4966,    0.7027]

         gre  topnotch       gpa
x=     587.7     .1625         4 
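The point predictions from the three prvalue runs can be verified in the same manner. The Python sketch below holds gre and topnotch at their means and varies gpa. It does not attempt the delta-method confidence intervals, which need the coefficient covariance matrix, and that matrix is not shown in the output above. The names BASE_XB and predictions are ours.

```python
import math

# Coefficients and sample means copied from the output above.
B_CONS, B_GRE, B_TOPNOTCH, B_GPA = -4.600814, 0.0024768, 0.4372236, 0.6675556
MEAN_GRE, MEAN_TOPNOTCH = 587.7, 0.1625

# Part of the linear predictor that does not change with gpa.
BASE_XB = B_CONS + B_GRE * MEAN_GRE + B_TOPNOTCH * MEAN_TOPNOTCH

predictions = {}
for gpa in (2, 3, 4):
    xb = BASE_XB + B_GPA * gpa
    p_admit = 1 / (1 + math.exp(-xb))            # Pr(y=1|x)
    predictions[gpa] = (p_admit, 1 - p_admit)    # Pr(y=0|x) is the complement
    print(f"gpa = {gpa}: Pr(y=1|x) = {p_admit:.4f}, Pr(y=0|x) = {1 - p_admit:.4f}")
```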

Sample Write-up of the Analysis

We will use the estout command to create a table of the results that might be more appropriate for publication.  This command is user-written, so type findit estout to download it (see "How can I use the findit command to search for programs and get additional help?" for more information about using findit).

estout, eform drop(_cons) collabels(OR) varwidth(12) cells(b(star fmt(%8.4f)) ///
  se(par fmt(%8.4f))) ///
  stats(ll chi2, labels("Log Likelihood" "LR Chi Square" ) fmt(%8.2f))
                OR
gre             1.0025*
                (0.0011)
topnotch        1.5484
                (0.4519)
gpa             1.9495*
                (0.6341)
Log Likelihood  -239.06
LR Chi Square   21.85

Below is one way of describing these results.

A logit regression was used to predict admission to graduate school from GRE score, GPA, and whether the student was from a top notch university. GRE score and GPA were significant predictors of admission to graduate school, but being from a top notch university was not a statistically significant predictor. For every one unit increase in GPA, the odds of admission (vs. non-admission) increased by a factor of 1.95, while for every ten unit increase in GRE score, such odds increased by a factor of 1.025. These findings can also be interpreted using predicted probabilities. With all other variables held constant at their means, the probability of admission for a GPA of 2.0 was .15, while a GPA of 3.0 resulted in a .26 probability of admission and a GPA of 4.0 was associated with a .40 probability of admission. Likewise, for GRE scores of 400, 500, 600 and 700, the probabilities of admission were .22, .26, .31 and .37, respectively, while holding other predictors constant at their means.

Cautions, Flies in the Ointment

Additional Examples

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.