UCLA Academic Technology Services

Stata Data Analysis Examples
Exact Logistic Regression

Note: This data analysis example requires Stata 10 or later.

Example:

Suppose that we are interested in the factors that influence whether or not a high school senior is admitted into a very competitive engineering school. The outcome (response) variable is binary (0/1): admitted or not admitted. The predictor variables of interest are student gender and whether or not the student took AP calculus in high school. Because the response variable is binary, we need to use a model that handles 0/1 variables correctly. And because the number of students involved is small, we need a procedure that can perform the estimation with a small sample size.

Description of the Data

The data for this exact logistic data analysis include the number admitted (admit) and the total number of applicants (n), broken down by gender (female) and whether or not they had taken AP calculus (apcalc). Since the dataset is so small, we will read it in directly. We will use admit and n to compute the number who were not admitted, noadmit.
input female  apcalc    admit       n
        0        0        0        12
        0        1        4         8
        1        0        1         5
        1        1        7         7
end

generate noadmit = n - admit

Let's look at some frequency tables.

tab1 female apcalc [fw=n]

-> tabulation of female  

     female |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         20       62.50       62.50
          1 |         12       37.50      100.00
------------+-----------------------------------
      Total |         32      100.00

-> tabulation of apcalc  

     apcalc |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         17       53.12       53.12
          1 |         15       46.88      100.00
------------+-----------------------------------
      Total |         32      100.00

tabulate female apcalc [fw=admit]

           |        apcalc
    female |         0          1 |     Total
-----------+----------------------+----------
         0 |         0          4 |         4 
         1 |         1          7 |         8 
-----------+----------------------+----------
     Total |         1         11 |        12 

tabulate female apcalc [fw=noadmit]

           |        apcalc
    female |         0          1 |     Total
-----------+----------------------+----------
         0 |        12          4 |        16 
         1 |         4          0 |         4 
-----------+----------------------+----------
     Total |        16          4 |        20 

tabstat noadmit admit, by(female) stat(sum)

Summary statistics: sum
  by categories of: female 

  female |   noadmit     admit
---------+--------------------
       0 |        16         4
       1 |         4         8
---------+--------------------
   Total |        20        12

tabstat noadmit admit, by(apcalc) stat(sum)

Summary statistics: sum
  by categories of: apcalc 

  apcalc |   noadmit     admit
---------+--------------------
       0 |        16         1
       1 |         4        11
---------+--------------------
   Total |        20        12


egen groups = group(female apcalc), label

tabstat noadmit admit, by(groups) stat(sum)

Summary statistics: sum
  by categories of: groups (group(female apcalc))

groups |   noadmit     admit
-------+--------------------
   0 0 |        12         0
   0 1 |         4         4
   1 0 |         4         1
   1 1 |         0         7
-------+--------------------
 Total |        20        12

The tables reveal that 32 people applied for the engineering program, of whom 12 were admitted and 20 were denied admission. There were 20 male and 12 female applicants. Fifteen of the applicants had taken AP calculus and 17 had not. What is really interesting is that all of the females with AP calculus were admitted versus only half of the males with AP calculus. Also, no male without AP calculus was admitted, while one female without the course was admitted.
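
The comparisons in the paragraph above can also be read off as admission proportions per cell. The following is a minimal sketch that re-enters the four-row dataset so it runs on its own; the variable name padmit is an illustrative choice and is not part of the original example.

```stata
* Re-enter the grouped data so this block is self-contained
clear
input female apcalc admit n
0 0 0 12
0 1 4 8
1 0 1 5
1 1 7 7
end

* padmit is a hypothetical name for the proportion admitted in each cell
generate padmit = admit / n

* One row per female x apcalc combination
list female apcalc admit n padmit, noobs
```

The row for females with AP calculus should show a proportion of 1 and the row for males with AP calculus a proportion of .5, which is the contrast described in the text.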

Some Strategies You Might Try

Using the Exact Logistic Model

For the fun of it, let's see what happens when you try a regular logistic regression using the blogit command. 

blogit admit n female apcalc, nolog

Logistic regression for grouped data              Number of obs   =         32
                                                  LR chi2(2)      =      26.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -8.0471896                       Pseudo R2       =     0.6199

------------------------------------------------------------------------------
    _outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   18.60808   1.322876    14.07   0.000     16.01529    21.20087
      apcalc |   19.99437          .        .       .            .           .
       _cons |  -19.99437   .7071068   -28.28   0.000    -21.38027   -18.60847
------------------------------------------------------------------------------
Note: 12 failures and 7 successes completely determined.

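Note the 12 failures and 7 successes that are completely determined, the missing standard error for apcalc, and the very large estimates of the coefficients. These are warning signs that ordinary maximum likelihood estimation is breaking down with these data.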
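
One informal way to see where the counts in that note come from is to list the cells in which every applicant has the same outcome. This is only a sketch, assuming the data are entered as in the example; it does not reproduce how blogit itself detects the problem.

```stata
* Re-enter the grouped data so this block is self-contained
clear
input female apcalc admit n
0 0 0 12
0 1 4 8
1 0 1 5
1 1 7 7
end

* Cells where nobody (admit == 0) or everybody (admit == n) was admitted
list female apcalc admit n if admit == 0 | admit == n, noobs
```

The two rows listed are the 12 males without AP calculus, none of whom were admitted, and the 7 females with AP calculus, all of whom were admitted. These match the 12 failures and 7 successes in the note.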

Now, let's run the exact logistic analysis using the exlogistic command.

exlogistic admit female apcalc, coef binomial(n) nolog

note: CMLE estimate for female is +inf; computing MUE
note: CMLE estimate for apcalc is +inf; computing MUE

Exact logistic regression                        Number of obs =        32
Binomial variable: n                             Model score   =  18.75176
                                                 Pr >= score   =    0.0000
---------------------------------------------------------------------------
       admit |      Coef.       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
      female |   2.336592*          8      0.0302      .2044942       +Inf
      apcalc |   3.435807*         11      0.0003      1.405934       +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)

/* rerun to obtain odds ratios */

exlogistic

Exact logistic regression                        Number of obs =        32
Binomial variable: n                             Model score   =  18.75176
                                                 Pr >= score   =    0.0000
---------------------------------------------------------------------------
       admit | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
      female |   10.34592*          8      0.0302      1.226904       +Inf
      apcalc |   31.05645*         11      0.0003      4.079333       +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)

/* rerun to obtain score estimates */

exlogistic, coef test(score)

Exact logistic regression                        Number of obs =        32
Binomial variable: n                             Model score   =  18.75176
                                                 Pr >= score   =    0.0000
---------------------------------------------------------------------------
       admit |      Coef.       Score    Pr>=Score     [95% Conf. Interval]
-------------+-------------------------------------------------------------
      female |   2.336592*   6.685974      0.0151      .2044942       +Inf
      apcalc |   3.435807*   14.78361      0.0001      1.405934       +Inf
---------------------------------------------------------------------------
(*) median unbiased estimates (MUE)

In the output above, we first see the estimates of the exact logistic coefficients, 2.34 for female and 3.44 for apcalc. Both of these are statistically significant. Next come the estimates of the exact odds ratios, which, in turn, are followed by the score statistics. The score statistic provides an alternate method for testing each of our variables and, again, both are statistically significant. All of the estimates in this output are median unbiased estimates (MUE), because the conditional maximum likelihood estimates (CMLE) are infinite, as the notes at the top of the first output indicate.
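
Two of the columns in the output can be checked by hand. The odds ratios are the exponentiated coefficients, and the Suff. column appears to be the sufficient statistic for each predictor, that is, the sum of the predictor times the number admitted. The sketch below assumes the data are entered as in the example; t_female and t_apcalc are illustrative names, not part of the original analysis.

```stata
* Odds ratios are exp() of the coefficients reported by exlogistic
display exp(2.336592)
display exp(3.435807)

* Re-enter the grouped data so this block is self-contained
clear
input female apcalc admit n
0 0 0 12
0 1 4 8
1 0 1 5
1 1 7 7
end

* Hypothetical helper variables: predictor times number admitted
generate t_female = female * admit
generate t_apcalc = apcalc * admit

* Column sums give the sufficient statistics
tabstat t_female t_apcalc, stat(sum)
```

Up to rounding, the two display commands should reproduce the odds ratios of 10.35 and 31.06, and the sums should equal the values 8 and 11 shown in the Suff. column.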

Sample Write-Up of the Analysis

There does not seem to be a standard format for writing up or displaying the results of an exact logistic analysis. Below you will find one possible way to present the results, including a table and write-up of the results.

Variable     Coefficient     p-value     Odds Ratio
Gender          2.34         0.0302        10.35
APCalc          3.44         0.0003        31.06

The exact median unbiased estimates of the coefficients for both gender (2.34, p = 0.0302) and AP calculus (3.44, p = 0.0003) were statistically significant. The estimated odds of admission for a female were 10.35 times those for a male, and the odds for an applicant who had taken AP calculus were 31.06 times those for one who had not taken the course.

Cautions, Flies in the Ointment

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.