UCLA Academic Technology Services

SAS Data Analysis Examples
Exact Logistic Regression

Example:

Suppose that we are interested in the factors that influence whether or not a high school senior is admitted into a very competitive engineering school. The outcome (response) variable is binary (0/1): admit or not admit. The predictor variables of interest are student gender and whether or not the student took AP calculus in high school. Because the response variable is binary, we need to use a model that handles 0/1 variables correctly. And because the number of students involved is small, we need a procedure that can perform the estimation with a small sample size.

Description of the Data

The data for this exact logistic analysis consist of the number admitted (admit) and the total number of applicants (n), broken down by gender (female) and whether or not the applicants had taken AP calculus (apcalc). Because the dataset is so small, we will read it in directly. We will use admit and n to compute the number who were not admitted, noadmit.

options nocenter;

data exlogit;
input female apcalc admit  n;
noadmit = n - admit;
datalines;
0        0        0        12
0        1        4         8
1        0        1         5
1        1        7         7
;
run;

Let's look at some frequency tables.

proc freq data=exlogit;
  /* weight by n so that the counts are numbers of applicants */
  weight n;
  tables female apcalc;
run;

The FREQ Procedure
                                   Cumulative    Cumulative
female    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0          20       62.50            20        62.50
     1          12       37.50            32       100.00

                                   Cumulative    Cumulative
apcalc    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0          17       53.13            17        53.13
     1          15       46.88            32       100.00

proc freq data=exlogit;
  /* weight by noadmit so that the cells count applicants who were not admitted */
  weight noadmit;
  tables female*apcalc;
run;

Table of female by apcalc

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
----------------------------------------
       0 |     12 |      4 |     16
         |  60.00 |  20.00 |  80.00
         |  75.00 |  25.00 |
         |  75.00 | 100.00 |
----------------------------------------
       1 |      4 |      0 |      4
         |  20.00 |   0.00 |  20.00
         | 100.00 |   0.00 |
         |  25.00 |   0.00 |
----------------------------------------
Total          16        4       20
            80.00    20.00   100.00

proc means data=exlogit sum;
  class female;
  var admit noadmit;
run;

The MEANS Procedure
                  N
      female    Obs    Variable             Sum
--------------------------------------------------
           0      2    admit          4.0000000
                       noadmit       16.0000000

           1      2    admit          8.0000000
                       noadmit        4.0000000
-------------------------------------------------- 

proc means data=exlogit sum;
  class apcalc;
  var admit noadmit;
run;

The MEANS Procedure
                  N
      apcalc    Obs    Variable             Sum
--------------------------------------------------
           0      2    admit          1.0000000
                       noadmit       16.0000000

           1      2    admit         11.0000000
                       noadmit        4.0000000
-------------------------------------------------- 

proc means data=exlogit sum;
  class female apcalc;
  var admit noadmit;
run;

The MEANS Procedure

                                  N
      female          apcalc    Obs    Variable             Sum
------------------------------------------------------------------
           0               0      1    admit                  0
                                       noadmit       12.0000000

                           1      1    admit          4.0000000
                                       noadmit        4.0000000

           1               0      1    admit          1.0000000
                                       noadmit        4.0000000

                           1      1    admit          7.0000000
                                       noadmit                0
------------------------------------------------------------------

The tables reveal that 32 people applied to the engineering program, of whom 12 were admitted and 20 were denied admission. There were 20 male and 12 female applicants. Fifteen of the applicants had taken AP calculus and 17 had not. What is really interesting is that all of the females with AP calculus were admitted, versus only half of the males with AP calculus. Also, no male without AP calculus was admitted, while one female without the course was admitted.
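
The same comparisons can be read off as admission rates per cell. The following is a minimal sketch, assuming the exlogit dataset created above; the dataset name rates and the variable pct_admit are hypothetical names introduced only for this illustration. From the data as entered, the four rates work out to 0%, 50%, 20% and 100%.

```sas
/* Sketch: percent admitted in each female-by-apcalc cell.      */
/* "rates" and "pct_admit" are hypothetical names, not part of  */
/* the original example.                                        */
data rates;
  set exlogit;
  pct_admit = 100 * admit / n;
run;

proc print data=rates noobs;
  var female apcalc admit n pct_admit;
run;
```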

Some Strategies You Might Try

Using the Exact Logistic Model

For the fun of it, let's look at the coefficients, standard errors and odds ratios from a regular logistic regression using proc logistic.

proc logistic data=exlogit descending; 
  model admit/n = female apcalc; 
run;

The LOGISTIC Procedure
WARNING: The validity of the model fit is questionable.

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1    -13.9944       172.9        0.0066        0.9355
female        1     12.6081       172.9        0.0053        0.9419
apcalc        1     13.9944       172.9        0.0066        0.9355

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

female    >999.999      <0.001    >999.999
apcalc    >999.999      <0.001    >999.999

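Note the very large standard errors for the coefficient estimates (172.9) and the odds ratio point estimates, which are too large to display (>999.999). Together with the warning about the validity of the model fit, these indicate that the ordinary maximum likelihood results should not be trusted here.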
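
A likely reason for this behavior is that two of the four cells have an all-or-nothing outcome: no males without AP calculus were admitted, and all females with AP calculus were admitted. With cells like these, the likelihood can keep increasing as the coefficients grow, so the estimates may not settle on finite values (a situation usually described as separation or quasi-complete separation). As a rough check, and not a formal diagnostic, the sketch below lists such cells in the log, assuming the exlogit dataset created above.

```sas
/* Sketch: write to the log every cell in which everyone or no one */
/* was admitted. This is an informal check, not a formal test.     */
data _null_;
  set exlogit;
  if admit = 0 or noadmit = 0 then
    put "Cell with all-or-nothing outcome: " female= apcalc= admit= noadmit=;
run;
```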

Now, let's run the exact logistic analysis using proc logistic with the exact statement.

proc logistic data=exlogit descending exactonly; 
  model admit/n = female apcalc; 
  exact female apcalc / estimate=both; 
run;

The LOGISTIC Procedure

Exact Conditional Analysis

Conditional Exact Tests
                                   --- p-Value ---
Effect   Test          Statistic    Exact      Mid

female   Score            6.6860   0.0151   0.0075
         Probability      0.0151   0.0151   0.0075
apcalc   Score           14.7836   0.0001   <.0001
         Probability    0.000146   0.0001   <.0001

Exact Parameter Estimates
                             95% Confidence
Parameter    Estimate            Limits           p-Value

female         2.3366*      0.2045    Infinity     0.0302
apcalc         3.4358*      1.4059    Infinity     0.0003

NOTE: * indicates a median unbiased estimate.

Exact Odds Ratios
                           95% Confidence
Parameter   Estimate           Limits          p-Value

female        10.346*      1.227   Infinity     0.0302
apcalc        31.056*      4.079   Infinity     0.0003

NOTE: * indicates a median unbiased estimate.

In the output above, we first see the conditional exact tests (score and probability), in which both female and apcalc are statistically significant. This section is followed by the exact parameter estimates, i.e., the exact logistic regression coefficients, which, in turn, are followed by the exact odds ratios. The asterisks mark median unbiased estimates, and the upper confidence limits for both predictors are infinite.
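
The events/trials syntax admit/n is one way to give proc logistic grouped data. As a sketch of an alternative layout, the same counts could be arranged with one row per outcome and supplied through a freq statement. The names exlogit_long, y and count are hypothetical and introduced only for this illustration; the results are expected to agree with the analysis above, but this variant has not been run as part of the original example.

```sas
/* Sketch: reshape each cell into one row for admitted (y=1) and  */
/* one row for not admitted (y=0), with the cell count in "count". */
/* Rows with a zero count are dropped.                             */
data exlogit_long;
  set exlogit;
  y = 1; count = admit;   if count > 0 then output;
  y = 0; count = noadmit; if count > 0 then output;
run;

/* Same exact analysis, using a binary response and a freq statement */
proc logistic data=exlogit_long descending exactonly;
  freq count;
  model y = female apcalc;
  exact female apcalc / estimate=both;
run;
```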

Sample Write-Up of the Analysis

There does not seem to be a standard format for writing up or displaying the results of an exact logistic analysis.  Below you will find one possible way to present the results, including a table and write-up of the results.

Variable    Coefficient    p-value    Odds Ratio
Gender         2.34        0.0302       10.35
APCalc         3.44        0.0003       31.06

The exact median unbiased estimates of the coefficients for both gender (2.34, p = 0.0302) and AP calculus (3.44, p = 0.0003) were statistically significant. The odds of a female being admitted were 10.35 times those of a male, and the odds for an applicant who had taken AP calculus were 31.06 times those of one who had not taken the course.
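
The odds ratios reported here are the exponentiated coefficients. As a small sketch, the data step below exponentiates the exact parameter estimates from the output above; the variable names or_female and or_apcalc are hypothetical. Up to rounding, the values written to the log should match the exact odds ratios of about 10.346 and 31.056.

```sas
/* Sketch: odds ratio = exp(coefficient), using the exact parameter */
/* estimates reported by proc logistic above.                       */
data _null_;
  or_female = exp(2.3366);
  or_apcalc = exp(3.4358);
  put or_female= 8.3 or_apcalc= 8.3;
run;
```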

Cautions, Flies in the Ointment

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services

