UCLA Academic Technology Services

SAS Data Analysis Examples
Logit Regression

Examples

Example 1:  Suppose that we are interested in the factors that influence whether or not a political candidate wins an election.  The outcome (response) variable is binary (0/1): win or lose.  The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether or not the candidate is an incumbent.  Because the response variable is binary, we need to use a model that handles 0/1 variables correctly.

Example 2:  We wish to study the influence of age, gender and exercise on whether or not someone has a heart attack.  Again, we have a binary response variable, whether or not a heart attack occurs. 

Example 3:  How do variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and the prestige of the undergraduate program affect admission into graduate school? The response variable, admit/don't admit, is binary.

Description of the Data

For our data analysis below, we are going to expand on Example 3 about getting into graduate school.  We have generated hypothetical data, which can be obtained by clicking on logit.sas7bdat. You can store this anywhere you like, but our examples will assume it has been stored in c:\data.

This hypothetical data set has a binary response (outcome, dependent) variable called admit. There are three predictor variables:  gre, gpa and topnotch. The last of these is a binary predictor in which 1 indicates that the undergraduate institution was "top notch" and 0 indicates that it was not.

proc means data="c:\data\logit";
  var gre gpa;
run;
The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
GRE         400     587.7000000     115.5165364     220.0000000     800.0000000
GPA         400       3.3899000       0.3805668       2.2600000       4.0000000
-------------------------------------------------------------------------------
proc freq data="c:\data\logit";
  tables topnotch;
run;
The FREQ Procedure

                                     Cumulative    Cumulative
TOPNOTCH    Frequency     Percent     Frequency      Percent
-------------------------------------------------------------
       0         335       83.75           335        83.75
       1          65       16.25           400       100.00

Some Strategies You Might Try

Using the Logit Model

Before running logit, check to see if any cells (created by the crosstab of our categorical and response variables) are empty or particularly small.  If this occurs, there may be difficulty running the logit model. 

proc freq data="c:\data\logit";
  tables admit*topnotch / norow nocol nopercent;
run;
The FREQ Procedure

Table of ADMIT by TOPNOTCH

ADMIT     TOPNOTCH

Frequency|       0|       1|  Total
---------+--------+--------+
       0 |    238 |     35 |    273
---------+--------+--------+
       1 |     97 |     30 |    127
---------+--------+--------+
Total         335       65      400

None of the cells is empty (has no cases) or particularly small, so we will run our logit model.

proc logistic data="c:\data\logit" descending;
  model admit = gre topnotch gpa;
run;
The LOGISTIC Procedure

                         Model Information

Data Set                      c:\data\logit        Written by SAS
Response Variable             ADMIT
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring


Number of Observations Read         400
Number of Observations Used         400
          Response Profile

 Ordered                      Total
   Value        ADMIT     Frequency

       1            1           127
       2            0           273

Probability modeled is ADMIT=1.

This output tells us the file being analyzed and the number of observations used. We see that all 400 observations in our data set were used in the analysis (fewer observations would have been used if any of our variables had missing values). We also see that SAS is fitting a binary logit model and is modeling the probability that admit is 1. If we omitted the descending option, SAS would model the probability that admit is 0, and the signs of all the coefficients would be reversed.
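As an alternative to the descending option, the event category can be named directly in the model statement. The following is a minimal sketch, assuming a release of SAS (version 9 or later) that supports the EVENT= response variable option; it should fit the same model of admit being 1.

```sas
proc logistic data="c:\data\logit";
  /* event='1' asks SAS to model the probability that admit = 1, */
  /* so the descending option is not needed                      */
  model admit(event='1') = gre topnotch gpa;
run;
```

Either way, the "Probability modeled is ADMIT=1." line in the output is the place to confirm which category is being modeled.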

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.


         Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates

AIC             501.977        486.130
SC              505.968        502.095
-2 Log L        499.977        478.130


        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        21.8469        3         <.0001
Score                   21.5235        3         <.0001
Wald                    20.4017        3         0.0001

This output describes and tests the overall fit of the model. This is rarely the most interesting part of the output. The -2 Log L values (499.977 for the intercept-only model and 478.130 for the model with covariates) can be used in comparisons of nested models, but we won't show an example of that here.  The likelihood ratio chi-square of 21.8469 with 3 degrees of freedom and a p-value below 0.0001 tells us that our model as a whole fits significantly better than an empty model (a model with only the intercept).
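The likelihood ratio chi-square is the difference between the two -2 Log L values in the Model Fit Statistics table, with degrees of freedom equal to the number of predictors added. As a rough sketch, the test can be reproduced by hand in a data step. The data set name lrtest and the variable names are our own choices for illustration, and the two -2 Log L values are typed in from the output above.

```sas
data lrtest;
  neg2ll_null  = 499.977;   /* -2 Log L, intercept only            */
  neg2ll_model = 478.130;   /* -2 Log L, intercept and covariates  */
  df           = 3;         /* three predictors: gre topnotch gpa  */
  lr_chisq = neg2ll_null - neg2ll_model;    /* difference in -2 Log L      */
  p_value  = 1 - probchi(lr_chisq, df);     /* upper tail of chi-square    */
run;

proc print data=lrtest noobs;
run;
```

The computed chi-square should agree with the Likelihood Ratio line of the output (about 21.85), up to rounding of the printed -2 Log L values.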

The LOGISTIC Procedure

             Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1     -4.6008      1.0964       17.6095        <.0001
GRE           1     0.00248     0.00107        5.3560        0.0207
TOPNOTCH      1      0.4372      0.2919        2.2443        0.1341
GPA           1      0.6676      0.3253        4.2123        0.0401

The above table gets to the heart of the results. It shows the coefficients (Estimate), their standard errors, the Wald chi-square statistics, and the associated p-values.  Both gre and gpa are statistically significant at the 0.05 level, while topnotch is not. The interpretation of the coefficients can be awkward because they are on the log odds scale. For example, for a one unit increase in gpa, the log odds of being admitted to graduate school (vs. not being admitted) increases by 0.668. For this reason, many researchers prefer to exponentiate the coefficients and interpret them as odds ratios, as shown in the next segment of output.

            Odds Ratio Estimates

               Point          95% Wald
Effect      Estimate      Confidence Limits

GRE            1.002       1.000       1.005
TOPNOTCH       1.548       0.874       2.744
GPA            1.949       1.031       3.688


Association of Predicted Probabilities and Observed Responses

Percent Concordant     63.9    Somers' D    0.283
Percent Discordant     35.6    Gamma        0.285
Percent Tied            0.5    Tau-a        0.123
Pairs                 34671    c            0.642
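The odds ratio estimates and their 95% Wald confidence limits are obtained by exponentiating each coefficient and the coefficient plus or minus about 1.96 standard errors. The data step below is a sketch of that calculation, using the estimates and standard errors typed in from the Analysis of Maximum Likelihood Estimates table. The data set name oddsratios and the variable names are made up for this illustration.

```sas
data oddsratios;
  length effect $ 8;
  input effect $ estimate stderr;
  z          = probit(0.975);               /* normal quantile, about 1.96 */
  odds_ratio = exp(estimate);               /* point estimate              */
  lower_cl   = exp(estimate - z*stderr);    /* lower 95% Wald limit        */
  upper_cl   = exp(estimate + z*stderr);    /* upper 95% Wald limit        */
  datalines;
GRE      0.00248 0.00107
TOPNOTCH 0.4372  0.2919
GPA      0.6676  0.3253
;
run;

proc print data=oddsratios noobs;
  var effect odds_ratio lower_cl upper_cl;
  format odds_ratio lower_cl upper_cl 8.3;
run;
```

The printed values should agree with the Odds Ratio Estimates table, apart from small differences caused by rounding in the typed-in coefficients.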

Now we can say that for a one unit increase in gpa, the odds of being admitted to graduate school (vs. not being admitted) increase by a factor of 1.95. Since GRE scores do not increase by a single unit (they increase only in units of 10), a one unit increase is meaningless. We can take the odds ratio and raise it to the 10th power, e.g. 1.002 ^ 10 = 1.02, and say that for a 10 unit increase in GRE score, the odds of admission to graduate school increase by a factor of 1.02. This figure is imprecise because the printed odds ratio of 1.002 is itself rounded. To get a more accurate estimate, we can re-run PROC LOGISTIC with the units statement, as shown below.

proc logistic data="c:\data\logit" descending;
  model admit = gre topnotch gpa;
  units gre = 10;
run;

The following table will be included in the output, giving the more precise estimate that for a 10 unit change in GRE, the odds ratio is 1.025.

       Adjusted Odds Ratios

Effect           Unit     Estimate

GRE           10.0000        1.025
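The gap between 1.02 and 1.025 comes from rounding: the odds ratio for a 10 unit change is the exponential of 10 times the coefficient, and the coefficient (0.00248) carries more digits than the printed odds ratio (1.002). The short data step below is a sketch comparing the two calculations. The data set name gre10 and the variable names are hypothetical, and the coefficient is typed in from the earlier output.

```sas
data gre10;
  b_gre      = 0.00248;          /* GRE coefficient from the estimates table  */
  or_rounded = 1.002**10;        /* rounded odds ratio raised to the 10th     */
  or_10unit  = exp(10*b_gre);    /* odds ratio for a 10 unit change in GRE    */
run;

proc print data=gre10 noobs;
  format or_rounded or_10unit 8.3;
run;
```

The first value should come out near 1.020 and the second near 1.025, in line with the Adjusted Odds Ratios table.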

Sample Write-up of the Analysis

Below is one way of describing these results.

A logit regression was used to predict admission to graduate school from GRE score, GPA, and whether the student was from a top notch university. GRE score and GPA were significant predictors of admission to graduate school, but being from a top notch university was not a statistically significant predictor of admission. For every one unit increase in GPA, the odds of admission (vs. non-admission) increased by a factor of 1.95, while for every ten unit increase in GRE score, such odds increased by a factor of 1.025.

Cautions, Flies in the Ointment

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services