UCLA Academic Technology Services

SAS Data Analysis Examples
Ordinal Logistic Regression

Examples

Example 1:  A marketing research firm wants to investigate what factors influence the size of soda (small, medium, large or extra large) that people order at a fast-food chain.  These factors may include what type of sandwich is ordered (burger or chicken), whether or not fries are also ordered, and age of the consumer.  While the outcome variable, size of soda, is obviously ordered, the differences between adjacent sizes are not equal:  they are 10, 8, and 12 ounces, respectively. 

Example 2:  A 5-point Likert scale is used to assess people's opinion about a local ballot measure.  The response options are "strongly disagree", "disagree", "neutral", "agree" and "strongly agree".  Predictor variables will include the measure's author, his/her political party, and how much the measure's proposals will cost.  The researchers have reason to believe that the psychological "distances" between these points are not equal.  For example, the "distance" between "strongly disagree" and "disagree" may be shorter than the distance between "disagree" and "neutral". 

Example 3:  A study looks at factors that influence the decision of whether to apply to graduate school.  College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school.  Hence, our outcome variable has three categories.  Data on parental educational status, whether the undergraduate institution is public or private, and current GPA are also collected. 

Description of the Data

For our data analysis below, we are going to expand on Example 3 about applying to graduate school.  We have generated hypothetical data, which can be downloaded here.

This hypothetical data set has a three-level variable called apply (coded 0, 1, 2) that we will use as our response (i.e., outcome, dependent) variable.  We also have three variables that we will use as predictors:  pared, which is a 0/1 variable indicating whether at least one parent has a graduate degree; public, which is a 0/1 variable where 1 indicates that the undergraduate institution is a public university and 0 indicates that it is a private university; and gpa, which is the student's grade point average. 

/* One-way frequency tables for the outcome and the two binary predictors */
proc freq data = "D:\ologit";
  tables apply;
  tables pared;
  tables public;
run;

The FREQ Procedure

                                  Cumulative    Cumulative
APPLY    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    0         220       55.00           220        55.00
    1         140       35.00           360        90.00
    2          40       10.00           400       100.00


                                  Cumulative    Cumulative
PARED    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    0         337       84.25           337        84.25
    1          63       15.75           400       100.00


                                   Cumulative    Cumulative
PUBLIC    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0         343       85.75           343        85.75
     1          57       14.25           400       100.00

/* Summary statistics for the continuous predictor */
proc means data = "D:\ologit";
  var gpa;
run;

The MEANS Procedure

                      Analysis Variable : GPA

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
400       2.9989250       0.3979409       1.9000000       4.0000000
-------------------------------------------------------------------

Some Strategies You Might Try

Using the Ordinal Logistic Model

Before we run our ordinal logistic model, we will see if any cells (created by crossing each categorical predictor with the response variable) are empty or extremely small.  If any are, we may have difficulty running our model.  We have used some options on the tables statements to clean up the output.  Perhaps the most important option is the missprint option; this will have SAS include missing values as a category in the table.  Because we have no missing values in this data set, this option is not really needed; we have included it here only to show its use.

/* Crosstabs of the response by each categorical predictor, counts only */
proc freq data = "D:\ologit";
  tables apply*pared / nopercent norow nocol missprint;
  tables apply*public / nopercent norow nocol missprint;
run;

The FREQ Procedure

Table of APPLY by PARED

APPLY     PARED

Frequency|       0|       1|  Total
---------+--------+--------+
       0 |    200 |     20 |    220
---------+--------+--------+
       1 |    110 |     30 |    140
---------+--------+--------+
       2 |     27 |     13 |     40
---------+--------+--------+
Total         337       63      400


Table of APPLY by PUBLIC

APPLY     PUBLIC

Frequency|       0|       1|  Total
---------+--------+--------+
       0 |    189 |     31 |    220
---------+--------+--------+
       1 |    124 |     16 |    140
---------+--------+--------+
       2 |     30 |     10 |     40
---------+--------+--------+
Total         343       57      400

None of the cells is too small or empty (has no cases), so we will run our model.

/* desc reverses the response order so that higher values of apply are modeled */
proc logistic data = "D:\ologit" desc;
  model apply = pared public gpa;
run;

The LOGISTIC Procedure

                         Model Information

Data Set                      D:\ologit            Written by SAS
Response Variable             APPLY
Number of Response Levels     3
Model                         cumulative logit
Optimization Technique        Fisher's scoring

Number of Observations Read         400
Number of Observations Used         400

          Response Profile

 Ordered                      Total
   Value        APPLY     Frequency

       1            2            40
       2            1           140
       3            0           220

Probabilities modeled are cumulated over the lower Ordered Values.

                    Model Convergence Status

         Convergence criterion (GCONV=1E-8) satisfied.

Score Test for the Proportional Odds Assumption

Chi-Square       DF     Pr > ChiSq

    4.8446        3         0.1835

         Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates

AIC             745.205        727.025
SC              753.188        746.982
-2 Log L        741.205        717.025

The LOGISTIC Procedure

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        24.1804        3         <.0001
Score                   23.4804        3         <.0001
Wald                    24.3337        3         <.0001

              Analysis of Maximum Likelihood Estimates

                                 Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept 2     1     -4.2983      0.8092       28.2189        <.0001
Intercept 1     1     -2.2029      0.7844        7.8869        0.0050
PARED           1      1.0478      0.2684       15.2350        <.0001
PUBLIC          1     -0.0585      0.2886        0.0411        0.8393
GPA             1      0.6156      0.2626        5.4963        0.0191

           Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits

PARED        2.851       1.685       4.826
PUBLIC       0.943       0.536       1.661
GPA          1.851       1.106       3.096

Association of Predicted Probabilities and Observed Responses

Percent Concordant     60.0    Somers' D    0.210
Percent Discordant     39.0    Gamma        0.213
Percent Tied            1.1    Tau-a        0.119
Pairs                 45200    c            0.605

In the output above, we see that all 400 observations in our data set were used in the analysis.  Fewer observations would have been used if any of our variables had missing values; by default, SAS does a listwise deletion of cases with missing values.  The Response Profile shows the value that SAS used when conducting the analysis (given in the Ordered Value column), the value of the original variable, and the number of cases in each level of the outcome variable.  Because we used the desc option on the proc logistic statement, the highest value of apply (2) has Ordered Value 1.  (The ordering of the response levels can also be controlled with the order = option on the proc logistic statement; for example, order = data uses the order in which the levels appear in the data set.)  The note below this table reminds us that the "Probabilities modeled are cumulated over the lower Ordered Values", which here means that the model describes the probability of being in a higher category of apply.  It is helpful to remember this when interpreting the output.

Next we see that the model converged (you should not try to interpret any output if the model has not converged), and we also see that the test of the proportional odds assumption is non-significant (chi-square = 4.84, df = 3, p = 0.18), so we have no evidence that the assumption is violated.  One of the assumptions underlying ordinal logistic (and ordinal probit) regression is that the relationship between each pair of outcome groups is the same.  In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the two lowest categories combined and all higher categories, etc.  This is called the proportional odds assumption or the parallel regression assumption.  Because the relationship between all pairs of groups is the same, there is only one set of coefficients (only one model).  If this were not the case, we would need different models (such as a generalized ordered logit model) to describe the relationship between each pair of outcome groups.

The table showing the Model Fit Statistics provides the AIC, SC and -2 log likelihood.  The AIC and SC can be used to compare competing models (smaller values indicate a better trade-off between fit and complexity), and the difference in -2 Log L between two nested models gives a likelihood ratio test.
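If the score test for the proportional odds assumption had been significant, one way to fit separate coefficients for each cumulative split is sketched below.  This is only a sketch:  it assumes a SAS/STAT release recent enough to support the unequalslopes option on the model statement (older releases do not have it), and it is not part of the analysis on this page.

```sas
/* Sketch only: relax the proportional odds assumption for all predictors. */
/* Assumes a SAS/STAT release that supports the unequalslopes option.      */
proc logistic data = "D:\ologit" desc;
  model apply = pared public gpa / unequalslopes;
run;
```

With three response levels, such a model estimates two coefficients per predictor (one for each cumulative logit) instead of one, so it uses three more parameters than the proportional odds model.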
In the next table we see various tests of the overall model; they all indicate that the model is statistically significant.
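The fit statistics and the likelihood ratio test are linked by simple arithmetic, which the following sketch reproduces from the values printed above.  The variable names are ours, chosen for illustration; the parameter counts (2 intercepts for the intercept-only model, 2 intercepts plus 3 slopes for the full model) are read off the output.

```sas
/* Sketch: reproduce the likelihood ratio test, AIC and SC from -2 Log L. */
data _null_;
  n           = 400;       /* observations used                     */
  neg2ll_null = 741.205;   /* -2 Log L, intercept only              */
  neg2ll_full = 717.025;   /* -2 Log L, intercept and covariates    */
  k_null      = 2;         /* two intercepts                        */
  k_full      = 5;         /* two intercepts plus three slopes      */

  /* Likelihood ratio chi-square: difference in -2 Log L, df = 3 */
  lr_chisq = neg2ll_null - neg2ll_full;
  lr_df    = k_full - k_null;
  lr_p     = 1 - probchi(lr_chisq, lr_df);

  /* AIC = -2 Log L + 2k;  SC = -2 Log L + k*log(n) */
  aic_full = neg2ll_full + 2*k_full;
  sc_full  = neg2ll_full + k_full*log(n);

  put lr_chisq= 8.3 lr_df= lr_p= pvalue6.4;
  put aic_full= 8.3 sc_full= 8.3;
run;
```

The difference in -2 Log L is about 24.18 on 3 degrees of freedom, which agrees with the Likelihood Ratio row of the global test table up to rounding.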

In the table Analysis of Maximum Likelihood Estimates, we see the degrees of freedom, coefficients, their standard errors, the Wald chi-square test and associated p-values.  Both pared and gpa are statistically significant; public is not.  So for pared, we would say that for a one unit increase in pared (i.e., going from 0 to 1), we expect a 1.05 increase in the log odds of being in a higher level of apply, given that all of the other variables in the model are held constant.  For gpa, we would say that for a one unit increase in gpa, we would expect a 0.62 increase in the log odds of being in a higher level of apply, given that all of the other variables in the model are held constant.

In the next table we see the results presented as proportional odds ratios (the exponentiated coefficients) and the 95% confidence intervals for the proportional odds ratios.  We would interpret the proportional odds ratios pretty much as we would odds ratios from a binary logistic regression.  For pared, we would say that for a one unit increase in pared, i.e., going from 0 to 1, the odds of high apply versus the combined middle and low categories are 2.85 times greater, given that all of the other variables in the model are held constant.  Likewise, the odds of the combined middle and high categories versus low apply are 2.85 times greater, given that all of the other variables in the model are held constant.  For a one unit increase in gpa, the odds of the high category of apply versus the low and middle categories are 1.85 times greater, given that the other variables in the model are held constant.  Because of the proportional odds assumption (discussed above), the same increase, 1.85 times, is found for the combined middle and high categories versus low apply.
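The odds ratio table can be reproduced by hand from the coefficient table, which may help make the link between the two concrete.  The sketch below exponentiates each estimate and its Wald limits (estimate plus or minus the normal quantile times the standard error); the data set and variable names are ours, chosen for illustration.

```sas
/* Sketch: rebuild the odds ratios and 95% Wald limits from the estimates. */
data or_check;
  length effect $ 6;
  input effect $ estimate stderr;
  z          = quantile('normal', 0.975);   /* about 1.96 */
  odds_ratio = exp(estimate);
  lower      = exp(estimate - z*stderr);
  upper      = exp(estimate + z*stderr);
  datalines;
PARED   1.0478 0.2684
PUBLIC -0.0585 0.2886
GPA     0.6156 0.2626
;
run;

proc print data = or_check noobs;
  var effect odds_ratio lower upper;
  format odds_ratio lower upper 8.3;
run;
```

Up to rounding of the printed estimates, the results should match the Odds Ratio Estimates table above.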

Sample Write-up of the Analysis

Below is one way of describing the results.

Parental education and grade point average are positively associated with the tendency to apply to graduate school.  Having at least one parent with a graduate degree (pared = 1 versus 0) increases the log odds of being in a higher category of apply by 1.05 (odds ratio = 2.85), holding the other predictors constant.  Each one unit increase in gpa increases the log odds of being in a higher category of apply by 0.62 (odds ratio = 1.85), holding the other predictors constant.  There was no statistically significant effect of public on apply.
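Log odds can be hard to communicate in a write-up, so it can help to translate the estimates into predicted probabilities for a few profiles.  The sketch below does this by hand from the printed intercepts and coefficients; the two profiles (a student with gpa = 3.0 at a private institution, with and without a parent holding a graduate degree) are hypothetical and chosen only for illustration.

```sas
/* Sketch: predicted probabilities for hypothetical profiles. */
data pred_probs;
  input pared public gpa;
  /* Linear predictor without the intercepts */
  xb = 1.0478*pared - 0.0585*public + 0.6156*gpa;
  /* Cumulative probabilities, using the inverse logit */
  p_ge2 = 1 / (1 + exp(-(-4.2983 + xb)));   /* P(apply = 2),  Intercept 2 */
  p_ge1 = 1 / (1 + exp(-(-2.2029 + xb)));   /* P(apply >= 1), Intercept 1 */
  /* Individual category probabilities */
  p_apply2 = p_ge2;
  p_apply1 = p_ge1 - p_ge2;
  p_apply0 = 1 - p_ge1;
  datalines;
0 0 3.0
1 0 3.0
;
run;

proc print data = pred_probs noobs;
  var pared public gpa p_apply0 p_apply1 p_apply2;
  format p_apply0 p_apply1 p_apply2 6.3;
run;
```

The same quantities can be requested from the fitted model itself with the predprobs= option on the output statement of proc logistic, which avoids retyping rounded estimates.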

Cautions, Flies in the Ointment

See Also

Logistic Regression in SAS with movies
SAS Annotated Output:  Proc Logistic - Ordinal Logistic Regression
Logistic Regression Examples Using the SAS System by SAS Institute
Logistic Regression Using the SAS System: Theory and Application by Paul D. Allison
Categorical Data Analysis Using the SAS System, Second Edition, by Maura Stokes, Charles Davis and Gary Koch
An Introduction to Categorical Data Analysis by Alan Agresti
Categorical Data Analysis, Second Edition by Alan Agresti
Interpreting Probability Models:  Logit, Probit, and Other Generalized Linear Models by Tim Futing Liao
Statistical Methods for Categorical Data Analysis by Daniel Powers and Yu Xie
 

These pages are Copyrighted (c) by UCLA Academic Technology Services

