Example 2: A 5-point Likert scale is used to assess people's opinion about a local ballot measure. The response options are "strongly disagree", "disagree", "neutral", "agree" and "strongly agree". Predictor variables will include the measure's author, his/her political party, and how much the measure's proposals will cost. The researchers have reason to believe that the psychological "distances" between these points are not equal. For example, the "distance" between "strongly disagree" and "disagree" may be shorter than the distance between "disagree" and "neutral".
Example 3: A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school. Hence, our outcome variable has three categories. Data on parental educational status, whether the undergraduate institution is public or private, and current GPA are also collected.
This hypothetical data set has a three-level variable called apply (coded 0, 1, 2) that we will use as our response (i.e., outcome, dependent) variable. We also have three variables that we will use as predictors: pared, which is a 0/1 variable indicating whether at least one parent has a graduate degree; public, which is a 0/1 variable where 1 indicates that the undergraduate institution is a public university and 0 indicates that it is a private university; and gpa, which is the student's grade point average.
proc freq data = "D:\ologit";
  tables apply;
  tables pared;
  tables public;
run;

The FREQ Procedure

                                  Cumulative    Cumulative
APPLY    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    0         220       55.00           220        55.00
    1         140       35.00           360        90.00
    2          40       10.00           400       100.00

                                  Cumulative    Cumulative
PARED    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    0         337       84.25           337        84.25
    1          63       15.75           400       100.00

                                   Cumulative    Cumulative
PUBLIC    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0         343       85.75           343        85.75
     1          57       14.25           400       100.00

proc means data = "D:\ologit";
  var gpa;
run;

The MEANS Procedure

Analysis Variable : GPA

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
400       2.9989250       0.3979409       1.9000000       4.0000000
-------------------------------------------------------------------
Before we run our ordinal logistic model, we will see if any cells (created by the crosstab of our categorical and response variables) are empty or extremely small. If any are, we may have difficulty running our model. We have used some options on the tables statements to clean up the output. Perhaps the most important option is the missprint option; this will have SAS include missing values as a category in the table. Because we have no missing values in this data set, this option is not really needed; we have included it here only to show its use.
proc freq data = "D:\ologit";
  tables apply*pared / nopercent norow nocol missprint;
  tables apply*public / nopercent norow nocol missprint;
run;

The FREQ Procedure

Table of APPLY by PARED

APPLY     PARED

Frequency|       0|       1|  Total
---------+--------+--------+
       0 |    200 |     20 |    220
---------+--------+--------+
       1 |    110 |     30 |    140
---------+--------+--------+
       2 |     27 |     13 |     40
---------+--------+--------+
Total         337       63      400

Table of APPLY by PUBLIC

APPLY     PUBLIC

Frequency|       0|       1|  Total
---------+--------+--------+
       0 |    189 |     31 |    220
---------+--------+--------+
       1 |    124 |     16 |    140
---------+--------+--------+
       2 |     30 |     10 |     40
---------+--------+--------+
Total         343       57      400
None of the cells is too small or empty (has no cases), so we will run our model.
proc logistic data = "D:\ologit" desc;
  model apply = pared public gpa;
run;

The LOGISTIC Procedure

Model Information

Data Set                      D:\ologit    Written by SAS
Response Variable             APPLY
Number of Response Levels     3
Model                         cumulative logit
Optimization Technique        Fisher's scoring

Number of Observations Read         400
Number of Observations Used         400

Response Profile

 Ordered                      Total
   Value        APPLY     Frequency
       1            2            40
       2            1           140
       3            0           220

Probabilities modeled are cumulated over the lower Ordered Values.

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Score Test for the Proportional Odds Assumption

Chi-Square       DF     Pr > ChiSq
    4.8446        3         0.1835

Model Fit Statistics

                             Intercept
              Intercept            and
Criterion          Only     Covariates
AIC             745.205        727.025
SC              753.188        746.982
-2 Log L        741.205        717.025

The LOGISTIC Procedure

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        24.1804        3         <.0001
Score                   23.4804        3         <.0001
Wald                    24.3337        3         <.0001

Analysis of Maximum Likelihood Estimates

                                Standard          Wald
Parameter      DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept 2     1     -4.2983     0.8092       28.2189        <.0001
Intercept 1     1     -2.2029     0.7844        7.8869        0.0050
PARED           1      1.0478     0.2684       15.2350        <.0001
PUBLIC          1     -0.0585     0.2886        0.0411        0.8393
GPA             1      0.6156     0.2626        5.4963        0.0191

Odds Ratio Estimates

             Point          95% Wald
Effect    Estimate      Confidence Limits
PARED        2.851       1.685       4.826
PUBLIC       0.943       0.536       1.661
GPA          1.851       1.106       3.096

Association of Predicted Probabilities and Observed Responses

Percent Concordant     60.0    Somers' D    0.210
Percent Discordant     39.0    Gamma        0.213
Percent Tied            1.1    Tau-a        0.119
Pairs                 45200    c            0.605
In the output above, we see that all 400 observations in our data set were used in the analysis. Fewer observations would have been used if any of our variables had missing values. By default, SAS does a listwise deletion of cases with missing values.
The Response Profile shows the value that SAS used when conducting the analysis (given in the Ordered Value column), the value of the original variable, and the number of cases in each level of the outcome variable. (If you want SAS to use the values that you have assigned the outcome variable, then you would want to use the order = data option on the proc logistic statement.) The note below this table reminds us that the "Probabilities modeled are cumulated over the lower Ordered Values." It is helpful to remember this when interpreting the output.
Next we see that the model converged (you should not try to interpret any output if the model has not converged), and we also see that the score test of the proportional odds assumption is non-significant (chi-square = 4.8446, df = 3, p = 0.1835). One of the assumptions underlying ordinal logistic (and ordinal probit) regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption. Because the relationship between all pairs of groups is the same, there is only one set of coefficients (only one model). If this were not the case, we would need different models (such as a generalized ordered logit model) to describe the relationship between each pair of outcome groups.
The table showing the Model Fit Statistics provides the AIC, SC and -2 log likelihood. These can be used in the comparison of nested models.
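As a sketch, the model fitted here can be written out with the estimates from the output above. The symbols \alpha_j and \beta are our notation, not part of the SAS output. Because the desc option was used, the probabilities are cumulated from the highest category of apply downward, so the two intercepts correspond to "apply = 2" and "apply at least 1", while a single set of slopes is shared by both equations (the proportional odds assumption).

```latex
\begin{aligned}
\operatorname{logit} P(\text{apply} \ge j \mid x)
  &= \alpha_j + \beta_1\,\text{pared} + \beta_2\,\text{public} + \beta_3\,\text{gpa},
  \qquad j = 1, 2 \\[4pt]
\operatorname{logit} \hat{P}(\text{apply} = 2 \mid x)
  &= -4.2983 + 1.0478\,\text{pared} - 0.0585\,\text{public} + 0.6156\,\text{gpa} \\
\operatorname{logit} \hat{P}(\text{apply} \ge 1 \mid x)
  &= -2.2029 + 1.0478\,\text{pared} - 0.0585\,\text{public} + 0.6156\,\text{gpa}
\end{aligned}
```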
In the next table we see various tests of the overall model; they all indicate that the model is statistically significant.
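The likelihood ratio statistic in this table can be recovered from the Model Fit Statistics table: it is the drop in -2 Log L when the three predictors are added to the intercept-only model. The symbol G^2 is our notation.

```latex
G^2 = \bigl(-2\log L_{\text{intercept only}}\bigr) - \bigl(-2\log L_{\text{with covariates}}\bigr)
    = 741.205 - 717.025 = 24.180, \qquad df = 3
```

This agrees, up to rounding, with the Likelihood Ratio chi-square of 24.1804 reported by SAS.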
In the table Analysis of Maximum Likelihood Estimates, we see the degrees of freedom, coefficients, their standard errors, the Wald chi-square test and associated p-values. Both pared and gpa are statistically significant; public is not. So for pared, we would say that for a one unit increase in pared (i.e., going from 0 to 1), we expect a 1.05 increase in the log odds of being in a higher level of apply, given that all of the other variables in the model are held constant. For gpa, we would say that for a one unit increase in gpa, we would expect a 0.62 increase in the log odds of being in a higher level of apply, given that all of the other variables in the model are held constant.
In the next table we see the results presented as proportional odds ratios (the coefficient exponentiated) and the 95% confidence intervals for the proportional odds ratios. We would interpret the proportional odds ratios much as we would odds ratios from a binary logistic regression. For pared, we would say that for a one unit increase in pared, i.e., going from 0 to 1, the odds of high apply versus the combined middle and low categories are 2.85 times greater, given that all of the other variables in the model are held constant. Likewise, the odds of the combined middle and high categories versus low apply are 2.85 times greater, given that all of the other variables in the model are held constant. For a one unit increase in gpa, the odds of the high category of apply versus the combined low and middle categories are 1.85 times greater, given that the other variables in the model are held constant. Because of the proportional odds assumption (described above), the same increase, 1.85 times, is found for the odds of the combined middle and high categories versus low apply.
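To make the link between the coefficients, the odds ratios, and the category probabilities concrete, here is a minimal sketch of a data step that redoes the arithmetic by hand. The estimates are copied from the table above. The data set name by_hand, the variable names, and the example student (pared = 1, public = 0, gpa = 3.0) are our own hypothetical choices, not part of the original analysis.

```sas
/* Sketch: reproduce the odds ratios and compute predicted       */
/* probabilities by hand from the printed estimates.             */
/* All names and the example covariate values are hypothetical.  */
data by_hand;
  /* Estimates copied from Analysis of Maximum Likelihood Estimates */
  int2     = -4.2983;   /* Intercept 2: apply = 2 vs. 0 or 1 */
  int1     = -2.2029;   /* Intercept 1: apply >= 1 vs. 0     */
  b_pared  =  1.0478;
  b_public = -0.0585;
  b_gpa    =  0.6156;

  /* Proportional odds ratios are the exponentiated coefficients */
  or_pared  = exp(b_pared);
  or_public = exp(b_public);
  or_gpa    = exp(b_gpa);

  /* Hypothetical student */
  pared = 1; public = 0; gpa = 3.0;

  /* Linear predictor, shared by both cumulative logits */
  xb = b_pared*pared + b_public*public + b_gpa*gpa;

  /* Cumulative probabilities via the inverse logit */
  p_ge2 = 1 / (1 + exp(-(int2 + xb)));   /* P(apply = 2)  */
  p_ge1 = 1 / (1 + exp(-(int1 + xb)));   /* P(apply >= 1) */

  /* Individual category probabilities by differencing */
  p_apply2 = p_ge2;
  p_apply1 = p_ge1 - p_ge2;
  p_apply0 = 1 - p_ge1;
run;

proc print data = by_hand;
  var or_pared or_public or_gpa p_apply0 p_apply1 p_apply2;
run;
```

The three or_ variables should agree, up to rounding, with the Odds Ratio Estimates table (2.851, 0.943 and 1.851), and the three category probabilities sum to 1 by construction.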
Below is one way of describing the results.
Parental education and grade point average are positively associated with the tendency to apply to graduate school. Having at least one parent with a graduate degree (pared = 1 versus 0) increases the log odds of being in a higher category of apply by 1.05, and each one unit increase in gpa increases these log odds by 0.62, holding the other variables in the model constant. There was no statistically significant effect of public on apply.
These pages are Copyrighted (c) by UCLA Academic Technology Services