Example 2: A 5-point Likert scale is used to assess people's opinion about a local ballot measure. The response options are "strongly disagree", "disagree", "neutral", "agree" and "strongly agree". Predictor variables will include the measure's author, his/her political party, and how much the measure's proposals will cost. The researchers have reason to believe that the psychological "distances" between these points are not equal. For example, the "distance" between "strongly disagree" and "disagree" may be shorter than the distance between "disagree" and "neutral".
Example 3: A study looks at factors that influence the decision of whether to apply to graduate school. College juniors are asked if they are unlikely, somewhat likely, or very likely to apply to graduate school. Hence, our outcome variable has three categories. Data on parental educational status, whether the undergraduate institution is public or private, and current GPA are also collected.
use http://www.ats.ucla.edu/stat/stata/dae/ologit.dta, clear
This hypothetical data set has a three-level variable called apply (coded 0, 1, 2) that we will use as our response (i.e., outcome, dependent) variable. We also have three variables that we will use as predictors: pared, which is a 0/1 variable indicating whether at least one parent has a graduate degree; public, which is a 0/1 variable where 1 indicates that the undergraduate institution is a public university and 0 indicates that it is a private university; and gpa, which is the student's grade point average.
tab apply

      apply |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        220       55.00       55.00
          1 |        140       35.00       90.00
          2 |         40       10.00      100.00
------------+-----------------------------------
      Total |        400      100.00

tab pared

      pared |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        337       84.25       84.25
          1 |         63       15.75      100.00
------------+-----------------------------------
      Total |        400      100.00

tab public

     public |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        343       85.75       85.75
          1 |         57       14.25      100.00
------------+-----------------------------------
      Total |        400      100.00

summarize gpa

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         gpa |       400    2.998925    .3979409        1.9          4
Before we run our ordinal logistic model, we will see if any cells (created by the crosstab of our categorical and response variables) are empty or extremely small. If any are, we may have difficulty running our model.
tab apply pared

           |         pared
     apply |         0          1 |     Total
-----------+----------------------+----------
         0 |       200         20 |       220
         1 |       110         30 |       140
         2 |        27         13 |        40
-----------+----------------------+----------
     Total |       337         63 |       400

tab apply public

           |        public
     apply |         0          1 |     Total
-----------+----------------------+----------
         0 |       189         31 |       220
         1 |       124         16 |       140
         2 |        30         10 |        40
-----------+----------------------+----------
     Total |       343         57 |       400
None of the cells is too small or empty (has no cases), so we will run our model.
ologit apply pared public gpa
Iteration 0: log likelihood = -370.60264
Iteration 1: log likelihood = -358.605
Iteration 2: log likelihood = -358.51248
Iteration 3: log likelihood = -358.51244
Ordered logistic regression                       Number of obs   =        400
                                                  LR chi2(3)      =      24.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -358.51244                       Pseudo R2       =     0.0326
------------------------------------------------------------------------------
       apply |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pared |   1.047664   .2657891     3.94   0.000     .5267266    1.568601
      public |  -.0586828   .2978588    -0.20   0.844    -.6424754    .5251098
         gpa |   .6157458   .2606311     2.36   0.018     .1049183    1.126573
-------------+----------------------------------------------------------------
       /cut1 |   2.203323   .7795353                      .6754622    3.731184
       /cut2 |   4.298767   .8043146                       2.72234    5.875195
------------------------------------------------------------------------------
In the output above, we first see the iteration log. In general, this is not very interesting, but it does show whether and how quickly the model converged. The final log likelihood (-358.51244) can be used in comparisons of nested models, but we won't show an example of that here. Also at the top of the output we see that all 400 observations in our data set were used in the analysis. Fewer observations would have been used if any of our variables had missing values. By default, Stata does a listwise deletion of cases with missing values. The likelihood ratio chi-square of 24.18 with a p-value displayed as 0.0000 (i.e., p < 0.0001) tells us that our model as a whole is statistically significant, as compared to a model with no predictors. The pseudo-R-squared is also given. It is a pseudo-R-squared because there is no direct equivalent of an R-squared (from OLS regression) in non-linear models. There are many different pseudo-R-squareds, but the emphasis should be on the pseudo: none of them can be interpreted as the proportion of variance explained.
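The likelihood ratio chi-square and the pseudo-R-squared in the header can be reproduced by hand from the iteration log. The lines below are a minimal sketch, assuming that iteration 0 is the model with no predictors and that the pseudo-R-squared reported is McFadden's (one minus the ratio of the two log likelihoods). The scalar names ll0 and ll1 are ours, chosen for illustration.

```stata
* Log likelihoods copied from the iteration log above
* ll0: iteration 0 (no predictors); ll1: final iteration (full model)
scalar ll0 = -370.60264
scalar ll1 = -358.51244

* Likelihood ratio chi-square: twice the gain in log likelihood
display 2*(ll1 - ll0)

* Upper-tail p-value with 3 degrees of freedom (one per predictor)
display chi2tail(3, 2*(ll1 - ll0))

* McFadden's pseudo-R-squared
display 1 - ll1/ll0
```

The first and last results should agree with the 24.18 and 0.0326 in the header, up to rounding.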
In the table we see the coefficients, their standard errors, the z-tests and associated p-values, and the 95% confidence intervals of the coefficients. Both pared and gpa are statistically significant; public is not. The estimates in the output are given in units of ordered logits, or ordered log odds. So for pared, we would say that for a one unit increase in pared (i.e., going from 0 to 1), we expect a 1.05 increase in the log odds of being in a higher level of apply, given that all of the other variables in the model are held constant. For gpa, we would say that for a one unit increase in gpa, we would expect a 0.62 increase in the log odds of being in a higher level of apply, given that all of the other variables in the model are held constant. The cutpoints shown at the bottom of the output indicate where the latent variable is cut to make the three groups that we observe in our data. Note that this latent variable is continuous. In general, the cutpoints are not used in the interpretation of the results. They are closely related to thresholds, which are reported by other statistical packages. For further information, please see the Stata FAQ: How can I convert Stata's parameterization of ordered probit and logistic models to one in which a constant is estimated?
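It may help to see the fitted model written out. The equation below is a sketch in our own notation (the symbols kappa for the cutpoints and beta for the coefficients do not appear in the Stata output). It follows Stata's parameterization, in which no constant is estimated and the linear predictor is subtracted from each cutpoint.

```latex
\[
\log\frac{\Pr(\text{apply} \le j)}{\Pr(\text{apply} > j)}
  = \kappa_{j+1}
  - \left(\beta_1\,\text{pared} + \beta_2\,\text{public} + \beta_3\,\text{gpa}\right),
  \qquad j = 0, 1
\]
```

With the estimates above, kappa_1 = 2.20 (/cut1) and kappa_2 = 4.30 (/cut2). Because the linear predictor is subtracted on the "at or below" side, a positive coefficient such as the 1.05 for pared lowers the log odds of being at or below a category, which is the same as raising the log odds of being in a higher one.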
We can obtain odds ratios using the or option after the ologit command.
ologit apply pared public gpa, or
Iteration 0: log likelihood = -370.60264
Iteration 1: log likelihood = -358.605
Iteration 2: log likelihood = -358.51248
Iteration 3: log likelihood = -358.51244
Ordered logistic regression                       Number of obs   =        400
                                                  LR chi2(3)      =      24.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -358.51244                       Pseudo R2       =     0.0326
------------------------------------------------------------------------------
       apply | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pared |   2.850982     .75776     3.94   0.000      1.69338    4.799927
      public |   .9430059   .2808826    -0.20   0.844     .5259888    1.690644
         gpa |   1.851037   .4824377     2.36   0.018      1.11062    3.085067
-------------+----------------------------------------------------------------
       /cut1 |   2.203323   .7795353                      .6754622    3.731184
       /cut2 |   4.298767   .8043146                       2.72234    5.875195
------------------------------------------------------------------------------
In the output above the results are displayed as proportional odds ratios. We would interpret these pretty much as we would odds ratios from a binary logistic regression. For pared, we would say that for a one unit increase in pared, i.e., going from 0 to 1, the odds of high apply versus the combined middle and low categories are 2.85 times greater, given that all of the other variables in the model are held constant. Likewise, the odds of the combined middle and high categories versus low apply are 2.85 times greater, given that all of the other variables in the model are held constant. For a one unit increase in gpa, the odds of the high category of apply versus the combined low and middle categories are 1.85 times greater, given that the other variables in the model are held constant. Because of the proportional odds assumption (see below for more explanation), the same factor, 1.85, applies to the odds of the combined middle and high categories versus low apply.
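The odds ratios are simply the exponentiated coefficients from the first ologit table, and the confidence limits transform the same way. As a quick check, the following lines use Stata's display command with the coefficient values typed in by hand from that table:

```stata
* Odds ratios are exp(coefficient), in the order pared, public, gpa
display exp(1.047664)
display exp(-.0586828)
display exp(.6157458)

* The confidence limits for pared are exponentiated in the same way
display exp(.5267266)
display exp(1.568601)
```

These should reproduce, up to rounding, the odds ratios 2.850982, .9430059 and 1.851037, and the interval from 1.69338 to 4.799927 for pared.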
You can also use the listcoef command to obtain the odds ratios, as well as the change in the odds for a standard deviation increase in each variable. We have used the help option to get the list at the bottom of the output explaining each column. You can use the percent option to see the percent change in the odds. The listcoef command was written by Long and Freese, and you will need to download it by typing findit spost (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
listcoef, help
ologit (N=400): Factor Change in Odds
Odds of: >m vs <=m
----------------------------------------------------------------------
apply | b z P>|z| e^b e^bStdX SDofX
-------------+--------------------------------------------------------
pared | 1.04766 3.942 0.000 2.8510 1.4654 0.3647
public | -0.05868 -0.197 0.844 0.9430 0.9797 0.3500
gpa | 0.61575 2.363 0.018 1.8510 1.2777 0.3979
----------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
e^b = exp(b) = factor change in odds for unit increase in X
e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
SDofX = standard deviation of X
listcoef, help percent
ologit (N=400): Percentage Change in Odds
Odds of: >m vs <=m
----------------------------------------------------------------------
apply | b z P>|z| % %StdX SDofX
-------------+--------------------------------------------------------
pared | 1.04766 3.942 0.000 185.1 46.5 0.3647
public | -0.05868 -0.197 0.844 -5.7 -2.0 0.3500
gpa | 0.61575 2.363 0.018 85.1 27.8 0.3979
----------------------------------------------------------------------
b = raw coefficient
z = z-score for test of b=0
P>|z| = p-value for z-test
% = percent change in odds for unit increase in X
%StdX = percent change in odds for SD increase in X
SDofX = standard deviation of X
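The extra listcoef columns follow directly from the formulas in the legends. As a sketch, here is the arithmetic for pared, using the rounded values of b and SDofX printed in the tables (so the last digit may differ slightly from what listcoef reports):

```stata
* e^bStdX: factor change in odds for a one standard deviation increase
display exp(1.04766*0.3647)

* %: percent change in odds for a unit increase
display 100*(exp(1.04766) - 1)

* %StdX: percent change in odds for a one standard deviation increase
display 100*(exp(1.04766*0.3647) - 1)
```

The three results correspond to the 1.4654, 185.1 and 46.5 shown for pared.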
One of the assumptions underlying ordinal logistic (and ordinal probit) regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption. Because the relationship between all pairs of groups is the same, there is only one set of coefficients (only one model). If this were not the case, we would need different models to describe the relationship between each pair of outcome groups. We need to test the proportional odds assumption, and there are two tests that can be used to do so. The first is an approximate likelihood ratio test, performed by a user-written command called omodel (type findit omodel to download it). The null hypothesis is that there is no difference in the coefficients between models, so we "hope" to get a non-significant result. The second is a Brant test, performed by the brant command. As the note at the bottom of its output indicates, we also "hope" that this test is non-significant. We have used the detail option here, which shows the estimated coefficients for the two equations. (We have two equations because we have three categories in our response variable.) Also, you will note that the likelihood ratio chi-square value of 4.06 obtained from the omodel command is very close to the 4.34 obtained from the brant command.
omodel logit apply pared public gpa
Iteration 0: log likelihood = -370.60264
Iteration 1: log likelihood = -358.605
Iteration 2: log likelihood = -358.51248
Iteration 3: log likelihood = -358.51244
Ordered logit estimates                           Number of obs   =        400
                                                  LR chi2(3)      =      24.18
                                                  Prob > chi2     =     0.0000
Log likelihood = -358.51244                       Pseudo R2       =     0.0326
------------------------------------------------------------------------------
       apply |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pared |   1.047664   .2657891     3.94   0.000     .5267266    1.568601
      public |  -.0586828   .2978588    -0.20   0.844    -.6424754    .5251098
         gpa |   .6157458   .2606311     2.36   0.018     .1049183    1.126573
-------------+----------------------------------------------------------------
       _cut1 |   2.203323   .7795353          (Ancillary parameters)
       _cut2 |   4.298767   .8043146
------------------------------------------------------------------------------
Approximate likelihood-ratio test of proportionality of odds
across response categories:
         chi2(3) =      4.06
       Prob > chi2 =    0.2553
brant, detail
Estimated coefficients from j-1 binary regressions
                   y>0         y>1
     pared   1.0596117     .915596
    public  -.20055709   .53508208
       gpa   .54824568   .73632132
     _cons  -1.9829709  -4.7544684
Brant Test of Parallel Regression Assumption
Variable | chi2 p>chi2 df
-------------+--------------------------
All | 4.34 0.227 3
-------------+--------------------------
pared | 0.13 0.716 1
public | 3.44 0.064 1
gpa | 0.18 0.672 1
----------------------------------------
A significant test statistic provides evidence that the parallel
regression assumption has been violated.
Neither of the above tests is statistically significant, so we have no evidence that the proportional odds assumption has been violated. If it had been, we would want to run our model as a generalized ordered logistic model using gologit2. You need to download gologit2 by typing findit gologit2.
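To see where the two columns of coefficients in the brant, detail output come from, the outcome can be split into two binary variables and an ordinary logit fit to each. This is a sketch using only built-in commands; the variable names apply_gt0 and apply_gt1 are hypothetical, chosen for illustration, and the data set is assumed to be in memory.

```stata
* 1 if apply is above the lowest category (1 or 2), 0 otherwise
generate byte apply_gt0 = apply > 0 if !missing(apply)

* 1 if apply is in the highest category (2), 0 otherwise
generate byte apply_gt1 = apply > 1 if !missing(apply)

* One binary logit per split of the outcome
logit apply_gt0 pared public gpa
logit apply_gt1 pared public gpa
```

The coefficients from the two logits should correspond to the y>0 and y>1 columns above. Under the proportional odds assumption the two sets of slopes estimate the same quantities, so large differences between them (as for public here, -0.20 versus 0.54) are what the Brant test examines.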
We can also obtain predicted probabilities, which are usually easier to understand than the coefficients or the odds ratios. The commands used below are user-written and need to be downloaded, which you can do by typing findit spost (which we did above). We will start with prtab. This can be used with either a categorical variable or a continuous variable and shows the predicted probability for each of the values of the variable specified. We will use pared as an example with a categorical predictor. As you can see, the predicted probability of being in the lowest category of apply is 0.59 if neither parent has a graduate level education and 0.34 otherwise. For the middle category of apply, the predicted probabilities are 0.33 and 0.47, and for the highest category of apply, 0.079 and 0.196. Hence, if neither of a respondent's parents has a graduate level education, the predicted probability of being in the middle or highest category of apply is lower. Beneath each output, we can see the values at which the other variables are held; by default, they are held at their means.
prtab pared
ologit: Predicted probabilities for apply
Predicted probability of outcome 0
----------------------
pared | Prediction
----------+-----------
0 | 0.5903
1 | 0.3357
----------------------
Predicted probability of outcome 1
----------------------
pared | Prediction
----------+-----------
0 | 0.3311
1 | 0.4685
----------------------
Predicted probability of outcome 2
----------------------
pared | Prediction
----------+-----------
0 | 0.0787
1 | 0.1958
----------------------
pared public gpa
x= .1575 .1425 2.998925
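These predicted probabilities can be reproduced by hand from the coefficients and cutpoints of the ologit output. The sketch below does this for pared = 0, with public and gpa held at the means printed above. The scalar name xb0 is ours, and invlogit() is Stata's built-in inverse logit function.

```stata
* Linear predictor with pared = 0, public and gpa at their means
scalar xb0 = 1.047664*0 - .0586828*.1425 + .6157458*2.998925

* Pr(apply = 0): cumulative probability below the first cutpoint
display invlogit(2.203323 - xb0)

* Pr(apply = 1): probability between the two cutpoints
display invlogit(4.298767 - xb0) - invlogit(2.203323 - xb0)

* Pr(apply = 2): probability above the second cutpoint
display 1 - invlogit(4.298767 - xb0)
```

The results should match the pared = 0 rows of the prtab output (0.5903, 0.3311 and 0.0787) up to rounding. Replacing 1.047664*0 with 1.047664*1 gives the pared = 1 rows.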
Now let's use the prvalue command, which allows us to select values of a continuous variable and see what the predicted probabilities are at that point. Below, we see the predicted probabilities for gpa at 2, 3 and 4. At gpa values of 2 and 3, the highest predicted probability is for the lowest category of apply, which makes sense because most respondents are in that category; at a gpa of 4, the middle category becomes the most likely (0.45 versus 0.40). You can also see that the predicted probability increases for both the middle and highest categories of apply as gpa increases.
prvalue , x(gpa=2)
ologit: Predictions for apply
Confidence intervals by delta method
95% Conf. Interval
Pr(y=0|x): 0.6932 [ 0.5754, 0.8110]
Pr(y=1|x): 0.2552 [ 0.1625, 0.3478]
Pr(y=2|x): 0.0516 [ 0.0206, 0.0827]
pared public gpa
x= .1575 .1425 2
prvalue , x(gpa=3)
ologit: Predictions for apply
Confidence intervals by delta method
95% Conf. Interval
Pr(y=0|x): 0.5497 [ 0.4997, 0.5997]
Pr(y=1|x): 0.3588 [ 0.3104, 0.4071]
Pr(y=2|x): 0.0915 [ 0.0635, 0.1196]
pared public gpa
x= .1575 .1425 3
prvalue , x(gpa=4)
ologit: Predictions for apply
Confidence intervals by delta method
95% Conf. Interval
Pr(y=0|x): 0.3974 [ 0.2670, 0.5278]
Pr(y=1|x): 0.4454 [ 0.3671, 0.5236]
Pr(y=2|x): 0.1572 [ 0.0792, 0.2352]
pared public gpa
x= .1575 .1425 4
Below, we use the prvalue command to set the values of all of our predictor variables. This is useful when you want to describe profiles of respondents.
prvalue, x(gpa=3.5 pared=1 public=1)
ologit: Predictions for apply
Confidence intervals by delta method
95% Conf. Interval
Pr(y=0|x): 0.2807 [ 0.1444, 0.4171]
Pr(y=1|x): 0.4796 [ 0.4156, 0.5437]
Pr(y=2|x): 0.2396 [ 0.1146, 0.3647]
pared public gpa
x= 1 1 3.5
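If the spost commands are not available, similar predictions can be obtained with Stata's official margins command. This is offered as an alternative sketch rather than the method used on this page. It assumes Stata 11 or later, that the data set is in memory, and that the model is refit first so that margins has the ologit results to work from.

```stata
* Refit the model quietly so the ologit estimates are the active results
quietly ologit apply pared public gpa

* Predicted probability of each outcome at gpa = 2, 3 and 4,
* with pared and public held at their means
forvalues k = 0/2 {
    margins, at(gpa=(2 3 4)) atmeans predict(outcome(`k'))
}

* Predicted probability of each outcome for one respondent profile
forvalues k = 0/2 {
    margins, at(gpa=3.5 pared=1 public=1) predict(outcome(`k'))
}
```

The point predictions should agree with the prvalue output above; the confidence intervals may differ slightly depending on how each command constructs them.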
We will use the estout command to create a table of the results that might be more appropriate for publication. This command is user-written, so type findit estout to download it.
estout, varwidth(12) varlabels(_cons Constant) cells(b(star fmt(%8.2f)) ///
    se(par fmt(%8.2f))) ///
    stats(ll chi2 r2_p, labels(log_likelihood LR_chi_square r2_pvalue) fmt(%8.2f))

                     b/se
apply
pared                1.05***
                   (0.27)
public              -0.06
                   (0.30)
gpa                  0.62*
                   (0.26)
cut1
Constant             2.20**
                   (0.78)
cut2
Constant             4.30***
                   (0.80)
log_likelihood    -358.51
LR_chi_square       24.18
r2_pvalue            0.03
Below is one way of describing the results.
Parental education and grade point average are positively associated with the tendency to apply for graduate school. For a one unit increase in pared (i.e., having at least one parent with a graduate degree), the ordered log odds of being in a higher category of apply increase by 1.05, holding the other variables constant. For every unit increase in gpa, the ordered log odds of being in a higher category of apply increase by 0.62, holding the other variables constant. There was no statistically significant effect of public on apply.
Describing the results in terms of ordered log-odds (or odds ratios) may not be the simplest metric for your audience to understand. As we saw above, you can use the prvalue, prtab and other spost commands to obtain predicted probabilities. These are often useful for helping to tell the "story" of your results.
Neither the ordinal logistic model nor the ordinal probit model is linear. To make the model linear, a transformation is applied to the outcome probabilities. In logistic regression (including binary, ordinal and multinomial logistic models), the transformation is the logit function, which is the natural log of the odds. In probit models (including binary, ordinal and multinomial probit models), the function used is the inverse of the standard normal cumulative distribution function (which returns a z-score). In practice, this difference isn't too important: both transformations are equally good at linearizing the model, and which one you use is a matter of personal preference. Both methods use maximum likelihood, and so require more cases than a similar OLS model. Unlike logistic models, you don't get odds ratios with probit models, but you can get predicted probabilities from either type of model.
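One way to see how little the choice between logit and probit matters for predictions is to fit both models and compare the predicted probabilities. This is a minimal sketch, assuming the data set is in memory; the variable names pl0 through pl2 and pp0 through pp2 are hypothetical, chosen for illustration.

```stata
* Ordered logit: one predicted probability per outcome category
quietly ologit apply pared public gpa
predict pl0 pl1 pl2, pr

* Ordered probit: the same predictions from the probit model
quietly oprobit apply pared public gpa
predict pp0 pp1 pp2, pr

* Compare the two sets of predictions for the highest category
summarize pl2 pp2
correlate pl2 pp2
```

The coefficients themselves are on different scales in the two models and should not be compared directly; the predicted probabilities are the quantities to compare.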
These pages are Copyrighted (c) by UCLA Academic Technology Services