This FAQ is an elaboration of a FAQ by Allen McDowell of Stata Corporation. Go to www.stata.com/support/faqs/stat/logit.html for the original.
Proportion data have values that fall between zero and one. Naturally, it would be nice to have the predicted values also fall between zero and one. One way to accomplish this is to use a generalized linear model (glm) with a logit link and the binomial family. We will include the robust option in the glm command to obtain robust standard errors, which will be particularly useful if we have misspecified the distribution family.
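In symbols, the logit link and its inverse can be sketched as below. The notation is an assumption introduced here for illustration and is not from the original FAQ: x_i stands for the vector of predictors for observation i, beta for the coefficients, and mu_i for the expected proportion.

```latex
\operatorname{logit}(\mu_i) = \ln\!\left(\frac{\mu_i}{1-\mu_i}\right) = x_i^{\top}\beta,
\qquad
\mu_i = \frac{1}{1 + e^{-x_i^{\top}\beta}} \in (0, 1).
```

Because the inverse link maps any value of the linear predictor into the open interval (0, 1), the fitted means cannot fall outside the range of a proportion.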
We will demonstrate this using a dataset in which the dependent variable, meals, is the proportion of students receiving free or reduced-price meals at school.
use http://www.ats.ucla.edu/stat/stata/faq/proportion
glm meals yr_rnd parented api99, link(logit) family(binomial) robust nolog
note: meals has non-integer values
Generalized linear models                          No. of obs      =      4257
Optimization     : ML                              Residual df     =      4253
                                                   Scale parameter =         1
Deviance         =  395.8141242                    (1/df) Deviance =   .093067
Pearson          =  374.7025759                    (1/df) Pearson  =  .0881031
Variance function: V(u) = u*(1-u/1)                [Binomial]
Link function    : g(u) = ln(u/(1-u))              [Logit]
                                                   AIC             =  .7220973
Log pseudolikelihood = -1532.984106                BIC             = -35143.61
------------------------------------------------------------------------------
             |               Robust
       meals |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |   .0482527   .0321714     1.50   0.134    -.0148021    .1113074
    parented |  -.7662598   .0390715   -19.61   0.000    -.8428386   -.6896811
       api99 |  -.0073046   .0002156   -33.89   0.000    -.0077271   -.0068821
       _cons |    6.75343   .0896767    75.31   0.000     6.577667    6.929193
------------------------------------------------------------------------------
Next, we will compute predicted scores from the model and transform them back so that they are scaled the same way as the original proportions.
predict premeals1
(option mu assumed; predicted mean meals)
(164 missing values generated)
summarize meals premeals1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
meals | 4421 .5188102 .3107313 0 1
premeals1 | 4263 .5167403 .2848372 .0220988 .9770855
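The back-transformation that predict applies by default (option mu) can also be carried out by hand, which may make the role of the logit link more concrete. The following is a minimal sketch, assuming the glm model above is still the most recent estimation command; xb1 and premeals1b are hypothetical variable names introduced here for illustration.

```stata
* Assumes the glm fit above is the active estimation result.
* xb1 and premeals1b are illustrative names, not part of the original FAQ.

* Linear predictor on the logit scale
predict xb1, xb

* Back-transform to the proportion scale with the inverse logit
generate premeals1b = invlogit(xb1)

* The hand-computed values should agree with premeals1 up to rounding
summarize premeals1 premeals1b
```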
As a contrast, let's run the same analysis without the transformation. We will then graph the original dependent variable and the two predicted variables against api99.
regress meals yr_rnd parented api99
Source | SS df MS Number of obs = 4257
-------------+------------------------------ F( 3, 4253) = 6752.22
Model | 338.097096 3 112.699032 Prob > F = 0.0000
Residual | 70.985399 4253 .016690665 R-squared = 0.8265
-------------+------------------------------ Adj R-squared = 0.8264
Total | 409.082495 4256 .096119007 Root MSE = .12919
------------------------------------------------------------------------------
meals | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
yr_rnd | .0024454 .0054678 0.45 0.655 -.0082742 .013165
parented | -.1298907 .0048289 -26.90 0.000 -.1393579 -.1204234
api99 | -.0014118 .0000269 -52.40 0.000 -.0014646 -.0013589
_cons | 1.766162 .0134423 131.39 0.000 1.739808 1.792516
------------------------------------------------------------------------------
predict preols
/* figure 1: proportion dependent variable */
graph twoway scatter meals api99, yline(0 1) msym(oh)
/* figure 2: predicted values from model with logit transformation */
graph twoway scatter premeals1 api99, yline(0 1) msym(oh)
/* figure 3: predicted values from model without transformation */
graph twoway scatter preols api99, yline(0 1) msym(oh)

Note that the values in figures 1 and 2 fall within the range of zero to one, while
some of the values in figure 3 go beyond those bounds.
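The same point can be checked numerically rather than read off the graphs. This is a small sketch, assuming premeals1 and preols were created as shown above; it only counts observations and does not change the data.

```stata
* Count OLS predictions below 0 and above 1.
* Missing values are excluded explicitly because Stata treats
* missing as larger than any number.
count if preols < 0
count if preols > 1 & !missing(preols)

* The same check for the glm predictions; the inverse logit keeps
* these inside the unit interval, so the count should be zero.
count if (premeals1 < 0 | premeals1 > 1) & !missing(premeals1)
```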
Let's finish by looking at the correlations of the predicted values with the dependent
variable, meals.
corr meals premeals1 preols
(obs=4257)
             |    meals premea~1   preols
-------------+---------------------------
       meals |   1.0000
   premeals1 |   0.9152   1.0000
      preols |   0.9091   0.9891   1.0000
Note that the correlation between meals and premeals1 is slightly higher than for meals and preols.
Now, let's say that you want predicted proportions for some specific combinations of your predictor variables: api99 values of 500, 600, and 700, yr_rnd values of 1 and 2, and a parented value of 2.5. You would append the following six observations to your dataset, which has an n of 4421.
api99   yr_rnd   parented
  500        1        2.5
  600        1        2.5
  700        1        2.5
  500        2        2.5
  600        2        2.5
  700        2        2.5
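One way to append these six observations is sketched below. It assumes the dataset in memory currently holds exactly the 4421 original observations, so the new rows occupy positions 4422 through 4427; the approach using set obs and replace is an illustration, not necessarily how the original FAQ built the rows.

```stata
* Assumes _N is 4421 before this point.
* set obs adds observations with every variable set to missing,
* so only the three predictors need to be filled in.
set obs 4427

* api99 cycles through 500, 600, 700 within each yr_rnd group
replace api99    = 500 + 100*mod(_n - 4422, 3) in 4422/4427

* First three new rows get yr_rnd = 1, the last three get yr_rnd = 2
replace yr_rnd   = cond(_n <= 4424, 1, 2)      in 4422/4427

replace parented = 2.5                         in 4422/4427

* Inspect the appended rows
list api99 yr_rnd parented in -6/l, separator(3)
```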
Set all other variables to missing, rerun your model for the 'real' observations (note the in 1/4421), predict for all observations, and display your results.
glm meals yr_rnd parented api99 in 1/4421, link(logit) family(binomial) robust nolog
Generalized linear models                          No. of obs      =      4257
Optimization     : ML                              Residual df     =      4253
                                                   Scale parameter =  .0155986
Deviance         =  66.34069081                    (1/df) Deviance =  .0155986
Pearson          =  66.34069081                    (1/df) Pearson  =  .0155986
Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]
                                                   AIC             =  -1.32176
Log pseudolikelihood =  2817.366575                BIC             = -35473.09
------------------------------------------------------------------------------
             |               Robust
       meals |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      yr_rnd |     .01629   .0331978     0.49   0.624    -.0487764    .0813565
    parented |  -.7447189   .0422017   -17.65   0.000    -.8274327   -.6620051
       api99 |  -.0071906   .0002262   -31.79   0.000    -.0076339   -.0067472
       _cons |   6.658837   .0935348    71.19   0.000     6.475512    6.842162
------------------------------------------------------------------------------
predict premeals
(option mu assumed; predicted mean meals)
(164 missing values generated)
list api99 yr_rnd parented premeals in -6/l, separator(3)
      +--------------------------------------+
      | api99   yr_rnd   parented   premeals |
      |--------------------------------------|
4422. |   500       No        2.5    .774471 |
4423. |   600       No        2.5   .6232278 |
4424. |   700       No        2.5   .4434458 |
      |--------------------------------------|
4425. |   500      Yes        2.5   .7827873 |
4426. |   600      Yes        2.5   .6344891 |
4427. |   700      Yes        2.5   .4553849 |
      +--------------------------------------+
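The listed predictions can be checked by hand by plugging predictor values into the inverse logit. The sketch below uses the rounded coefficients from the first family(binomial) output shown earlier in this FAQ, with yr_rnd coded 1 for No and 2 for Yes as in the appended rows. With those coefficients the results come out at roughly .7745 and .7828, in line with the first and fourth listed rows. Plugging in the coefficients from the rerun table as printed, which reports a Gaussian variance function, gives about .772 for the first row instead, so the listed predictions appear to correspond to the binomial fit.

```stata
* Hand check using rounded coefficients from the family(binomial) output.
* api99 = 500, yr_rnd = 1 (No), parented = 2.5
display invlogit(6.75343 + .0482527*1 - .7662598*2.5 - .0073046*500)

* api99 = 500, yr_rnd = 2 (Yes), parented = 2.5
display invlogit(6.75343 + .0482527*2 - .7662598*2.5 - .0073046*500)
```

Immediately after fitting the model, the stored coefficients can be used instead of typed values, for example display invlogit(_b[_cons] + _b[yr_rnd]*1 + _b[parented]*2.5 + _b[api99]*500), which avoids rounding in the copied numbers.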
These pages are Copyrighted (c) by UCLA Academic Technology Services