Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so there are excess zeros in the data because of the people who did not fish.
We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.dta. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts respectively. The variable male is a binary indicator of student gender.
In addition to predicting the number of days absent there is interest in predicting the existence of excess zeros, i.e., the probability that a student will have zero absences. We will use both male and school to investigate this.
Let's look at the data.
use http://www.ats.ucla.edu/stat/stata/dae/poissonreg, clear
summarize daysabs math langarts male school
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
daysabs | 316 5.810127 7.449003 0 45
math | 316 48.75101 17.88076 1.007114 98.99289
langarts | 316 50.06379 17.93921 1.007114 98.99289
male | 316 .4873418 .5006325 0 1
school | 316 1.496835 .500783 1 2
tabstat daysabs, stat(n mean var)
variable | N mean variance
-------------+------------------------------
daysabs | 316 5.810127 55.48764
--------------------------------------------
histogram daysabs, discrete freq
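The tabstat output above shows a variance (55.49) that is much larger than the mean (5.81). As a rough check that is not part of the original analysis, the ratio of the two can be computed from the printed values; a Poisson-distributed variable would have a ratio near one.

```stata
* variance-to-mean ratio of daysabs, using the values printed by tabstat
display 55.48764/5.810127
```

A ratio well above one suggests overdispersion, which is one reason to be cautious about the model-based standard errors.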
tab1 male school
-> tabulation of male
male | Freq. Percent Cum.
------------+-----------------------------------
0 | 162 51.27 51.27
1 | 154 48.73 100.00
------------+-----------------------------------
Total | 316 100.00
-> tabulation of school
school | Freq. Percent Cum.
------------+-----------------------------------
1 | 159 50.32 50.32
2 | 157 49.68 100.00
------------+-----------------------------------
Total | 316 100.00
zip daysabs math langarts male, inflate(school male) vuong
Fitting constant-only model:
Iteration 0: log likelihood = -1494.2292
Iteration 1: log likelihood = -1384.2385
Iteration 2: log likelihood = -1380.4502
Iteration 3: log likelihood = -1380.4264
Iteration 4: log likelihood = -1380.4264
Fitting full model:
Iteration 0: log likelihood = -1380.4264
Iteration 1: log likelihood = -1346.2718
Iteration 2: log likelihood = -1345.9222
Iteration 3: log likelihood = -1345.9221
Zero-inflated Poisson regression Number of obs = 316
Nonzero obs = 254
Zero obs = 62
Inflation model = logit LR chi2(3) = 69.01
Log likelihood = -1345.922 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
daysabs |
math | -.0003056 .0018612 -0.16 0.870 -.0039535 .0033423
langarts | -.0094936 .0019086 -4.97 0.000 -.0132344 -.0057528
male | -.246477 .0487593 -5.05 0.000 -.3420436 -.1509105
_cons | 2.54522 .0732565 34.74 0.000 2.40164 2.6888
-------------+----------------------------------------------------------------
inflate |
school | 1.151285 .3143818 3.66 0.000 .5351077 1.767462
male | .8692552 .3046072 2.85 0.004 .272236 1.466274
_cons | -3.704412 .5943179 -6.23 0.000 -4.869254 -2.539571
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson: z = 5.48 Pr>z = 0.0000
The output looks very much like the output from an OLS regression. The output begins with
the iteration log, which gives the values of the log likelihoods starting
with a model that has no predictors. The last value in the log is the final value
of the log likelihood for the full model and is repeated below.
Next comes the header information. On the right-hand side the number of observations used (316) is given, along with the number of nonzero (254) and zero (62) observations and the likelihood ratio chi-squared with three degrees of freedom for the full model, followed by the p-value for the chi-squared. The model, as a whole, is statistically significant. The header also shows that the inflation model is a logit.
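The likelihood ratio chi-squared in the header can be reproduced by hand from the iteration log, as a quick check. The sketch below uses the final log likelihoods of the constant-only model (-1380.4264) and the full model (-1345.9221) exactly as printed above.

```stata
* LR chi2 = 2 * (ll_full - ll_constant_only)
display 2*(-1345.9221 - (-1380.4264))
* upper-tail p-value for a chi-squared with 3 degrees of freedom
display chi2tail(3, 2*(-1345.9221 - (-1380.4264)))
```

The first value should agree with the reported LR chi2(3) of 69.01 up to rounding.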
Below the header you will find the Poisson regression coefficients for each of the variables along with standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. Following these are the logit coefficients for predicting excess zeros along with their standard errors, z-scores, p-values and confidence intervals.
Below the various coefficients you will find the results of the Vuong test. The Vuong test compares the zero-inflated model with an ordinary Poisson regression model. A significant z-test indicates that the zero-inflated model is better.
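Because the inflation model is a logit, its coefficients can be turned into predicted probabilities of being an excess zero with the inverse logit function. The following is a minimal sketch using the inflate coefficients printed above; the covariate combinations chosen here (school coded 1 or 2, male coded 0 or 1) are only illustrations.

```stata
* P(excess zero) = invlogit(_cons + b_school*school + b_male*male)
* female student in school 1
display invlogit(-3.704412 + 1.151285*1 + .8692552*0)
* male student in school 2
display invlogit(-3.704412 + 1.151285*2 + .8692552*1)
```

Positive coefficients on school and male mean the second probability is the larger of the two.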
Since math is clearly not significant, let's rerun the model without it.
zip daysabs langarts male, inflate(school male) vuong
Fitting constant-only model:
Iteration 0: log likelihood = -1494.2292
Iteration 1: log likelihood = -1384.2385
Iteration 2: log likelihood = -1380.4502
Iteration 3: log likelihood = -1380.4264
Iteration 4: log likelihood = -1380.4264
Fitting full model:
Iteration 0: log likelihood = -1380.4264
Iteration 1: log likelihood = -1346.2822
Iteration 2: log likelihood = -1345.9356
Iteration 3: log likelihood = -1345.9356
Zero-inflated Poisson regression Number of obs = 316
Nonzero obs = 254
Zero obs = 62
Inflation model = logit LR chi2(2) = 68.98
Log likelihood = -1345.936 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
daysabs |
langarts | -.009721 .0013138 -7.40 0.000 -.012296 -.0071461
male | -.2470039 .0486519 -5.08 0.000 -.3423598 -.151648
_cons | 2.542103 .0707735 35.92 0.000 2.403389 2.680816
-------------+----------------------------------------------------------------
inflate |
school | 1.151284 .3143835 3.66 0.000 .5351039 1.767465
male | .8693162 .304612 2.85 0.004 .2722876 1.466345
_cons | -3.704446 .5943313 -6.23 0.000 -4.869313 -2.539578
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson: z = 5.66 Pr>z = 0.0000
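The decision to drop math can also be checked with a likelihood ratio test of the two nested models. This is a minimal sketch, assuming the data are still in memory; the stored-estimate names full and reduced are arbitrary labels introduced here for illustration.

```stata
* fit both models quietly and store their results
quietly zip daysabs math langarts male, inflate(school male)
estimates store full
quietly zip daysabs langarts male, inflate(school male)
estimates store reduced
* LR test of the reduced model against the full model (1 df, for math)
lrtest full reduced
```

From the log likelihoods printed above, the test statistic is 2*(1345.9356 - 1345.9221), or about 0.03 on one degree of freedom, which is consistent with the non-significant z-test for math.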
Now, just to be on the safe side, let's rerun the zip command with the robust
option in order to obtain robust standard errors for the regression coefficients. We
cannot include the vuong option when using robust standard errors.
zip daysabs langarts male, inflate(school male) robust
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -1494.2292
Iteration 1: log pseudolikelihood = -1384.2385
Iteration 2: log pseudolikelihood = -1380.4502
Iteration 3: log pseudolikelihood = -1380.4264
Iteration 4: log pseudolikelihood = -1380.4264
Fitting full model:
Iteration 0: log pseudolikelihood = -1380.4264
Iteration 1: log pseudolikelihood = -1346.2822
Iteration 2: log pseudolikelihood = -1345.9356
Iteration 3: log pseudolikelihood = -1345.9356
Zero-inflated Poisson regression Number of obs = 316
Nonzero obs = 254
Zero obs = 62
Inflation model = logit Wald chi2(2) = 12.58
Log pseudolikelihood = -1345.936 Prob > chi2 = 0.0019
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
daysabs |
langarts | -.009721 .0032475 -2.99 0.003 -.016086 -.0033561
male | -.2470039 .1262323 -1.96 0.050 -.4944147 .0004069
_cons | 2.542103 .1749726 14.53 0.000 2.199163 2.885042
-------------+----------------------------------------------------------------
inflate |
school | 1.151284 .3096867 3.72 0.000 .5443094 1.758259
male | .8693162 .3009801 2.89 0.004 .2794061 1.459226
_cons | -3.704446 .5694312 -6.51 0.000 -4.82051 -2.588381
------------------------------------------------------------------------------
Using the robust option has resulted in a fairly large change in the model chi-square,
which is now a Wald chi-square (12.58) instead of a likelihood ratio
chi-square (68.98); the log likelihood is likewise reported as a log pseudolikelihood.
The main body of the output contains the Poisson and logit coefficients, robust standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. The coefficients themselves are unchanged; only the standard errors differ. The robust standard errors attempt to adjust for heterogeneity in the model.
Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in days absent.
prchange
zip: Changes in Rate for daysabs
min->max 0->1 -+1/2 -+sd/2
langarts -5.6598 -0.0900 -0.0556 -0.9987
male -2.2729 -2.2729 -2.2767 -1.1402
exp(xb): 5.7195
base x values for count equation:
langarts male
x= 50.0638 .487342
sd(x)= 17.9392 .500633
base x values for binary equation:
school male
x= 1.49684 .487342
sd(x)= .500783 .500633
estout, cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) stats(ll chi2, fmt(%8.2f))
.
b/se
daysabs
langarts -0.01**
(0.00)
male -0.25*
(0.13)
_cons 2.54***
(0.17)
inflate
school 1.15***
(0.31)
male 0.87**
(0.30)
_cons -3.70***
(0.57)
ll -1345.94
chi2 12.58
With a little bit of manual editing
we can produce an acceptable table of the output.
model
count coefficients
language arts -0.01**
(0.00)
male -0.25*
(0.13)
constant 2.54***
(0.17)
excess zero logit coefficients
school 1.15***
(0.31)
male 0.87**
(0.30)
constant -3.70***
(0.57)
log pseudolikelihood -1345.94
chi-squared 12.58
legend: coefficient/(standard error) * p<=0.05 ** p<0.01 *** p<0.001
The zero-inflated Poisson regression model predicting days absent from language arts and gender was statistically significant
(chi-squared = 12.58, df = 2, p < .01). The predictors of excess zeros, school (1.15)
and male (0.87), were both statistically significant.
Of the count predictors, langarts was statistically significant (p = 0.003) and male was
marginally so (p = 0.050, with a 95% confidence interval that just includes zero).
For these data, the expected change in log count for a one-unit increase in language arts was -0.01.
This translates to a decrease of almost one day (0.999) of absence for a one standard deviation
increase in language arts when gender is held constant.
Male students had an expected log count 0.25 lower than female students, which amounts
to about 2.27 fewer days absent than females while holding language arts constant.
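Since the count equation models the log of the expected count, exponentiating a coefficient gives an incidence rate ratio, which some readers find easier to interpret than a change in log count. The sketch below uses the coefficients reported above and then refits the model with the irr option of zip, which reports the count equation as rate ratios; it assumes the data are still in memory.

```stata
* rate ratio for males relative to females
display exp(-.2470039)
* rate ratio for a one standard deviation (17.9392) increase in langarts
display exp(-.009721*17.9392)
* refit, reporting the count equation as incidence rate ratios
zip daysabs langarts male, inflate(school male) robust irr
```

The first value is about 0.78, meaning that among students who are not excess zeros, males have an expected count of days absent roughly 22% lower than females with the same language arts score.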
These pages are Copyrighted (c) by UCLA Academic Technology Services