Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so the data contain more zeros than bad luck alone would produce: the extra zeros come from the people who did not fish.
We have data on 250 groups that went to a park. Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), and whether or not they brought a camper to the park (camper).
In addition to predicting the number of fish caught, there is interest in predicting the existence of excess zeros, i.e., the zeros that were not simply a result of bad luck fishing. We will use the variables child, persons, and camper in our model. Let's look at the data.
webuse fish
summarize count child persons camper
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       count |       250       3.296     11.63503         0        149
       child |       250        .684     .8503153         0          3
     persons |       250       2.528      1.11273         1          4
      camper |       250        .588     .4931824         0          1
histogram count, discrete freq
tab1 child persons camper
-> tabulation of child
      child |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        132       52.80       52.80
          1 |         75       30.00       82.80
          2 |         33       13.20       96.00
          3 |         10        4.00      100.00
------------+-----------------------------------
      Total |        250      100.00
-> tabulation of persons
    persons |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         57       22.80       22.80
          2 |         70       28.00       50.80
          3 |         57       22.80       73.60
          4 |         66       26.40      100.00
------------+-----------------------------------
      Total |        250      100.00
-> tabulation of camper
     camper |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        103       41.20       41.20
          1 |        147       58.80      100.00
------------+-----------------------------------
      Total |        250      100.00
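Before fitting a model, it can help to check directly how many groups reported no fish and how the variance of count compares with its mean. The following is a minimal sketch that assumes the fish dataset loaded above is still in memory; the variable name nofish is a helper introduced here for illustration and is not part of the dataset.

```stata
* Number of groups that caught no fish
count if count == 0

* Share of zero counts, computed as the mean of a 0/1 indicator
generate byte nofish = (count == 0)
summarize nofish

* Mean and variance of the outcome (a Poisson model assumes they are equal)
summarize count, detail
```

From the summary table above, the standard deviation of count is 11.64, so its variance (about 135) is far larger than its mean of 3.296. That gap is one informal sign that an ordinary Poisson model may fit poorly.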
zip count child camper, inflate(persons) vuong
Fitting constant-only model:
Iteration 0: log likelihood = -1347.807
Iteration 1: log likelihood = -1315.5343
Iteration 2: log likelihood = -1126.3689
Iteration 3: log likelihood = -1125.5358
Iteration 4: log likelihood = -1125.5357
Iteration 5: log likelihood = -1125.5357
Fitting full model:
Iteration 0: log likelihood = -1125.5357
Iteration 1: log likelihood = -1044.8553
Iteration 2: log likelihood = -1031.8733
Iteration 3: log likelihood = -1031.6089
Iteration 4: log likelihood = -1031.6084
Iteration 5: log likelihood = -1031.6084
Zero-inflated Poisson regression                Number of obs   =        250
                                                Nonzero obs     =        108
                                                Zero obs        =        142
Inflation model = logit                         LR chi2(2)      =     187.85
Log likelihood  = -1031.608                     Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
count        |
       child |  -1.042838   .0999883   -10.43   0.000    -1.238812    -.846865
      camper |   .8340222   .0936268     8.91   0.000      .650517    1.017527
       _cons |   1.597889   .0855382    18.68   0.000     1.430237     1.76554
-------------+----------------------------------------------------------------
inflate      |
     persons |  -.5643472   .1629638    -3.46   0.001    -.8837503    -.244944
       _cons |   1.297439   .3738522     3.47   0.001     .5647022    2.030176
------------------------------------------------------------------------------
Vuong test of zip vs. standard Poisson:            z =     3.57  Pr>z = 0.0002
The output begins with the iteration log, which gives the values of the log likelihoods starting with a model that has no predictors. The last value in the log is the final value of the log likelihood for the full model and is repeated below.
Next comes the header information. On the right-hand side the number of observations used (250) is given, along with the number of nonzero (108) and zero (142) observations and the likelihood ratio chi-squared. This compares the full model to a model without count predictors, giving a difference of two degrees of freedom. This is followed by the p-value for the chi-square. The model, as a whole, is statistically significant.
Below the header you will find the Poisson regression coefficients for each of the count predicting variables along with standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. Following these are the logit coefficients for the variable predicting excess zeros, along with its standard error, z-score, p-value and confidence interval. The count coefficients are on the log scale, and the inflate coefficients are on the log-odds scale for the probability of being an excess zero.
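Because the count equation uses a log link and the inflate equation a logit link, exponentiating a coefficient gives a multiplicative effect: an incidence rate ratio for the count part and an odds ratio for the inflate part. As a rough worked example, using the coefficients printed in the output above:

```stata
* Incidence rate ratios for the count equation (child, then camper)
display exp(-1.042838)
display exp(.8340222)

* Odds ratio for persons in the inflate equation
display exp(-.5643472)

* The irr option asks zip to report the count coefficients
* as incidence rate ratios directly
zip count child camper, inflate(persons) irr
```

These work out to roughly 0.35, 2.30, and 0.57. Among groups that could have caught fish, each additional child multiplies the expected catch by about 0.35, and groups with a camper are expected to catch about 2.3 times as many fish. Each additional person multiplies the odds of being an excess zero by about 0.57.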
Below the various coefficients you will find the results of the Vuong test. The Vuong test compares the zero-inflated model with an ordinary Poisson regression model. A significant z-test indicates that the zero-inflated model is better. Here z = 3.57 with p = 0.0002, which favors the zero-inflated model.
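An informal complement to the Vuong test, not part of the original analysis, is to fit the ordinary Poisson model with the same count predictors and compare information criteria. The sketch below assumes the fish data are in memory; the stored-estimate names m_pois and m_zip are introduced here for illustration.

```stata
* Ordinary Poisson regression with the same count predictors
quietly poisson count child camper
estimates store m_pois

* Zero-inflated Poisson as fitted above
quietly zip count child camper, inflate(persons)
estimates store m_zip

* Log likelihood, AIC and BIC for both models in one table
estimates stats m_pois m_zip
```

Lower AIC and BIC values indicate the preferred model under those criteria. This is a different comparison from the Vuong test and need not always agree with it.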
Now, just to be on the safe side, let's rerun the zip command with the robust option in order to obtain robust standard errors for the Poisson regression coefficients. We cannot include the vuong option when using robust standard errors.
zip count child camper, inflate(persons) robust
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -1347.807
Iteration 1: log pseudolikelihood = -1315.5343
Iteration 2: log pseudolikelihood = -1126.3689
Iteration 3: log pseudolikelihood = -1125.5358
Iteration 4: log pseudolikelihood = -1125.5357
Iteration 5: log pseudolikelihood = -1125.5357
Fitting full model:
Iteration 0: log pseudolikelihood = -1125.5357
Iteration 1: log pseudolikelihood = -1044.8553
Iteration 2: log pseudolikelihood = -1031.8733
Iteration 3: log pseudolikelihood = -1031.6089
Iteration 4: log pseudolikelihood = -1031.6084
Iteration 5: log pseudolikelihood = -1031.6084
Zero-inflated Poisson regression                Number of obs   =        250
                                                Nonzero obs     =        108
                                                Zero obs        =        142
Inflation model = logit                         Wald chi2(2)    =       7.25
Log pseudolikelihood = -1031.608                Prob > chi2     =     0.0266
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
count        |
       child |  -1.042838   .3893772    -2.68   0.007    -1.806004   -.2796731
      camper |   .8340222   .4076029     2.05   0.041     .0351352    1.632909
       _cons |   1.597889   .2934631     5.44   0.000     1.022711    2.173066
-------------+----------------------------------------------------------------
inflate      |
     persons |  -.5643472   .2888849    -1.95   0.051    -1.130551    .0018567
       _cons |   1.297439    .493986     2.63   0.009     .3292445    2.265634
------------------------------------------------------------------------------
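Using the robust option has resulted in a fairly large change in the model chi-square, which is now a Wald chi-square (7.25, compared with the likelihood ratio chi-square of 187.85). The coefficients are unchanged, but the log is now labeled a log pseudolikelihood, and the model test is computed from the coefficients and their robust covariance matrix rather than from a difference in log likelihoods. The robust standard errors attempt to adjust for heterogeneity in the model. They are considerably larger than the default ones, and the coefficient for persons in the inflate equation is no longer significant at the 0.05 level (p = 0.051).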
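To see the default and robust standard errors side by side, one option is to store both fits and tabulate them. This is a minimal sketch; the names m_default and m_robust are introduced here for illustration, and vce(robust) is used as the equivalent of the robust option.

```stata
* Fit with default (model-based) standard errors
quietly zip count child camper, inflate(persons)
estimates store m_default

* Refit with robust standard errors
quietly zip count child camper, inflate(persons) vce(robust)
estimates store m_robust

* Coefficients with standard errors beneath them, one column per model
estimates table m_default m_robust, b(%9.4f) se(%9.4f)
```

The coefficient rows should be identical across the two columns, with only the standard errors differing.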
Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in the number of fish caught.
prchange
zip: Changes in Rate for count
        min->max      0->1     -+1/2    -+sd/2
 child   -4.1079   -2.7818   -2.2961   -1.9285
camper    1.6792    1.6792    1.8071    0.8720
exp(xb):   2.1051
base x values for count equation:
          child   camper
    x=     .684     .588
sd(x)=  .850315  .493182
base x values for binary equation:
        persons
    x=    2.528
sd(x)=  1.11273
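The prchange figures can be checked by hand. In a zero-inflated Poisson model the expected count is (1 - p) * exp(xb), where p is the probability of an excess zero from the logit equation and xb is the linear predictor of the count equation. The sketch below assumes the coefficients from the zip output and the base values listed by prchange. It appears to reproduce both the value labeled exp(xb) and the 0->1 change for camper. The scalar names are introduced here for illustration.

```stata
* Probability of an excess zero at the mean of persons
scalar pzero = invlogit(1.297439 - .5643472*2.528)

* Count-equation linear predictor at the means of child and camper
scalar xbmean = 1.597889 - 1.042838*.684 + .8340222*.588

* Expected count at the base values; compare with 2.1051
display (1 - pzero)*exp(xbmean)

* Change in expected count when camper goes from 0 to 1,
* with child held at its mean; compare with 1.6792
scalar xb0 = 1.597889 - 1.042838*.684
display (1 - pzero)*(exp(xb0 + .8340222) - exp(xb0))
```

On this reading, a group with an average number of children and an average group size is expected to catch about 2.1 fish, and bringing a camper raises that expectation by about 1.7 fish.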
These pages are Copyrighted (c) by UCLA Academic Technology Services