Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
Example 2. The state wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so the data contain excess zeros contributed by the people who did not fish.
We have data on 250 groups that went to a park. Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), and whether or not they brought a camper to the park (camper).
In addition to predicting the number of fish caught, there is interest in predicting the existence of excess zeros, i.e. the zeros that were not simply a result of bad luck fishing. We will use the variables child, persons, and camper in our model. Let's look at the data.
webuse fish
summarize count child persons camper
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
count | 250 3.296 11.63503 0 149
child | 250 .684 .8503153 0 3
persons | 250 2.528 1.11273 1 4
camper | 250 .588 .4931824 0 1
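The summary statistics above already hint at why a plain Poisson model would fit poorly. The following is a rough check, with Python used only as a calculator. The inputs are copied from the summarize output; the comparison against a Poisson distribution with the same mean is an illustrative assumption and not part of the original analysis.

```python
import math

# Values copied from the summarize output for count
mean_count = 3.296
sd_count = 11.63503

variance = sd_count ** 2

# A Poisson variable has variance equal to its mean, so a ratio far
# above 1 signals overdispersion.
dispersion_ratio = variance / mean_count

# Share of zeros that a Poisson distribution with this mean would produce
poisson_p_zero = math.exp(-mean_count)

print(f"variance           = {variance:.2f}")
print(f"variance / mean    = {dispersion_ratio:.1f}")
print(f"Poisson P(count=0) = {poisson_p_zero:.3f}")
```

The variance is roughly 41 times the mean, and a Poisson distribution with this mean would put under 4% of its mass at zero, whereas the model output further down reports 142 zeros among the 250 groups (about 57%).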
histogram count, discrete freq
tab1 child persons camper
-> tabulation of child
child | Freq. Percent Cum.
------------+-----------------------------------
0 | 132 52.80 52.80
1 | 75 30.00 82.80
2 | 33 13.20 96.00
3 | 10 4.00 100.00
------------+-----------------------------------
Total | 250 100.00
-> tabulation of persons
persons | Freq. Percent Cum.
------------+-----------------------------------
1 | 57 22.80 22.80
2 | 70 28.00 50.80
3 | 57 22.80 73.60
4 | 66 26.40 100.00
------------+-----------------------------------
Total | 250 100.00
-> tabulation of camper
camper | Freq. Percent Cum.
------------+-----------------------------------
0 | 103 41.20 41.20
1 | 147 58.80 100.00
------------+-----------------------------------
Total | 250 100.00
zinb count child camper, inflate(persons) vuong
Fitting constant-only model:
Iteration 0: log likelihood = -519.33992
Iteration 1: log likelihood = -471.96077
Iteration 2: log likelihood = -465.38193
Iteration 3: log likelihood = -464.39882
Iteration 4: log likelihood = -463.92704
Iteration 5: log likelihood = -463.79248
Iteration 6: log likelihood = -463.75773
Iteration 7: log likelihood = -463.7518
Iteration 8: log likelihood = -463.75119
Iteration 9: log likelihood = -463.75118
Fitting full model:
Iteration 0: log likelihood = -463.75118 (not concave)
Iteration 1: log likelihood = -440.43162
Iteration 2: log likelihood = -434.96651
Iteration 3: log likelihood = -433.49903
Iteration 4: log likelihood = -432.89949
Iteration 5: log likelihood = -432.89091
Iteration 6: log likelihood = -432.89091
Zero-inflated negative binomial regression Number of obs = 250
Nonzero obs = 108
Zero obs = 142
Inflation model = logit LR chi2(2) = 61.72
Log likelihood = -432.8909 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
count |
child | -1.515255 .1955912 -7.75 0.000 -1.898606 -1.131903
camper | .8790514 .2692731 3.26 0.001 .3512857 1.406817
_cons | 1.371048 .2561131 5.35 0.000 .8690758 1.873021
-------------+----------------------------------------------------------------
inflate |
persons | -1.666563 .6792833 -2.45 0.014 -2.997934 -.3351922
_cons | 1.603104 .8365065 1.92 0.055 -.036419 3.242626
-------------+----------------------------------------------------------------
/lnalpha | .9853533 .17595 5.60 0.000 .6404975 1.330209
-------------+----------------------------------------------------------------
alpha | 2.678758 .4713275 1.897425 3.781834
------------------------------------------------------------------------------
Vuong test of zinb vs. standard negative binomial: z = 1.70 Pr>z = 0.0444
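The inflate equation in the output above is a logit model for being an excess zero, so its coefficients can be turned into probabilities with the inverse logit. The sketch below does this in Python, used only as a calculator. The coefficients are copied from the output; the function name prob_excess_zero is a hypothetical helper introduced for illustration.

```python
import math

# Coefficients copied from the inflate equation of the zinb output
b_cons = 1.603104
b_persons = -1.666563

def prob_excess_zero(persons):
    """Inverse logit of the inflate linear predictor."""
    xb = b_cons + b_persons * persons
    return 1.0 / (1.0 + math.exp(-xb))

# persons ranges from 1 to 4 in these data
probs = {n: prob_excess_zero(n) for n in range(1, 5)}
for n, p in probs.items():
    print(f"persons = {n}: P(excess zero) = {p:.3f}")
```

Under these estimates the predicted probability of an excess zero falls from about 0.48 for a single visitor to under 0.01 for a group of four, which is what the negative coefficient on persons implies.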
The output looks very much like the output from an OLS regression. It begins with the iteration log giving the values of the log likelihoods starting with a model that has no predictors. The last value in the log is the final value of the log likelihood for the full model and is repeated below.
Next comes the header information. On the right-hand side the number of observations used (250) is given along with the likelihood ratio chi-squared. This compares the full model to a model without count predictors, giving a difference of two degrees of freedom. This is followed by the p-value for the chi-square. The model, as a whole, is statistically significant.
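The likelihood ratio chi-squared in the header can be reproduced from the two final log likelihoods in the iteration log. The following Python sketch is used only as a calculator, with the values copied from the output. The closed-form p-value relies on the fact that a chi-squared survival function with exactly 2 degrees of freedom equals exp(-x/2); it would not apply to other degrees of freedom.

```python
import math

# Final log likelihoods copied from the iteration log
ll_constant_only = -463.75118  # last iteration of the constant-only model
ll_full = -432.89091           # last iteration of the full model

# Likelihood ratio statistic: twice the gain in log likelihood
lr_chi2 = 2 * (ll_full - ll_constant_only)

# Upper-tail probability for a chi-squared with 2 df (closed form for 2 df only)
p_value = math.exp(-lr_chi2 / 2)

print(f"LR chi2(2)  = {lr_chi2:.2f}")
print(f"Prob > chi2 = {p_value:.2e}")
```

The statistic matches the 61.72 printed in the header, and the p-value is far below the four decimals Stata displays, hence the reported 0.0000.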
Below the header, you will find the negative binomial regression coefficients for each of the variables along with standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. Following these are logit coefficients for predicting excess zeros along with their standard errors, z-scores, p-values and confidence intervals. Additionally, there will be an estimate of the natural log of the overdispersion coefficient, alpha, along with the untransformed value. If alpha is zero, the negative binomial distribution reduces to the Poisson, and the model is better estimated using a Poisson regression model.
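The relationship between /lnalpha and alpha in the output is a simple exponentiation, applied to the point estimate and to both confidence limits. The sketch below checks this in Python, used only as a calculator. The variance formula assumes the mean-dispersion (NB2) parameterization, and the value mu = 2 is a hypothetical mean chosen purely for illustration.

```python
import math

# Values copied from the /lnalpha row of the zinb output
lnalpha = 0.9853533
lnalpha_ci = (0.6404975, 1.330209)

# alpha and its confidence limits are the exponentiated log-scale values
alpha = math.exp(lnalpha)
alpha_ci = tuple(math.exp(bound) for bound in lnalpha_ci)

def nb2_variance(mu, alpha):
    """Variance under the assumed NB2 form: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2

mu = 2.0  # hypothetical mean, for illustration only
print(f"alpha        = {alpha:.6f}")
print(f"95% CI       = ({alpha_ci[0]:.6f}, {alpha_ci[1]:.6f})")
print(f"NB2 variance at mu = {mu}: {nb2_variance(mu, alpha):.2f}")
print(f"Poisson variance at mu = {mu}: {nb2_variance(mu, 0.0):.2f}")
```

With alpha set to zero the variance collapses to the mean, which is the Poisson case described above.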
Below the various coefficients you will find the results of the Vuong test. The Vuong test compares the zero-inflated model with an ordinary negative binomial regression model. A significant z-test indicates that the zero-inflated model is better.
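The Vuong statistic is referred to a standard normal distribution, and the reported Pr>z is a one-sided upper-tail probability. The sketch below computes it in Python, used only as a calculator. It starts from the z value as printed, which is rounded to two decimals, so the result can only approximate the reported p-value.

```python
import math

# Vuong statistic as printed in the output (rounded to two decimals)
z = 1.70

# Upper-tail probability of a standard normal: P(Z > z)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))

print(f"Pr > z = {p_one_sided:.4f}")
```

The result is close to the reported 0.0444; the small gap is consistent with the z statistic having been rounded before printing.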
Now, just to be on the safe side, let's rerun the zinb command with the robust option in order to obtain robust standard errors for the negative binomial regression coefficients. We cannot include the vuong option when using robust standard errors.
zinb count child camper, inflate(persons) robust
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -519.33992
Iteration 1: log pseudolikelihood = -471.96077
Iteration 2: log pseudolikelihood = -465.38193
Iteration 3: log pseudolikelihood = -464.39882
Iteration 4: log pseudolikelihood = -463.92704
Iteration 5: log pseudolikelihood = -463.79248
Iteration 6: log pseudolikelihood = -463.75773
Iteration 7: log pseudolikelihood = -463.7518
Iteration 8: log pseudolikelihood = -463.75119
Iteration 9: log pseudolikelihood = -463.75118
Fitting full model:
Iteration 0: log pseudolikelihood = -463.75118 (not concave)
Iteration 1: log pseudolikelihood = -440.43162
Iteration 2: log pseudolikelihood = -434.96651
Iteration 3: log pseudolikelihood = -433.49903
Iteration 4: log pseudolikelihood = -432.89949
Iteration 5: log pseudolikelihood = -432.89091
Iteration 6: log pseudolikelihood = -432.89091
Zero-inflated negative binomial regression Number of obs = 250
Nonzero obs = 108
Zero obs = 142
Inflation model = logit Wald chi2(2) = 40.15
Log pseudolikelihood = -432.8909 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
count |
child | -1.515255 .2419059 -6.26 0.000 -1.989381 -1.041128
camper | .8790514 .4740421 1.85 0.064 -.050054 1.808157
_cons | 1.371048 .4240159 3.23 0.001 .5399923 2.202104
-------------+----------------------------------------------------------------
inflate |
persons | -1.666563 .5200032 -3.20 0.001 -2.68575 -.6473755
_cons | 1.603104 .6748018 2.38 0.018 .2805164 2.925691
-------------+----------------------------------------------------------------
/lnalpha | .9853533 .0968016 10.18 0.000 .7956256 1.175081
-------------+----------------------------------------------------------------
alpha | 2.678758 .2593081 2.215827 3.238405
------------------------------------------------------------------------------
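The coefficients in the table above are identical to those from the first run; only the standard errors, and therefore the z statistics and p-values, have changed. The sketch below recomputes both versions for camper in Python, used only as a calculator. The inputs are copied from the two output tables; wald_z_and_p is a hypothetical helper introduced for illustration.

```python
import math

# camper coefficient and its two standard errors, copied from the outputs
coef = 0.8790514
se_model = 0.2692731   # model-based standard error (first run)
se_robust = 0.4740421  # robust standard error (second run)

def wald_z_and_p(b, se):
    """Return the z statistic and its two-sided normal p-value."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z_model, p_model = wald_z_and_p(coef, se_model)
z_robust, p_robust = wald_z_and_p(coef, se_robust)

print(f"model-based: z = {z_model:.2f}, p = {p_model:.3f}")
print(f"robust:      z = {z_robust:.2f}, p = {p_robust:.3f}")
```

The larger robust standard error moves camper from p = 0.001 to p = 0.064, so its significance at the 0.05 level depends on which standard errors are used.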
Using the robust option leaves the coefficients unchanged but changes the standard errors and the model chi-square, which is now a Wald chi-square. Because the robust option produces log pseudolikelihoods rather than log likelihoods, a likelihood ratio test is no longer appropriate, so a Wald test is reported instead. The model is still statistically significant. The robust standard errors attempt to adjust for heterogeneity in the model. Note that with robust standard errors the p-value for camper rises from 0.001 to 0.064. Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in number of fish caught.
prchange
zinb: Changes in Rate for count
min->max 0->1 -+1/2 -+sd/2
child -6.0879 -4.8010 -3.6329 -3.0108
camper 1.8336 1.8336 1.9810 0.9537
exp(xb): 2.1826
base x values for count equation:
child camper
x= .684 .588
sd(x)= .850315 .493182
base x values for binary equation:
persons
x= 2.528
sd(x)= 1.11273
The zero-inflated negative binomial regression model predicting number of fish caught (count) from child, camper, and persons was statistically significant (both with and without robust standard errors). The predictor of excess zeros, persons, was statistically significant. The count predictors child and camper were each statistically significant with the model-based standard errors; with robust standard errors, child remained significant while camper did not reach the 0.05 level (p = 0.064). For these data, the expected change in log(count) for a one-unit increase in child was -1.515255. This amounts to a decrease of about 3.6 in the expected count of fish for each additional child in the party, with the other predictors held at their means. Groups with campers (camper = 1) had an expected log(count) 0.879051 higher than groups without campers (camper = 0), i.e., bringing a camper increases the expected count of fish caught by 1.83. We can see in our model that the dispersion parameter alpha is significantly different from zero. This suggests that our data are overdispersed and that a negative binomial model is more appropriate than a Poisson model. The Vuong test suggests that our zero-inflated model is a significant improvement over a standard negative binomial model.
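Because the count equation is on the log scale, exponentiating a coefficient gives a multiplicative effect on the expected count (an incidence rate ratio), which is often easier to read than a change in log(count). The sketch below does this in Python, used only as a calculator. The coefficients are copied from the output; these ratios apply to the count part of the model and do not account for the excess-zero part.

```python
import math

# Count-equation coefficients copied from the zinb output
coefs = {"child": -1.515255, "camper": 0.8790514}

# Incidence rate ratio: multiplicative change in the expected count
# for a one-unit increase in the predictor
irr = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in irr.items():
    pct_change = (ratio - 1) * 100
    print(f"{name}: IRR = {ratio:.3f} ({pct_change:+.0f}% in expected count)")
```

Under these estimates each additional child multiplies the expected count by about 0.22, and bringing a camper multiplies it by about 2.4.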
These pages are Copyrighted (c) by UCLA Academic Technology Services