UCLA Academic Technology Services HomeServicesClassesContactJobs

R Data Analysis Examples
Zero-Inflated Poisson Regression

Note: the code on this page works with R 2.4.1.


Examples of Zero-Inflated Poisson Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. The state wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, were there children in the group and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish so there are excess zeros in the data because of the people that did not fish.

Description of the Data

Let's pursue Example 2 from above. 

We have data on 250 groups that went to a park.  Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), and whether or not they brought a camper to the park (camper).   

In addition to predicting the number of fish caught, there is interest in predicting the existence of excess zeros, i.e., the probability that a group caught zero fish. We will use the variables child, persons, and camper in our model.

Let's look at the data. For a nice layout for displaying the summary statistics, we made use of the package called fields. We can also look at the variance and a histogram of our outcome variable count.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a zero-inflated Poisson regression analysis, let's consider some other methods that you might use.

R Zero-Inflated Poisson Regression Analysis

Though we can run a Poisson regression in R using the glm function in one of the base packages, we will need to install an additional R package to run a zero-inflated Poisson regression.  We have opted for pscl.  To install this, enter install.packages("pscl"). In order to load the pscl package, you may need to install additional packages in the same manner. 

library(pscl)
zip<-zeroinfl(count ~ child + camper | persons)
summary(zip)

Call:
zeroinfl(formula = count ~ child + camper | persons)


Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.59789    0.08554  18.680   <2e-16 ***
child       -1.04284    0.09999 -10.430   <2e-16 ***
camper       0.83402    0.09363   8.908   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.2974     0.3739   3.470 0.000520 ***
persons      -0.5643     0.1630  -3.463 0.000534 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 10 
Log-likelihood: -1032 on 5 Df
The output looks very much like the output from two OLS regressions in R.

Below the model call, you will find a block of output containing Poisson regression coefficients for each of the variables along with standard errors, z-scores, and p-values for the coefficients. A second block follows that corresponds to the inflation model.  This includes logit coefficients for predicting excess zeros along with their standard errors, z-scores, and p-values.

All of the predictors in both the count and inflation portions of the model are statistically significant.  This model fits the data significantly better than the null model, i.e., the intercept-only model. To show that this is the case, we can compare with the current model to a null model without predictors using chi-squared test on the difference of log likelihoods.


library(lmtest)
lrtest(zip)

Likelihood ratio test

Model 1: count ~ child + camper | persons
Model 2: count ~ 1
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1   5 -1031.6                         
2   2 -1127.0 -3 190.83  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the output above, we can see that the chi-squared value 190.83. Since we have three predictor variables in the full model, the degrees of freedom for the chi-squared test is 3. This yields a p-value <.0001.  Thus, our overall model is statistically significant.

Note that the model output above does not indicate in any way if our zero-inflated model is an improvement over a standard Poisson regression.  We can determine this by running the corresponding standard Poisson model and then performing a Vuong test of the two models. 

p1<-glm(count ~ child + camper, family=poisson)
summary(p1)

Call:
glm(formula = count ~ child + camper, family = poisson)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7736  -2.2293  -1.2024  -0.3498  24.9492  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.91026    0.08119   11.21   <2e-16 ***
child       -1.23476    0.08029  -15.38   <2e-16 ***
camper       1.05267    0.08871   11.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2958.4  on 249  degrees of freedom
Residual deviance: 2380.1  on 247  degrees of freedom
AIC: 2723.2

Number of Fisher Scoring iterations: 6

vuong(p1, zip)

Vuong Non-Nested Hypothesis Test-Statistic: -3.57426 
(test-statistic is asymptotically distributed N(0,1) under the
null that the models are indistinguishible)
in this case:
model2 > model1, with p-value 0.0001756096

The Vuong test compares the zero-inflated model with an ordinary Poisson regression model.  In this example, we can see that our test statistic is significant, indicating that the zero-inflated model is superior to the standard Poisson model. 

Sample Write-Up of the Analysis

The zero-inflated Poisson regression model predicting fish caught (count) from child, camper, and persons was statistically significant (chi-squared = 190.83, df = 3, p<.01). The predictor of excess zeros, persons, was statistically significant. The count predictors child and camper were also each statically significant. For these data, the expected change in log(count) for a one-unit increase in child was -1.04284. Groups with campers (camper = 1) had an expected log count 0.83402 higher than groups without campers (camper = 0).  The Vuong test suggests that our zero-inflated model is a significant improvement over a standard Poisson model.

Cautions, Flies in the Ointment

  • It is not recommended that zero-inflated poisson models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.

    See Also

     

    How to cite this page

    Report an error on this page

    UCLA Researchers are invited to our Statistical Consulting Services
    We recommend others to our list of Other Resources for Statistical Computing Help
    These pages are Copyrighted (c) by UCLA Academic Technology Services


    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.