UCLA Academic Technology Services

Stata Data Analysis Examples
Zero-inflated Poisson Regression

Examples of Zero-inflated Poisson Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. State wildlife biologists want to model how many fish are caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there are no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so the data contain excess zeros contributed by the people who did not fish.

Description of the Data

Let's pursue Example 2 from above. 

We have data on 250 groups that went to a park.  Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), and whether or not they brought a camper to the park (camper).   

In addition to predicting the number of fish caught, there is interest in predicting the existence of excess zeros, i.e., the zeros that were not simply a result of bad luck fishing. We will use the variables child, persons, and camper in our model. Let's look at the data.

webuse fish

summarize count child persons camper


    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       count |       250       3.296    11.63503          0        149
       child |       250        .684    .8503153          0          3
     persons |       250       2.528     1.11273          1          4
      camper |       250        .588    .4931824          0          1

histogram count, discrete freq
     


tab1 child persons camper

-> tabulation of child  

      child |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        132       52.80       52.80
          1 |         75       30.00       82.80
          2 |         33       13.20       96.00
          3 |         10        4.00      100.00
------------+-----------------------------------
      Total |        250      100.00

-> tabulation of persons  

    persons |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         57       22.80       22.80
          2 |         70       28.00       50.80
          3 |         57       22.80       73.60
          4 |         66       26.40      100.00
------------+-----------------------------------
      Total |        250      100.00

-> tabulation of camper  

     camper |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        103       41.20       41.20
          1 |        147       58.80      100.00
------------+-----------------------------------
      Total |        250      100.00
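
Because the interest is in excess zeros, it may also help to check how many groups reported catching no fish. A minimal sketch using built-in commands; the indicator name zero is hypothetical and not part of the dataset:

```stata
* load the fish data used on this page
webuse fish, clear
* number of groups that caught no fish
count if count == 0
* share of zero counts: the mean of a 0/1 indicator (zero is a made-up name)
generate byte zero = (count == 0)
summarize zero
```

A large share of zeros relative to what the mean of count would imply under a Poisson distribution is one informal sign that a zero-inflated model is worth considering.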

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a zero-inflated Poisson analysis, let's consider some other methods that you might use.

Stata Zero-inflated Poisson Analysis

The command below is for a zero-inflated Poisson model in which the number of fish caught is predicted with child and camper and the excess zeros in the outcome variable count are predicted with persons. The output begins with the iteration log, giving the values of the log likelihoods starting with a model that has no predictors. The last value in the log is the final value of the log likelihood for the full model and is repeated below.
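
A command matching this description might look like the following sketch. The vuong option is assumed from the Vuong test discussed later on this page; newer Stata releases may not accept it, in which case it can be dropped.

```stata
* load the fish data used on this page
webuse fish, clear
* count model: child and camper; excess-zero (logit) model: persons
* vuong requests the Vuong test (assumed here; may be rejected by newer Stata)
zip count child camper, inflate(persons) vuong
```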

Next comes the header information. On the right-hand side, the number of observations used (250) is given along with the likelihood ratio chi-squared. This compares the full model to a model without count predictors, giving a difference of two degrees of freedom. This is followed by the p-value for the chi-square. The model, as a whole, is statistically significant.

Below the header you will find the Poisson regression coefficients for each of the count predicting variables along with standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. Following these are logit coefficients for the variable predicting excess zeros along with its standard errors, z-scores, p-values and confidence intervals.
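
The two sets of coefficients correspond to the two parts of the model. As a sketch, in notation that is ours rather than the page's (\pi_i for the probability that group i is an excess zero, \mu_i for its Poisson mean, \beta for the count coefficients, \gamma for the inflation coefficients), the model described here can be written as:

```latex
\[
\begin{aligned}
\Pr(\text{count}_i = 0) &= \pi_i + (1 - \pi_i)\, e^{-\mu_i}, \\
\Pr(\text{count}_i = y) &= (1 - \pi_i)\, \frac{e^{-\mu_i}\, \mu_i^{y}}{y!}, \qquad y = 1, 2, \dots \\
\log \mu_i &= \beta_0 + \beta_1\, \text{child}_i + \beta_2\, \text{camper}_i, \\
\operatorname{logit}(\pi_i) &= \gamma_0 + \gamma_1\, \text{persons}_i .
\end{aligned}
\]
```

Under this formulation a positive logit coefficient raises the probability of an excess zero, while a positive Poisson coefficient raises the expected count among groups that are not excess zeros.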

Below the various coefficients you will find the results of the Vuong test. The Vuong test compares the zero-inflated model with an ordinary Poisson regression model. A significant z-test indicates that the zero-inflated model is better.
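
As a sketch of what the z-statistic measures, the usual form of the Vuong statistic compares the two models observation by observation. The symbols below (m_i, \bar{m}, s_m, V) are our notation, with \hat{f}_{ZIP} and \hat{f}_{P} denoting the fitted zero-inflated and ordinary Poisson probabilities:

```latex
\[
m_i = \log\!\left( \frac{\hat{f}_{ZIP}(y_i \mid x_i)}{\hat{f}_{P}(y_i \mid x_i)} \right),
\qquad
V = \frac{\sqrt{n}\, \bar{m}}{s_m},
\]
\[
\text{where } \bar{m} = \frac{1}{n} \sum_{i=1}^{n} m_i
\quad \text{and} \quad
s_m^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (m_i - \bar{m})^2 .
\]
```

V is compared with a standard normal distribution; large positive values favor the zero-inflated model and large negative values favor the ordinary Poisson model.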

Now, just to be on the safe side, let's rerun the zip command with the robust option in order to obtain robust standard errors for the Poisson regression coefficients. We cannot include the vuong option when using robust standard errors.
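
A sketch of the rerun, assuming the same model specification as before (child and camper in the count model, persons in the inflation model):

```stata
* load the fish data used on this page
webuse fish, clear
* same specification, now with robust standard errors and no vuong option
zip count child camper, inflate(persons) robust
```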

Using the robust option has resulted in a fairly large change in the model chi-square, which is now a Wald chi-square. With robust standard errors the model is fit using a log pseudo-likelihood instead of a log-likelihood, so a likelihood ratio test is no longer reported.

The robust standard errors attempt to adjust for heterogeneity in the model.

Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in the number of fish caught.
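
A sketch of how this step might be run, assuming prchange has already been installed and is called with no arguments right after the model is fit. The exact call used for this page is not shown, so treat the options (here, none) as an assumption:

```stata
* prchange is user-written and must be installed first (see findit prchange)
webuse fish, clear
zip count child camper, inflate(persons)
* changes in the predicted count for each predictor; as we understand it,
* the other predictors are held at their means by default
prchange
```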

Sample Write-Up of the Analysis

The zero-inflated Poisson regression model predicting fish caught (count) from child, camper, and persons was statistically significant (both with and without robust standard errors). The predictor of excess zeros, persons, was statistically significant. The count predictors child and camper were also each statistically significant. For these data, the expected change in log(count) for a one-unit increase in child was -1.0428. Groups with campers (camper = 1) had an expected log(count) value 0.8340 higher than groups without campers (camper = 0). The Vuong test suggests that our zero-inflated model is a significant improvement over a standard Poisson model.
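
Because the count coefficients are on the log scale, exponentiating them expresses the same results as multiplicative changes in the expected count among groups that are not excess zeros. A worked sketch using the two coefficients quoted above, rounded to two decimals:

```latex
\[
e^{-1.0428} \approx 0.35,
\qquad
e^{0.8340} \approx 2.30 .
\]
```

Read this way, each additional child multiplies the expected number of fish caught by about 0.35 (roughly a 65% decrease), and groups with a camper are expected to catch about 2.3 times as many fish as groups without one, holding the other count predictor constant.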

Cautions, Flies in the Ointment

  • It is not recommended that zero-inflated Poisson models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.

These pages are Copyrighted (c) by UCLA Academic Technology Services