UCLA Academic Technology Services

Stata Data Analysis Examples
Zero-inflated Negative Binomial Regression

Examples of Zero-inflated Negative Binomial Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so the data contain excess zeros contributed by the people who did not fish.

Description of the Data

Let's pursue Example 1 from above. This example uses the same data as the Poisson, negative binomial, and zero-inflated Poisson regression examples.

We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.dta. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.

In addition to predicting the number of days absent there is interest in predicting the existence of excess zeros, i.e., the probability that a student will have zero absences. We will use the variable school to investigate this.
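To make the idea of excess zeros concrete, here is a minimal Python sketch (not Stata, and not part of the original analysis) of the zero-inflated negative binomial probability function. It assumes the common NB2 parameterization with mean mu and dispersion alpha; the parameter values are hypothetical and are not estimates from poissonreg.dta.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y, with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, alpha, pi):
    """Zero-inflated NB: with probability pi the observation is an 'always zero' case."""
    base = (1.0 - pi) * nb_pmf(y, mu, alpha)
    return pi + base if y == 0 else base

# Hypothetical values chosen only for illustration
mu, alpha, pi = 5.0, 1.2, 0.15
p_zero_nb = nb_pmf(0, mu, alpha)          # zeros expected from the count process alone
p_zero_zinb = zinb_pmf(0, mu, alpha, pi)  # zeros after adding the inflation part
total = sum(zinb_pmf(y, mu, alpha, pi) for y in range(2000))  # should be very close to 1
```

The zero probability under the zero-inflated model is always at least as large as under the plain negative binomial, which is what "excess zeros" refers to.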

Let's look at the data.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a zero-inflated negative binomial analysis, let's consider some other methods that you might use.

Stata Zero-inflated Negative Binomial Analysis

The output looks very much like the output from an OLS regression. The output begins with the iteration log, which gives the values of the log likelihoods starting with a model that has no predictors. The last value in the log is the final value of the log likelihood for the full model and is repeated below.

Next comes the header information. On the right-hand side, the number of observations used (316) is given, along with the likelihood ratio chi-squared with three degrees of freedom for the full model, followed by the p-value for the chi-squared. The model, as a whole, is statistically significant. The header also includes a pseudo-R2, which is 0.0536 in this example.
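The header quantities can be reproduced by hand from two log likelihoods. The following Python sketch is only an illustration under stated assumptions: it assumes the pseudo-R2 is McFadden's measure, uses a closed-form tail probability that is valid only for three degrees of freedom, and the log likelihood values are made up rather than taken from the output.

```python
import math

def lr_chi2(ll_null, ll_full):
    """Likelihood ratio chi-squared: twice the gain in log likelihood."""
    return 2.0 * (ll_full - ll_null)

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def mcfadden_r2(ll_null, ll_full):
    """McFadden's pseudo-R2: 1 minus the ratio of full to null log likelihood."""
    return 1.0 - ll_full / ll_null

# Hypothetical log likelihoods, NOT the values from this analysis
ll_null, ll_full = -900.0, -880.0
chi2 = lr_chi2(ll_null, ll_full)
p_value = chi2_sf_3df(chi2)
pseudo_r2 = mcfadden_r2(ll_null, ll_full)
```

A small p-value here corresponds to the statement that the model as a whole is statistically significant.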

Below the header you will find the negative binomial regression coefficients for each of the variables along with standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. Following these are probit coefficients for predicting excess zeros along with their standard errors, z-scores, p-values and confidence intervals.
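As a rough illustration of how the two sets of coefficients combine, the Python sketch below turns a count-part linear predictor into an expected count with exp() and an inflation-part linear predictor into a probability with the standard normal CDF (the probit link described above). The function name and every coefficient value are hypothetical; they are not the estimates from this model.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def zinb_predict(count_cons, count_coefs, x, infl_cons, infl_coefs, z):
    """Return (mu, pi, expected count) for one observation.

    mu is the mean of the count process, pi is the probability of an
    excess zero, and the overall expected count is (1 - pi) * mu.
    """
    xb = count_cons + sum(b * v for b, v in zip(count_coefs, x))
    zg = infl_cons + sum(g * v for g, v in zip(infl_coefs, z))
    mu = math.exp(xb)
    pi = normal_cdf(zg)
    return mu, pi, (1.0 - pi) * mu

# Hypothetical coefficients for (math, langarts, male) and for (school)
mu, pi, expected = zinb_predict(
    count_cons=2.0, count_coefs=[-0.002, -0.01, -0.4], x=[50.0, 50.0, 1.0],
    infl_cons=-1.5, infl_coefs=[0.3], z=[1.0],
)
```

A positive inflation coefficient raises pi and therefore lowers the expected count, even when the count-part coefficients are unchanged.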

Below the various coefficients you will find the results of the Vuong test. The Vuong test compares the zero-inflated model with a standard negative binomial regression model. A significant z-test indicates that the zero-inflated model is better.
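The idea behind the Vuong statistic can be sketched in a few lines of Python. This is a simplified illustration, not Stata's implementation: it assumes you already have each observation's log likelihood under both models, and it uses the sample standard deviation (n - 1 denominator), which is one of several conventions in use. The input values are made up.

```python
import math
import statistics

def vuong_z(loglik_a, loglik_b):
    """Vuong z statistic comparing model A with model B.

    Each argument is a list of per-observation log likelihoods.
    Large positive values favor model A, large negative values favor model B.
    """
    m = [a - b for a, b in zip(loglik_a, loglik_b)]  # pointwise log likelihood ratios
    n = len(m)
    return math.sqrt(n) * statistics.mean(m) / statistics.stdev(m)

# Made-up per-observation log likelihoods for four observations
ll_zero_inflated = [-1.0, -1.0, -1.0, -1.0]
ll_standard = [-2.0, -1.0, -3.0, -2.0]
z = vuong_z(ll_zero_inflated, ll_standard)
```

The statistic is compared with the standard normal distribution, so values well above about 1.96 favor the first model.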

Since math is clearly not significant, let's rerun the model without it.

The main body of the output contains the negative binomial and probit coefficients, robust standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. The robust standard errors attempt to adjust for heterogeneity in the model.

Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in days absent.

Sample Write-Up of the Analysis

Instead of the usual interpretation of the coefficients in the model, we will explain why this is not a good model to use in an analysis. To see an example of a sample write-up of a zero-inflated model please see the page for zip.

The first indication that something is amiss is that the standard errors for the constant and coefficient for the inflation part of the model are so large. The associated z-scores are zero and the p-values are close to 1.0. The other indicator is that the Vuong test is not significant, i.e., the zero-inflated negative binomial model is not significantly better than the standard negative binomial model.

Although the count part of the model is still valid and you can obtain predicted counts, you are probably better off running this model as an ordinary negative binomial regression model.

Cautions, Flies in the Ointment

  • It is not recommended that zero-inflated negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.