UCLA Academic Technology Services

Mplus Data Analysis Examples
Zero-inflated Poisson Regression

Examples of Zero-inflated Poisson Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so the data contain excess zeros contributed by the people who did not fish.

Description of the Data

Let's pursue Example 1 from above. This example uses the same data as the Poisson regression example.

We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.dat. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.

In addition to predicting the number of days absent, there is interest in predicting the existence of excess zeros, i.e., the probability that a student belongs to the group that will always have zero absences. We will use both male and school to investigate this.
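The idea of excess zeros can be made concrete with a minimal sketch of the zero-inflated Poisson probability function. It assumes the usual two-part formulation, in which an observation is an excess zero with probability pi and otherwise follows a Poisson distribution with mean lam. The function name zip_pmf and the values of lam and pi are illustrative only; they are not estimates from these data.

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson with Poisson mean lam
    and excess-zero probability pi."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can be an excess zero or an ordinary Poisson zero.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

lam, pi = 5.0, 0.2  # illustrative values only (assumption)

p_zero_plain = math.exp(-lam)        # P(Y = 0) under a plain Poisson
p_zero_zip = zip_pmf(0, lam, pi)     # P(Y = 0) with zero inflation
total = sum(zip_pmf(y, lam, pi) for y in range(100))
zip_mean = sum(y * zip_pmf(y, lam, pi) for y in range(100))

print(p_zero_plain, p_zero_zip, total, zip_mean)
```

With zero inflation, the probability of a zero rises from exp(-lam) to pi + (1 - pi) * exp(-lam), and the mean shrinks from lam to (1 - pi) * lam.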

Let's look at the data.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a zero-inflated Poisson analysis, let's consider some other methods that you might use.

Zero-inflated Poisson Analysis

NOTE:  This example was done using Mplus version 4.21.  The syntax may not work with earlier versions of Mplus.

In the syntax below, we indicate that daysabs is a count variable by using the count statement. The (i) option specifies a zero-inflated Poisson model; without the (i) option, we would be estimating a Poisson model without zero-inflation. We also use the usevar statement because the current model does not use all of the variables in the data set. We have omitted the missing statement because this data set has no missing data. The default estimation method is MLR. According to the Mplus 4 manual, MLR gives maximum likelihood parameter estimates with standard errors and a chi-square test statistic that are robust to non-normality and, when used with type = complex, to non-independence of observations. The MLR standard errors are computed using a sandwich estimator; these are what we generally call robust standard errors. To get the "regular" standard errors, we specify estimator = ml in the analysis statement. (In the next example, we will omit the analysis statement and obtain the robust standard errors.)

TITLE:
Mplus DAE for zero-inflated poisson regression
DATA:
  FILE IS D:\poissonreg.dat;
VARIABLE:
  NAMES ARE id school male math langarts daysatt daysabs;
  COUNT IS daysabs (i);
  usevar school male math langarts daysabs;
ANALYSIS: estimator = ml;
MODEL:
  daysabs ON math langarts male;
  daysabs#1 ON school male;
MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 DAYSABS    ON
    MATH               0.000    0.002     -0.164
    LANGARTS          -0.009    0.002     -4.974
    MALE              -0.246    0.049     -5.055

 DAYSABS#1  ON
    SCHOOL             1.151    0.314      3.662
    MALE               0.869    0.305      2.854

 Intercepts
    DAYSABS#1         -3.704    0.594     -6.233
    DAYSABS            2.545    0.073     34.744

In the MODEL RESULTS section of the output you will find the Poisson regression coefficients (estimates) for each of the variables, their standard errors, and the ratio of each estimate to its standard error. This ratio can be used as a z test, where absolute values greater than about 2 (1.96 at the 0.05 level) are considered statistically significant. Following these are the logit coefficients for predicting excess zeros. In the above output, we see that math is not statistically significant, while langarts and male are. Both school and male are statistically significant predictors of the zero inflation.
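A short sketch of how these numbers can be read follows. It copies the estimates and Est./S.E. ratios from the output above, converts each ratio to a two-sided p-value under a standard normal reference distribution, and exponentiates each estimate. For the count part, the exponentiated estimate is a rate ratio. For the inflation part, it is an odds ratio only under the assumption that Mplus models the inflation part on the logit scale. The helper names two_sided_p and summarize are hypothetical.

```python
import math

# (estimate, Est./S.E.) pairs copied from the MODEL RESULTS output
count_part = {
    "MATH": (0.000, -0.164),
    "LANGARTS": (-0.009, -4.974),
    "MALE": (-0.246, -5.055),
}
inflation_part = {
    "SCHOOL": (1.151, 3.662),
    "MALE": (0.869, 2.854),
}

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def summarize(part):
    """Exponentiated estimate and p-value for each predictor."""
    return {
        name: {"exp_est": math.exp(est), "p": two_sided_p(z)}
        for name, (est, z) in part.items()
    }

rate_ratios = summarize(count_part)      # multiplicative change in expected count
odds_ratios = summarize(inflation_part)  # assumes a logit inflation model

for label, table in (("count", rate_ratios), ("inflation", odds_ratios)):
    for name, row in table.items():
        print(f"{label:9s} {name:8s} exp(est) = {row['exp_est']:.3f}  p = {row['p']:.4f}")
```

For example, exp(-0.246) is about 0.78, so under this model the expected count for males is roughly 78% of that for females when the other count predictors are held constant.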

Now let's rerun the analysis without the analysis statement in order to obtain robust standard errors.

Using the robust standard errors, which tend to be larger than "regular" standard errors, we see that math, langarts and male are not statistically significant.  The variables school and male are still statistically significant predictors of the zero inflation.

Sample Write-Up of the Analysis

The predictors of excess zeros, school (1.15) and male (0.87), were both statistically significant. The count predictors langarts and male were also each statistically significant. For these data, the expected change in log count for a one-unit increase in language arts was -0.01. This translates to a decrease of almost one day (0.999) absence for a one standard deviation increase in language arts when gender is held constant. Male students had an expected log count 0.25 lower than female students, which amounts to about 2.27 fewer days absent than females while holding language arts constant.
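Because the model is multiplicative on the count scale, a difference expressed in days depends on the values of the other predictors. The sketch below shows one way the two parts of the fitted model could be combined into a predicted mean. It uses the ML estimates reported in the output, assumes a logit link for the inflation part, and evaluates two hypothetical students. The covariate values (school = 1, math = 50, langarts = 50) are assumptions chosen for illustration, not values taken from the data, and the function name predict is hypothetical.

```python
import math

def predict(school, male, math_score, langarts):
    """Return (excess-zero probability, Poisson mean, overall mean)
    from the ML estimates in the MODEL RESULTS output."""
    # Inflation part: logit of the probability of being an excess zero (assumed link).
    logit = -3.704 + 1.151 * school + 0.869 * male
    p_excess_zero = 1.0 / (1.0 + math.exp(-logit))
    # Count part: log of the Poisson mean.
    log_mean = 2.545 + 0.000 * math_score - 0.009 * langarts - 0.246 * male
    poisson_mean = math.exp(log_mean)
    # Overall expected count mixes the two parts.
    overall_mean = (1.0 - p_excess_zero) * poisson_mean
    return p_excess_zero, poisson_mean, overall_mean

# Hypothetical students who differ only in gender (assumed covariate values).
female = predict(school=1, male=0, math_score=50, langarts=50)
male = predict(school=1, male=1, math_score=50, langarts=50)

print("female:", female)
print("male:  ", male)
print("difference in expected days:", female[2] - male[2])
```

The ratio of the two Poisson means is always exp(-0.246), whatever covariate values are chosen, but the difference in expected days changes with them.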

Cautions, Flies in the Ointment

  • Zero-inflated Poisson models are not recommended for small samples. What constitutes a small sample does not seem to be clearly defined in the literature.
    These pages are Copyrighted (c) by UCLA Academic Technology Services

