
SAS Data Analysis Examples
Zero-Inflated Poisson Regression

Examples of Zero-inflated Poisson Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish caught nothing, so the data contain more zeros than fishing luck alone would produce: the excess zeros come from the people who did not fish.

Description of the Data

Let's pursue Example 2 from above using the dataset fish.sas7bdat.

We have data on 250 groups that went to a park.  Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), and whether or not they brought a camper to the park (camper).   

In addition to predicting the number of fish caught, there is interest in predicting the existence of excess zeros, i.e., the zeros that were not simply a result of bad luck fishing. We will use the variables child, persons, and camper in our model. Let's look at the data.

proc means data = fish mean std min max var;
  var count child persons;
run;

The MEANS Procedure

Variable            Mean         Std Dev         Minimum         Maximum        Variance
----------------------------------------------------------------------------------------
count          3.2960000      11.6350281               0     149.0000000     135.3738795
child          0.6840000       0.8503153               0       3.0000000       0.7230361
persons        2.5280000       1.1127303       1.0000000       4.0000000       1.2381687
----------------------------------------------------------------------------------------

proc univariate data = fish noprint;
  histogram count / midpoints = 0 to 50 by 1 vscale = count ;
run;


proc freq data = fish;
  tables camper;
run;

The FREQ Procedure

                                   Cumulative    Cumulative
camper    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0         103       41.20           103        41.20
     1         147       58.80           250       100.00

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a zero-inflated Poisson analysis, let's consider some other methods that you might use.

SAS Zero-inflated Poisson Regression Analysis using Proc Genmod

If you are using SAS version 9.2 or higher, you can run a zero-inflated Poisson model using proc genmod.
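The model statement for this analysis does not appear at this point on the page. A minimal sketch, assuming the same specification as the proc countreg code shown later on this page (child and camper in the count model, persons in the zero-inflation model), might look like this:

```sas
/* Sketch only: specification assumed from the proc countreg example below */
proc genmod data = fish;
  /* count portion: Poisson with log link */
  model count = child camper / dist = zip;
  /* zero-inflation portion: logistic model for the excess zeros */
  zeromodel persons / link = logit;
run;
```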

The output begins with a summary of the model and the data.  This is followed by a list of goodness of fit statistics. 

The next block of output includes parameter estimates from the count portion of the model.  It also includes the standard errors, Wald 95% confidence intervals, Wald Chi-square statistics, and p-values for the parameter estimates. 

The last block of output corresponds to the zero-inflation portion of the model.  This is a logistic model predicting the zeroes.  The output includes parameter estimates for the inflation model predictors and their standard errors, Wald 95% confidence intervals, Wald Chi-square statistics, and p-values. 

All of the predictors in both the count and inflation portions of the model are statistically significant.  This model fits the data significantly better than the null model, i.e., the intercept-only model. To show that this is the case, we can run the null model (a model without any predictors) and compare it with the current model using a chi-squared test on the difference in log likelihoods.


proc genmod data = fish;
  model count =  / dist = zip;
  zeromodel  / link = logit;
run;

The GENMOD Procedure
             Model Information
Data Set                          WORK.FISH
Distribution          Zero Inflated Poisson
Link Function                           Log
Dependent Variable                    count

Number of Observations Read         250
Number of Observations Used         250

             Criteria For Assessing Goodness Of Fit
Criterion                     DF           Value        Value/DF
Deviance                               2254.0459
Scaled Deviance                        2254.0459
Pearson Chi-Square           248       1918.7890          7.7371
Scaled Pearson X2            248       1918.7890          7.7371
Log Likelihood                          679.4854
Full Log Likelihood                   -1127.0229
AIC (smaller is better)                2258.0459
AICC (smaller is better)               2258.0945
BIC (smaller is better)                2265.0888

Algorithm converged.

                    Analysis Of Maximum Likelihood Parameter Estimates
                               Standard     Wald 95% Confidence          Wald
Parameter    DF    Estimate       Error           Limits           Chi-Square    Pr > ChiSq
Intercept     1      2.0316      0.0349      1.9631      2.1000       3388.16        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

             Analysis Of Maximum Likelihood Zero Inflation Parameter Estimates
                               Standard     Wald 95% Confidence          Wald
Parameter    DF    Estimate       Error           Limits           Chi-Square    Pr > ChiSq
Intercept     1      0.2728      0.1277      0.0225      0.5232          4.56        0.0327

The log likelihoods for the full model and the null model are -1031.6084 and -1127.0229, respectively. The chi-squared statistic is 2*(-1031.6084 - (-1127.0229)) = 190.829. The full model adds three predictors (child and camper in the count model, persons in the zero-inflation model), so the test has 3 degrees of freedom. This yields a p-value <.0001.  Thus, our overall model is statistically significant.
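The arithmetic of this likelihood ratio test can be reproduced in a short data step. This is a sketch rather than part of the original analysis: the variable names (ll_full, ll_null, chisq, df, p_value) are made up for illustration, and the two log likelihoods are typed in by hand from the output.

```sas
/* Likelihood ratio test of the full model against the null model */
data _null_;
  ll_full = -1031.6084;   /* full log likelihood, model with predictors */
  ll_null = -1127.0229;   /* full log likelihood, intercept-only model  */
  df      = 3;            /* child, camper, persons                     */
  chisq   = 2 * (ll_full - ll_null);
  /* upper-tail probability of the chi-squared distribution */
  p_value = sdf('CHISQUARE', chisq, df);
  put chisq= df= p_value=;
run;
```

The results are written to the SAS log. The survival function is used instead of one minus the cumulative probability because it stays accurate for very small p-values.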

SAS Zero-inflated Poisson Analysis using Proc Countreg

Proc countreg is another option for running a zero-inflated Poisson regression in SAS (again, version 9.2 or higher).  This procedure allows a few more options specific to count outcomes than proc genmod.  The proc countreg code for the original model run on this page appears below.  We indicate method = qn to specify the quasi-Newton optimization process that matches the proc genmod results. 

proc countreg data = fish method = qn;
  model count = child camper / dist= zip;
  zeromodel count ~ persons;
run;

The COUNTREG Procedure

              Model Fit Summary
Dependent Variable                       count
Number of Observations                     250
Data Set                             WORK.FISH
Model                                      ZIP
ZI Link Function                      Logistic
Log Likelihood                           -1032
Maximum Absolute Gradient           4.61991E-7
Number of Iterations                        15
Optimization Method          Dual Quasi-Newton
AIC                                       2073
SBC                                       2091

Algorithm converged.

                           Parameter Estimates
                                           Standard                 Approx
Parameter        DF        Estimate           Error    t Value    Pr > |t|

Intercept         1        1.597889        0.085538      18.68      <.0001
child             1       -1.042838        0.099988     -10.43      <.0001
camper            1        0.834022        0.093627       8.91      <.0001
Inf_Intercept     1        1.297439        0.373850       3.47      0.0005
Inf_persons       1       -0.564347        0.162962      -3.46      0.0005
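To see how the two parts of the model combine, the estimates above can be plugged in by hand. The sketch below assumes a hypothetical group (no children, a camper, two people) chosen only for illustration, and uses made-up variable names. The count part gives the Poisson mean mu, the inflation part gives the probability p_excess of being an excess zero, and the expected count is (1 - p_excess) * mu.

```sas
/* Hand calculation from the proc countreg estimates; illustrative values only */
data _null_;
  child = 0; camper = 1; persons = 2;   /* hypothetical group */

  /* count portion (log link) */
  mu = exp(1.597889 - 1.042838*child + 0.834022*camper);

  /* zero-inflation portion (logit link): probability of an excess zero */
  p_excess = 1 / (1 + exp(-(1.297439 - 0.564347*persons)));

  /* expected number of fish, and overall probability of a zero count */
  expected = (1 - p_excess) * mu;
  pr_zero  = p_excess + (1 - p_excess) * exp(-mu);

  put mu= p_excess= expected= pr_zero=;
run;
```

Because the coefficient for persons is negative, larger groups have a lower estimated probability of being an excess zero.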

SAS Zero-inflated Poisson Analysis using Proc Nlmixed

For those using a version of SAS prior to 9.2, a zero-inflated Poisson model is doable, though significantly more difficult.  Please see this code fragment: Zero-inflated Poisson and Negative Binomial Using Proc Nlmixed.
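As a rough illustration of what this approach involves, the zero-inflated Poisson log likelihood can be written out by hand and passed to proc nlmixed. The block below is a sketch under stated assumptions, not the linked code fragment: the parameter names (b0, b1, b2, a0, a1) and the starting values of zero are arbitrary choices, and the model specification is the one used in the proc countreg example on this page.

```sas
/* Sketch: hand-coded ZIP log likelihood; names and starting values are assumptions */
proc nlmixed data = fish;
  parms b0 = 0 b1 = 0 b2 = 0 a0 = 0 a1 = 0;

  /* zero-inflation portion: probability of an excess zero (logit link) */
  eta_zero = a0 + a1*persons;
  p_excess = 1 / (1 + exp(-eta_zero));

  /* count portion: Poisson mean (log link) */
  mu = exp(b0 + b1*child + b2*camper);

  /* a zero can be an excess zero or a Poisson zero */
  if count = 0 then ll = log(p_excess + (1 - p_excess)*exp(-mu));
  else ll = log(1 - p_excess) + count*log(mu) - mu - lgamma(count + 1);

  model count ~ general(ll);
run;
```

If the optimizer has trouble converging from zero starting values, estimates from an ordinary Poisson regression are a common alternative starting point.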

Sample Write-Up of the Analysis

The zero-inflated Poisson regression model predicting fish caught (count) from child, camper, and persons was statistically significant (chi-squared = 190.829, df = 3, p < .0001). The predictor of excess zeros, persons, was statistically significant. The count predictors child and camper were also each statistically significant. For these data, the expected change in log(count) for a one-unit increase in child was -1.0428. Groups with campers (camper = 1) had an expected log count 0.8340 higher than groups without campers (camper = 0).

Cautions, Flies in the Ointment

  • It is not recommended that zero-inflated Poisson models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.
    These pages are Copyrighted (c) by UCLA Academic Technology Services

