UCLA Academic Technology Services

SAS Data Analysis Examples
Zero-inflated Negative Binomial Regression

Examples of Zero-inflated Negative Binomial Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Example 2. State wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there are no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so there are excess zeros in the data because of the people who did not fish.

Description of the Data

Let's pursue Example 2 from above using the dataset fish.sas7bdat.

We have data on 250 groups that went to a park.  Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), and whether or not they brought a camper to the park (camper).   

In addition to predicting the number of fish caught, there is interest in predicting the existence of excess zeros, i.e., the probability that a group caught zero fish. We will use the variables child, persons, and camper in our model. Let's look at the data.

proc means data = fish mean std min max var;
  var count child persons;
run;

The MEANS Procedure

Variable            Mean         Std Dev         Minimum         Maximum        Variance
----------------------------------------------------------------------------------------
count          3.2960000      11.6350281               0     149.0000000     135.3738795
child          0.6840000       0.8503153               0       3.0000000       0.7230361
persons        2.5280000       1.1127303       1.0000000       4.0000000       1.2381687
----------------------------------------------------------------------------------------

proc univariate data = fish noprint;
  histogram count / midpoints = 0 to 50 by 1 vscale = count ;
run;
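
The summary statistics above show a variance for count (135.37) far larger than its mean (3.30). To see how many groups caught no fish at all, one option is to flag the zero counts and tabulate them. This is only a sketch, assuming the same fish dataset is available in the WORK library; the dataset name fish_zero and the variable is_zero are illustrative names that do not appear in the original example.

data fish_zero;
  set fish;
  is_zero = (count = 0);  /* 1 if the group caught no fish, 0 otherwise */
run;

proc freq data = fish_zero;
  tables is_zero;
run;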


proc freq data = fish;
  tables camper;
run;

The FREQ Procedure

                                   Cumulative    Cumulative
camper    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
     0         103       41.20           103        41.20
     1         147       58.80           250       100.00

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a zero-inflated negative binomial analysis, let's consider some other methods that you might use.

SAS Zero-inflated Negative Binomial Analysis using Proc Countreg

If you have SAS version 9.2 or higher, you can carry out a zero-inflated negative binomial regression using proc countreg.

In the code below, we predict the number of fish caught with child and camper and predict the excess zeros with persons.
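
A minimal sketch of that call, reconstructed from the description above and from the null-model syntax used later in this section (so the method = qn option is an assumption carried over from that code), might look like this:

proc countreg data = fish method = qn;
  /* count model: number of fish caught as a function of child and camper */
  model count = child camper / dist = zinegbin;
  /* inflation model: excess zeros as a function of persons */
  zeromodel count ~ persons;
run;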

The output begins with the Model Fit Summary. This includes the model type ("ZINB"), link function used to model the inflation ("Logistic"), and optimization method as well as fit statistics like the log likelihood, AIC, and SBC.

Next come the parameter estimates. In one block of output, we see the parameter estimates, standard errors, t statistics, and p-values from both the count and inflation models. The parameters associated with the inflation model are prefixed with Inf_. Though all appear in the same block, these parameters must be interpreted in different ways: the first three correspond to the count model and should be interpreted as you would parameters from a negative binomial model; the Inf_ estimates correspond to the inflation model and should be interpreted as you would estimates from a logistic regression. _Alpha is the dispersion parameter. If _Alpha is zero, the negative binomial distribution reduces to the Poisson, so a Poisson model would be more appropriate than a negative binomial model.

All of the predictors in both the count and inflation portions of the model are statistically significant. This model also fits the data significantly better than the null model, i.e., the intercept-only model. To show that this is the case, we can run the null model and compare it with the current model using a chi-squared test on the difference in log likelihoods.


proc countreg data = fish method = qn;
  model count = / dist= zinegbin;
  zeromodel count ~ ;
run;

The COUNTREG Procedure

              Model Fit Summary
Dependent Variable                       count
Number of Observations                     250
Data Set                             WORK.FISH
Model                                     ZINB
ZI Link Function                      Logistic
Log Likelihood                      -464.43931
Maximum Absolute Gradient            0.0000371
Number of Iterations                        35
Optimization Method          Dual Quasi-Newton
AIC                                  934.87863
SBC                                  945.44301

Algorithm converged.

                           Parameter Estimates
                                           Standard                 Approx
Parameter        DF        Estimate           Error    t Value    Pr > |t|
Intercept         1        1.192709        0.151551       7.87      <.0001
Inf_Intercept     0      -23.388418               .        .         .
_Alpha            1        5.438516        0.664078       8.19      <.0001

The log likelihood is -432.89091 for the full model and -464.43931 for the null model. The chi-squared value is 2*(-432.89091 - (-464.43931)) = 63.0968. Since we have three predictor variables in the full model, the chi-squared test has 3 degrees of freedom. This yields a p-value <.0001. Thus, our overall model is statistically significant.
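
The arithmetic above can be checked in a short data step. This is a sketch that takes the two log likelihoods as reported; the variable names are illustrative, and probchi is the SAS chi-squared distribution function.

data _null_;
  ll_full = -432.89091;              /* log likelihood, full model */
  ll_null = -464.43931;              /* log likelihood, intercept-only model */
  chisq   = 2 * (ll_full - ll_null); /* likelihood ratio statistic */
  df      = 3;                       /* predictors: child, camper, persons */
  p_value = 1 - probchi(chisq, df);  /* upper-tail probability */
  put chisq= df= p_value=;           /* write the results to the log */
run;

The value of chisq written to the log should agree with the 63.0968 computed by hand.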

SAS Zero-inflated Negative Binomial Analysis using Proc Nlmixed

For those using a version of SAS earlier than 9.2, fitting a zero-inflated negative binomial model is possible, though significantly more difficult. Please see this code fragment: Zero-inflated Poisson and Negative Binomial Using Proc Nlmixed.

Sample Write-Up of the Analysis

The zero-inflated negative binomial regression model predicting fish caught (count) from child, camper, and persons was statistically significant (chi-squared = 63.0968, df = 3, p < .0001). The predictor of excess zeros, persons, was statistically significant. The count predictors child and camper were also each statistically significant. For these data, the expected change in log(count) for a one-unit increase in child was -1.515255. Groups with campers (camper = 1) had an expected log count 0.879051 higher than groups without campers (camper = 0). The model allows us to reject the null hypothesis that _Alpha = 0, suggesting that our negative binomial model is more appropriate than a Poisson model.
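
Because the count-model coefficients are on the log scale, some readers prefer to report them as multiplicative effects on the expected count. A small sketch, using the two estimates quoted above and illustrative variable names:

data _null_;
  b_child  = -1.515255;            /* count-model coefficient for child */
  b_camper = 0.879051;             /* count-model coefficient for camper */
  ratio_child  = exp(b_child);     /* multiplicative change per additional child */
  ratio_camper = exp(b_camper);    /* multiplicative change, camper = 1 vs. camper = 0 */
  put ratio_child= ratio_camper=;  /* write the results to the log */
run;

A ratio below 1 corresponds to a lower expected count and a ratio above 1 to a higher expected count, holding the other predictor constant.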

Cautions, Flies in the Ointment

  • It is not recommended that zero-inflated negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.

These pages are Copyrighted (c) by UCLA Academic Technology Services