SAS Data Analysis Examples
Negative Binomial Regression

Examples of Negative Binomial Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Description of the Data

Let's pursue Example 1 from above. This is the same example that was used in the page on Poisson regression.

We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.csv. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts respectively. The variable male is a binary indicator of student gender.

Let's look at the data.

data poissonreg;
  infile "d:\work\data\raw\poissonreg.csv" delimiter="," firstobs=2;
  input id school male math langarts daysatt daysabs;
run;
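
A quick numerical summary is a natural first check before modeling. The sketch below assumes the data step above has created the dataset poissonreg; the choice of statistics and tables is ours, for illustration only. If the variance of daysabs turns out to be much larger than its mean, that points to overdispersion, which is the usual reason to prefer a negative binomial model over a Poisson model.

```sas
* Summary statistics for the outcome and the two test scores;
proc means data = poissonreg n mean var min max;
  var daysabs math langarts;
run;

* Counts of students by gender indicator and by school;
proc freq data = poissonreg;
  tables male school;
run;
```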

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a negative binomial regression model, let's consider some other methods that you might use.

SAS Negative Binomial Regression Analysis
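
The call behind the output discussed in this section can be sketched as follows. The predictor list (male, math, langarts) is taken from the proc countreg version of the original model shown later on this page, so treat this as a reconstruction rather than the exact code that was run.

```sas
* Negative binomial regression of days absent on gender and test scores;
proc genmod data = poissonreg;
  model daysabs = male math langarts / dist=negbin;
run;
```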

The output looks very much like the output from an OLS regression. It begins with a summary of the dataset and model, followed by a list of goodness of fit statistics, all of which are likelihood based. Below the fit statistics, you will find the negative binomial regression coefficients for each of the variables along with the corresponding standard errors, Wald 95% confidence intervals, Wald Chi-Square statistics, and p-values. After the coefficients for the predictors, there is an estimate for the Dispersion parameter. If the dispersion is 0, then a Poisson model would be more appropriate for the data. Based on the 95% Confidence Limits for our dispersion parameter, we can say that the dispersion is significantly different from 0 and that we are justified in using a negative binomial model.

Now, just to be on the safe side, let's rerun proc genmod with the repeated statement in order to obtain robust standard errors for the negative binomial regression coefficients.
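
One way to request robust (empirical) standard errors is sketched below. It assumes that each student id appears exactly once, so that every student forms their own cluster, and it reuses the predictor list from the proc countreg version of the original model; neither detail is confirmed by the surviving text.

```sas
* id is listed in class because it defines the subjects for repeated;
proc genmod data = poissonreg;
  class id;
  model daysabs = male math langarts / dist=negbin;
  repeated subject=id;
run;
```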

The robust standard errors attempt to adjust for heterogeneity in the model. Using them changes the standard errors only slightly, and the z-tests lead to the same conclusions about significance.

The variable math is not significant with or without the repeated statement. Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.

This model fits the data significantly better than the null model, i.e., the intercept-only model. To show that this is the case, we can run the null model and compare it with the current model using a chi-squared test on the difference in log likelihoods.

proc genmod data = poissonreg;
  model daysabs = / dist=negbin;
run;

             Criteria For Assessing Goodness Of Fit

Criterion                     DF           Value        Value/DF

Deviance                     315        356.9918          1.1333
Scaled Deviance              315        356.9918          1.1333
Pearson Chi-Square           315        329.9199          1.0474
Scaled Pearson X2            315        329.9199          1.0474
Log Likelihood                         2138.9953
Full Log Likelihood                    -891.2427
AIC (smaller is better)                1786.4854
AICC (smaller is better)               1786.5238
BIC (smaller is better)                1793.9969

The log likelihood is -880.9274 for the full model and -891.2427 for the null model. The chi-squared value is 2*(-880.9274 - (-891.2427)) = 20.6306. Since we have two predictor variables in the full model, the chi-squared test has 2 degrees of freedom. This yields a p-value <.0001. Thus, our overall model is statistically significant.
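
The arithmetic above can be checked in a short data step. This is only a sketch: the variable names are ours, and the two log likelihood values are copied from the text. The results are written to the SAS log.

```sas
* Likelihood ratio test of the fitted model against the intercept-only model;
data _null_;
  ll_full = -880.9274;   * full log likelihood, fitted model;
  ll_null = -891.2427;   * full log likelihood, intercept-only model;
  df      = 2;           * number of predictors in the fitted model;
  chisq   = 2 * (ll_full - ll_null);
  p_value = 1 - probchi(chisq, df);
  put chisq= p_value=;
run;
```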

Finally, we will use the estimate statement to get the predicted days absent for the male and female groups when langarts is held at its mean.
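
A minimal sketch of such a call follows. The macro variable name mean_langarts and the estimate labels are hypothetical, and the sketch assumes male is coded 1 for male students and 0 for female students. The mean of langarts is computed from the data rather than typed in. The exp option asks proc genmod to report the exponentiated estimate as well, which puts the prediction on the scale of days absent rather than log counts.

```sas
* Store the sample mean of langarts in a macro variable (name is illustrative);
proc sql noprint;
  select mean(langarts) into :mean_langarts from poissonreg;
quit;

* Linear predictor for each group with langarts held at its mean;
* The exp option also reports each estimate on the count scale;
proc genmod data = poissonreg;
  model daysabs = male langarts / dist=negbin;
  estimate 'female at mean langarts' intercept 1 male 0 langarts &mean_langarts / exp;
  estimate 'male at mean langarts'   intercept 1 male 1 langarts &mean_langarts / exp;
run;
```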

Using Proc Countreg

If you are using SAS version 9.2 or higher, you could run a negative binomial regression using proc countreg.  This procedure allows a few more options specific to count outcomes than proc genmod.  The proc countreg code for the original model run on this page appears below.

* p=2 requests the quadratic variance function (NB2);
proc countreg data = poissonreg;
  model daysabs = male math langarts / dist=negbin(p=2);
run;

Sample Write-Up of the Analysis

In the negative binomial regression model predicting days absent from school from language arts score and gender, our predictors langarts and male were each statistically significant. For these data, the expected change in log count for a one-unit increase in language arts score was -0.0156. Male students had an expected log count 0.4312 lower than female students.
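
Log-count coefficients are often easier to read after exponentiation, which turns them into incidence rate ratios. The sketch below applies this to the two coefficients quoted above; the variable names are ours. The resulting ratios are roughly 0.98 per one-unit increase in langarts and 0.65 for male versus female students, that is, about 35% fewer expected days absent for male students with langarts held constant.

```sas
* Convert log-count coefficients to incidence rate ratios;
data _null_;
  irr_langarts = exp(-0.0156);  * ratio per one-unit increase in langarts;
  irr_male     = exp(-0.4312);  * ratio for male versus female students;
  put irr_langarts= irr_male=;
run;
```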

Cautions, Flies in the Ointment

  • It is not recommended that negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.

    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.