UCLA Academic Technology Services

SPSS Data Analysis Examples
Negative Binomial Regression

Note: The examples on this page were done in SPSS 17.  If you are using an earlier version of SPSS, you may need to use the genlog command.

Examples of Negative Binomial Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Description of the Data

Let's pursue Example 1 from above. This is the same example that was used in the page on Poisson regression.

We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.sav. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.

Let's look at the data.

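* Load the attendance data.
GET FILE='D:\work\data\spss\poissonreg.sav'.

* Summary statistics for the predictors and the outcome.
DESCRIPTIVES
  VARIABLES=male math langarts daysabs
  /STATISTICS=MEAN STDDEV VARIANCE MIN MAX.

* Distribution of the count outcome.
GRAPH /HISTOGRAM=daysabs.

* Frequency table for the binary gender indicator.
FREQUENCIES VARIABLES=male.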

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a negative binomial regression model, let's consider some other methods that you might use.

SPSS Negative Binomial Regression Analysis

The output looks very much like the output from an OLS regression. It begins with the goodness-of-fit statistics, including the log likelihood, AIC, and BIC. These values can be used when comparing models.

Next comes the Tests of Model Effects. This section looks the same as the Parameter Estimates section because we have entered all the variables as continuous variables, so each one has just one degree of freedom. In models with categorical predictor variables, this section gives the overall effects of the categorical variables as well as the effects of the continuous variables.

The Parameter Estimates section follows. Here you will find the negative binomial regression coefficients for each of the variables, along with standard errors, Wald chi-square values, p-values, and 95% confidence intervals for the coefficients.

Now, just to be on the safe side, let's rerun the negative binomial model with the covb = robust option in order to obtain robust standard errors for the negative binomial regression coefficients.

Parameter Estimates

                                             95% Wald Confidence Interval
Parameter             B        Std. Error    Lower      Upper
(Intercept)           2.716    .2135         2.298      3.135
male                  -.431    .1401         -.706      -.157
langarts              -.014    .0054         -.025      -.004
math                  -.002    .0063         -.014      .011
(Scale)               1 (a)
(Negative binomial)   1.288    .1231         1.068      1.554

Dependent Variable: days absent
Model: (Intercept), male, langarts, math
a. Fixed at the displayed value.
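The (Negative binomial) row reports the estimated dispersion parameter. As a rough illustration outside SPSS, the Python sketch below assumes the common parameterization in which the variance of the count is mu + k * mu^2, and shows how far the implied variance exceeds the Poisson variance (which equals the mean). The means used are hypothetical values chosen only for illustration; they are not results from these data.

```python
# Estimated dispersion parameter copied from the (Negative binomial) row.
K = 1.288


def nb_variance(mu, k=K):
    """Variance implied by a negative binomial model, assuming Var = mu + k * mu**2."""
    return mu + k * mu ** 2


# Hypothetical expected counts, chosen only for illustration.
for mu in (2.0, 5.0, 10.0):
    var = nb_variance(mu)
    print(f"mean={mu:5.1f}  Poisson variance={mu:6.1f}  NB variance={var:7.1f}")
```

With k = 0 the function returns the mean itself, which is the Poisson case.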

 

Parameter Estimates (Hypothesis Test)

Parameter      Wald Chi-Square    df    Sig.
(Intercept)    161.818            1     .000
male           9.472              1     .002
langarts       7.124              1     .008
math           .066               1     .798

Dependent Variable: days absent
Model: (Intercept), male, langarts, math
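The Wald chi-square, p-value, and confidence limits in these tables are all functions of B and its standard error. The Python sketch below is an independent check, not SPSS output. It recomputes them for the intercept and for male from the rounded values printed above, so the results agree with the tables only approximately. For coefficients close to zero (langarts and math), the three-decimal rounding of B makes this kind of reproduction much coarser.

```python
import math


def wald_summary(b, se, z=1.959964):
    """Return (Wald chi-square, two-sided p-value, CI lower, CI upper) for one coefficient."""
    chi2 = (b / se) ** 2
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p, b - z * se, b + z * se


# Rounded B and Std. Error values copied from the tables above.
rows = {"(Intercept)": (2.716, 0.2135), "male": (-0.431, 0.1401)}

for name, (b, se) in rows.items():
    chi2, p, lower, upper = wald_summary(b, se)
    print(f"{name:12s} chi2={chi2:8.3f}  p={p:.3f}  CI=({lower:.3f}, {upper:.3f})")
```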

 

Using the covb = robust option has resulted in a fairly large change in the model chi-square, which is now a Wald chi-square based on log pseudolikelihoods instead of a likelihood ratio chi-square. The robust standard errors attempt to adjust for heterogeneity in the model. The variable math was not significant without the covb = robust option and is even less so with it.

Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.

 

Finally, we will use the emmeans option to get the predicted number of days absent for males and females. In order to use the emmeans option, we have to specify the variable male as a categorical variable. The model specified this way is the same as the one above, since male is a binary variable, except that the reference group for male is now switched to male = 1. That is why the sign of the coefficient for male is reversed.
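To see why switching the reference group only flips the sign of the coefficient and leaves the predictions unchanged, here is a small Python sketch. It is an illustration under stated assumptions: it reuses the rounded coefficients from the four-predictor robust model shown earlier (the coefficients of the reduced model are not shown on this page), and the test scores of 50 are hypothetical values.

```python
import math

# Rounded coefficients from the robust model above (male coded 0/1, female = reference).
INTERCEPT, B_MALE, B_LANGARTS, B_MATH = 2.716, -0.431, -0.014, -0.002


def predict_ref_female(male, langarts, math_score):
    """Expected days absent with female (male = 0) as the reference group."""
    eta = INTERCEPT + B_MALE * male + B_LANGARTS * langarts + B_MATH * math_score
    return math.exp(eta)


def predict_ref_male(male, langarts, math_score):
    """Same model re-expressed with male = 1 as the reference group."""
    intercept = INTERCEPT + B_MALE  # the intercept now describes males
    b_female = -B_MALE              # sign flips: effect of male = 0 relative to male = 1
    female = 1 - male
    eta = intercept + b_female * female + B_LANGARTS * langarts + B_MATH * math_score
    return math.exp(eta)


# Hypothetical test scores, used only to show that both codings agree.
for male in (0, 1):
    a = predict_ref_female(male, 50, 50)
    b = predict_ref_male(male, 50, 50)
    print(f"male={male}  ref=female: {a:.4f}  ref=male: {b:.4f}")
```

Both codings give the same expected count for every student; only the labeling of the intercept and of the gender coefficient changes.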

Sample Write-Up of the Analysis

The negative binomial regression model predicting days absent from school from language arts score and gender was statistically significant (likelihood ratio chi-square = 171.503, df = 2, p < .0001). The predictors langarts and male were each statistically significant. For these data, the expected log count changed by -0.015 for a one-unit increase in language arts. This translates to a decrease of about 1.5 days absent for a one standard deviation increase in language arts when gender is held constant. Male students had an expected log count 0.41 lower than female students, which amounts to about 2.27 fewer days absent than females while holding language arts constant.
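Because the model uses a log link, a coefficient on the log-count scale becomes a multiplicative effect on the expected count once it is exponentiated. The Python sketch below applies that conversion to the two coefficients quoted in the write-up. It reports ratios and percent changes only; turning these into a difference in days requires predicted counts at specific covariate values, which are not reproduced here.

```python
import math


def rate_ratio(coef, delta=1.0):
    """Multiplicative change in the expected count for a `delta`-unit change in a predictor."""
    return math.exp(coef * delta)


def percent_change(coef, delta=1.0):
    """The same effect expressed as a percent change in the expected count."""
    return 100.0 * (rate_ratio(coef, delta) - 1.0)


# Log-count coefficients quoted in the write-up above.
for label, coef in (("langarts (+1 unit)", -0.015), ("male vs. female", -0.41)):
    print(f"{label:20s} ratio={rate_ratio(coef):.3f}  change={percent_change(coef):+.1f}%")
```

The optional delta argument rescales the effect, for example to a one standard deviation change once the standard deviation of langarts is read from the descriptive statistics.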

Cautions, Flies in the Ointment

  • It is not recommended that negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.
