UCLA Academic Technology Services

R Data Analysis Examples
Negative Binomial Regression

Note: the code on this page works with R 2.4.1.


Examples of Negative Binomial Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Description of the Data

Let's pursue Example 1 from above. We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.csv. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts respectively. The variable male is a binary indicator of student gender.

Let's look at the data. For a nice layout for displaying the summary statistics, we made use of the package called fields.
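A minimal sketch of this first look at the data, using base R rather than the fields package. Because poissonreg.csv is not reproduced here, the data frame sim below is a simulated stand-in (an assumption, not the real data), and the helper describe is a hypothetical name. With the real file you would read the data with read.csv instead.

```r
# Simulated stand-in for poissonreg.csv (hypothetical values, not the real data).
set.seed(1)
n <- 316
sim <- data.frame(
  math     = round(runif(n, 1, 99)),
  langarts = round(runif(n, 1, 99)),
  male     = rbinom(n, 1, 0.5)
)
sim$daysabs <- rnbinom(n, size = 0.8,
                       mu = exp(2.7 - 0.015 * sim$langarts - 0.43 * sim$male))

# One row per variable: sample size, mean, standard deviation, min, max.
describe <- function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                          min = min(x), max = max(x))
stats_table <- t(sapply(sim, describe))
print(round(stats_table, 2))
```

Comparing the mean of daysabs with its standard deviation in this table gives a quick sense of how spread out the counts are.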

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a negative binomial regression analysis, let's consider some other methods that you might use.

R Negative Binomial Regression Analysis

The function glm.nb from the MASS package, which fits a negative binomial generalized linear model, is used here.

We begin by estimating the model with the variables of interest.

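library(MASS)  # provides glm.nb

# The bare variable names assume the data are already on the search path,
# e.g. after reading poissonreg.csv into a data frame and attach()-ing it.
m1 <- glm.nb(daysabs ~ math + langarts + male)
summary(m1)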

Call:
glm.nb(formula = daysabs ~ math + langarts + male, init.theta = 0.776166936578963, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9785  -1.0627  -0.4147   0.2865   2.8193  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  2.716069   0.234174  11.598  < 2e-16 ***
math        -0.001601   0.005300  -0.302  0.76259    
langarts    -0.014348   0.005372  -2.671  0.00756 ** 
male        -0.431185   0.139516  -3.091  0.00200 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for Negative Binomial(0.7762) family taken to be 1)

    Null deviance: 378.43  on 315  degrees of freedom
Residual deviance: 356.93  on 312  degrees of freedom
AIC: 1771.7

Number of Fisher Scoring iterations: 1

Correlation of Coefficients:
         (Intercept) math  langarts
math     -0.28                     
langarts -0.43       -0.69         
male     -0.40       -0.09  0.19   


              Theta:  0.7762 
          Std. Err.:  0.0742 

 2 x log-likelihood:  -1761.7460 

The output looks very much like the output from an OLS regression. After the call, it begins with a summary of the deviance residuals, which gives us some idea of how well the model fits the data. You will then see the negative binomial regression coefficients for each of the variables along with their standard errors, z-scores, and p-values. The remainder of the output gives model fit information, including the deviances and AIC, followed by the estimated dispersion parameter theta with its standard error and twice the log-likelihood.

Since math is not significant in the model (z = -0.302, p = 0.763), we will rerun the model dropping that variable.
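The refit could look like the following sketch. The data frame sim is a simulated stand-in for poissonreg.csv and the name m2 for the reduced model is an assumption, so the printed estimates will not match the output shown on this page.

```r
library(MASS)  # provides glm.nb

# Simulated stand-in for poissonreg.csv (hypothetical values, not the real data).
set.seed(1)
n <- 316
sim <- data.frame(
  math     = round(runif(n, 1, 99)),
  langarts = round(runif(n, 1, 99)),
  male     = rbinom(n, 1, 0.5)
)
sim$daysabs <- rnbinom(n, size = 0.8,
                       mu = exp(2.7 - 0.015 * sim$langarts - 0.43 * sim$male))

# Reduced model: same response, math removed from the predictors.
m2 <- glm.nb(daysabs ~ langarts + male, data = sim)
print(summary(m2))
```

Passing the data frame through the data argument avoids having to attach it first.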

Next, we will use the predict function to get predicted values in days absent. For example, if we fix langarts at its mean and set male to 0 and then to 1, predict returns the expected count for female (male = 0) and male (male = 1) students when the language arts score is held at its mean.
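A sketch of that prediction step, again on simulated stand-in data (sim, m2 and newdata are assumed names, not taken from the original page). Asking predict for type = "response" returns expected counts rather than values on the log scale.

```r
library(MASS)  # provides glm.nb

# Simulated stand-in for poissonreg.csv (hypothetical values, not the real data).
set.seed(1)
n <- 316
sim <- data.frame(
  langarts = round(runif(n, 1, 99)),
  male     = rbinom(n, 1, 0.5)
)
sim$daysabs <- rnbinom(n, size = 0.8,
                       mu = exp(2.7 - 0.015 * sim$langarts - 0.43 * sim$male))

m2 <- glm.nb(daysabs ~ langarts + male, data = sim)

# Two hypothetical students: langarts held at its mean, male set to 0 and 1.
newdata <- data.frame(langarts = mean(sim$langarts), male = c(0, 1))
newdata$expected <- predict(m2, newdata = newdata, type = "response")
print(newdata)
```

The difference between the two values in the expected column is the model's estimated gap in days absent between the two groups at the mean language arts score.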

Finally, we compute the likelihood ratio chi-square and degrees of freedom for our final model.
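One common way to get this statistic is to compare the final model with an intercept-only model. The sketch below does that on simulated stand-in data; the names sim, m0 and m2 are assumptions, and this is not necessarily the exact computation used for the figures reported on this page.

```r
library(MASS)  # provides glm.nb

# Simulated stand-in for poissonreg.csv (hypothetical values, not the real data).
set.seed(1)
n <- 316
sim <- data.frame(
  langarts = round(runif(n, 1, 99)),
  male     = rbinom(n, 1, 0.5)
)
sim$daysabs <- rnbinom(n, size = 0.8,
                       mu = exp(2.7 - 0.015 * sim$langarts - 0.43 * sim$male))

m2 <- glm.nb(daysabs ~ langarts + male, data = sim)  # final model
m0 <- glm.nb(daysabs ~ 1, data = sim)                # intercept-only model

# Likelihood ratio chi-square: twice the gain in log-likelihood.
lr_chisq <- as.numeric(2 * (logLik(m2) - logLik(m0)))
# Degrees of freedom: difference in the number of estimated parameters.
lr_df <- attr(logLik(m2), "df") - attr(logLik(m0), "df")
p_value <- pchisq(lr_chisq, df = lr_df, lower.tail = FALSE)
print(c(chisq = lr_chisq, df = lr_df, p = p_value))
```

Both models estimate their own theta, so the two degrees of freedom come from the langarts and male coefficients.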

Sample Write-Up of the Analysis

Before we begin the sample write-up, we need to get the output into a form more acceptable for publication, such as LaTeX. The LaTeX code can be generated easily using the function xtable from the package of the same name.
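A sketch of how the coefficient table might be passed to xtable, using simulated stand-in data (sim, m2 and coef_table are assumed names). The xtable package is not part of base R, so the code falls back to a plain printed table when it is not installed.

```r
library(MASS)  # provides glm.nb

# Simulated stand-in for poissonreg.csv (hypothetical values, not the real data).
set.seed(1)
n <- 316
sim <- data.frame(
  langarts = round(runif(n, 1, 99)),
  male     = rbinom(n, 1, 0.5)
)
sim$daysabs <- rnbinom(n, size = 0.8,
                       mu = exp(2.7 - 0.015 * sim$langarts - 0.43 * sim$male))

m2 <- glm.nb(daysabs ~ langarts + male, data = sim)

# Matrix of estimates, standard errors, z values and p-values.
coef_table <- coef(summary(m2))

if (requireNamespace("xtable", quietly = TRUE)) {
  # Prints LaTeX source for a tabular environment.
  print(xtable::xtable(coef_table, digits = 4))
} else {
  print(round(coef_table, 4))
}
```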

The LaTeX code above generates a table as shown below:

The negative binomial regression model predicting days absent from school from language arts and gender was statistically significant (chi-squared = 20.63, df = 2, p < .0001). The predictors langarts and male were each statistically significant. For these data, the expected change in log count for a one-unit increase in language arts was -0.0156. This translates to a decrease of about 1.56 days absent for a one standard deviation increase in language arts when gender is held constant. Male students had an expected log count 0.43 lower than female students, which amounts to about 2.39 fewer days absent than females while holding language arts constant.
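Because the model uses a log link, the log-count coefficients in the write-up can also be read on a multiplicative scale by exponentiating them. The sketch below uses only the two coefficients quoted above (-0.0156 and -0.43); the variable names are illustrative.

```r
# Coefficients on the log-count scale, as quoted in the write-up.
b <- c(langarts = -0.0156, male = -0.43)

# Exponentiating gives the multiplicative change in the expected count.
rate_ratio <- exp(b)
print(round(rate_ratio, 3))

# Percent decrease in expected days absent implied by each ratio.
pct_decrease <- (1 - rate_ratio) * 100
print(round(pct_decrease, 1))
```

Read this way, each additional language arts point corresponds to roughly 1.5% fewer expected days absent, and male students to roughly 35% fewer than female students, with the other predictor held constant.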

Cautions, Flies in the Ointment

  • It is not recommended that negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.

     

    These pages are Copyrighted (c) by UCLA Academic Technology Services

