UCLA Academic Technology Services

Stata Data Analysis Examples
Negative Binomial Regression

Examples of Negative Binomial Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Description of the Data

Let's pursue Example 1 from above. This is the same example that was used in the page on Poisson regression.

We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.dta. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.

Let's look at the data.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a negative binomial regression model, let's consider some other methods that you might use.

Stata Negative Binomial Regression Analysis

The output looks very much like the output from an OLS regression. The output begins with the iteration log, which gives the values of the log likelihoods starting with a model that has no predictors. The last value in the log is the final value of the log likelihood for the full model and is repeated below.

Next comes the header information. On the right-hand side the number of observations used (316) is given along with the likelihood ratio chi-squared with three degrees of freedom for the full model, followed by the p-value for the chi-square. The model, as a whole, is statistically significant. The header also includes a pseudo-R2 which is 0.0536 in this example.
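Both header statistics are simple functions of the log likelihoods of the constant-only model and the full model. The following is a minimal Python sketch, assuming the pseudo-R2 is McFadden's (one minus the ratio of the two log likelihoods). The function names are ours, and the two log likelihood values are hypothetical placeholders, not values from this analysis.

```python
def lr_chi2(ll_null, ll_full):
    """Likelihood ratio chi-squared: twice the gain in log likelihood."""
    return 2.0 * (ll_full - ll_null)


def pseudo_r2(ll_null, ll_full):
    """McFadden's pseudo-R2: 1 - ll_full / ll_null."""
    return 1.0 - ll_full / ll_null


# Hypothetical log likelihoods, for illustration only.
ll_null = -900.0  # constant-only model
ll_full = -880.0  # model with predictors

chi2 = lr_chi2(ll_null, ll_full)
r2 = pseudo_r2(ll_null, ll_full)
print(chi2, r2)
```

Because log likelihoods for count models are large in magnitude, even a highly significant chi-squared can go with a small pseudo-R2, which is why the pseudo-R2 should not be read like an OLS R-squared.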

Below the header you will find the negative binomial regression coefficients for each of the variables along with standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. Additionally, there will be an estimate of the natural log of the overdispersion parameter, alpha, along with the transformed value. If alpha is zero, the negative binomial model reduces to the Poisson model, and an ordinary Poisson regression is the better choice.
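To see what alpha does, it helps to write out the variance. The sketch below assumes the mean-dispersion form of the negative binomial (the default for nbreg), in which the variance is mu + alpha * mu^2. The function names and the mean of 5 are illustrative assumptions, not part of the analysis.

```python
import math


def alpha_from_lnalpha(lnalpha):
    """Back-transform the estimated log of alpha to alpha itself."""
    return math.exp(lnalpha)


def nb_variance(mu, alpha):
    """Variance of a count with mean mu under the mean-dispersion form."""
    return mu + alpha * mu ** 2


mu = 5.0  # hypothetical expected count

var_poisson = nb_variance(mu, 0.0)  # alpha = 0: variance equals the mean
var_overdispersed = nb_variance(mu, 1.0)  # alpha = 1: variance well above the mean
print(var_poisson, var_overdispersed)
```

With alpha equal to zero the variance equals the mean, which is the Poisson assumption. Any positive alpha inflates the variance, and the inflation grows with the square of the mean.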

Below the coefficients, you will find a likelihood ratio test that alpha equals zero. In this example the associated chi-squared value is 1334.2 with one degree of freedom. These results strongly suggest that the negative binomial model is better than the Poisson regression model. Now, just to be on the safe side, let's rerun the nbreg command with the robust option in order to obtain robust standard errors for the negative binomial regression coefficients.
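The size of the likelihood ratio statistic for alpha can be checked by hand. The sketch below uses only the Python standard library. It assumes the usual boundary adjustment: because alpha cannot be negative, the reference distribution is an equal mixture of a point mass at zero and a chi-squared with one degree of freedom, so the one-degree-of-freedom tail probability is halved. The function name is ours.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


chi2_alpha = 1334.2  # likelihood ratio statistic quoted in the text

p_unadjusted = chi2_sf_1df(chi2_alpha)
p_boundary = 0.5 * p_unadjusted  # halved for the boundary null alpha = 0
print(p_boundary)
```

Either way the p-value is vanishingly small, so the conclusion that the data are overdispersed does not depend on the adjustment.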

Using the robust option has resulted in a fairly large change in the model chi-square, which is now a Wald chi-square, based on log pseudolikelihoods, instead of a likelihood ratio chi-square.

In the main body of the output are the negative binomial coefficients, robust standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. The variable math was borderline significant without the robust option and is clearly not significant with it. The robust standard errors attempt to adjust for heterogeneity in the model.

Please note that with the robust standard errors option there is no likelihood ratio test for alpha equal to zero.

Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.

Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in days absent.
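The idea behind a predicted change can be sketched in a few lines. Because the model is linear in the log of the expected count, raising a predictor by some amount multiplies the expected count by exp(coefficient * amount). The Python sketch below is an illustration of that arithmetic, not prchange's own implementation. It uses the coefficient of -0.43 for male quoted in the sample write-up, while the baseline of 7 days is a hypothetical value chosen only for the example.

```python
import math


def discrete_change(baseline_count, coef, delta=1.0):
    """Change in the expected count when a predictor rises by delta,
    with all other predictors held fixed."""
    return baseline_count * (math.exp(coef * delta) - 1.0)


baseline = 7.0  # hypothetical expected days absent for female students
coef_male = -0.43  # coefficient quoted in the write-up

change_male = discrete_change(baseline, coef_male)
print(change_male)
```

The change in days depends on the baseline count, which is why predicted changes are reported at stated values of the other predictors.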

Sample Write-Up of the Analysis

Before we begin the sample write-up we need to get the output into a form more acceptable for publication. The estout command (findit estout) by Ben Jann of ETH Zurich will get us close to what we want. With a little bit of manual editing we can produce an acceptable table of the output.

The negative binomial regression model predicting days absent from school from language arts and gender was statistically significant (chi-squared = 25.52, df = 2, p < .0001). The predictors langarts and male were each statistically significant. For these data, the expected log count changed by -0.02 for each one-unit increase in language arts. This translates to a decrease of about 1.56 days absent for a one standard deviation increase in language arts when gender is held constant. Male students had an expected log count 0.43 lower than female students, which amounts to about 2.39 fewer days absent while holding language arts constant.
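Log counts are hard to read, so it can help to exponentiate the coefficients into incidence rate ratios and to confirm the reported p-value. The Python sketch below uses the numbers quoted in the write-up. The function names are ours, and the closed form exp(-x/2) for the tail probability holds only for a chi-squared with two degrees of freedom.

```python
import math


def incidence_rate_ratio(coef):
    """Multiplicative change in the expected count per one-unit increase."""
    return math.exp(coef)


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variable with 2 df."""
    return math.exp(-x / 2.0)


irr_langarts = incidence_rate_ratio(-0.02)  # per point of language arts
irr_male = incidence_rate_ratio(-0.43)  # male relative to female
p_model = chi2_sf_2df(25.52)  # model chi-squared from the write-up

print(round(irr_langarts, 2), round(irr_male, 2), p_model)
```

Read this way, each additional point of language arts is associated with an expected count about 2% lower, and male students have an expected count about 35% lower than female students, holding the other predictor constant.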

Cautions, Flies in the Ointment

  • It is not recommended that negative binomial models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.
    These pages are Copyrighted (c) by UCLA Academic Technology Services


    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California