Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.dta. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.
Let's look at the data.
use http://www.ats.ucla.edu/stat/stata/dae/poissonreg, clear
summarize daysabs math langarts male
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     daysabs |       316    5.810127    7.449003          0         45
        math |       316    48.75101    17.88076   1.007114   98.99289
    langarts |       316    50.06379    17.93921   1.007114   98.99289
        male |       316    .4873418    .5006325          0          1
tabstat daysabs, stat(n mean var)
    variable |         N      mean  variance
-------------+------------------------------
     daysabs |       316  5.810127  55.48764
--------------------------------------------
histogram daysabs, discrete freq
tab male
       male |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        162       51.27       51.27
          1 |        154       48.73      100.00
------------+-----------------------------------
      Total |        316      100.00
nbreg daysabs math langarts male
Fitting Poisson model:
Iteration 0: log likelihood = -1547.9709
Iteration 1: log likelihood = -1547.9709
Fitting constant-only model:
Iteration 0: log likelihood = -897.78991
Iteration 1: log likelihood = -891.24455
Iteration 2: log likelihood = -891.24271
Iteration 3: log likelihood = -891.24271
Fitting full model:
Iteration 0: log likelihood = -881.57337
Iteration 1: log likelihood = -880.87788
Iteration 2: log likelihood = -880.87312
Iteration 3: log likelihood = -880.87312
Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(3)      =      20.74
Dispersion     = mean                             Prob > chi2     =     0.0001
Log likelihood = -880.87312                       Pseudo R2       =     0.0116
------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   -.001601     .00485    -0.33   0.741    -.0111067    .0079048
    langarts |  -.0143475   .0055815    -2.57   0.010    -.0252871    -.003408
        male |  -.4311844   .1396656    -3.09   0.002     -.704924   -.1574448
       _cons |   2.716069    .232576    11.68   0.000     2.260229     3.17191
-------------+----------------------------------------------------------------
    /lnalpha |   .2533877   .0955362                      .0661402    .4406351
-------------+----------------------------------------------------------------
       alpha |   1.288383   .1230871                      1.068377    1.553694
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 1334.20 Prob>=chibar2 = 0.000
The layout of the output resembles that of an OLS regression. It begins with the iteration log, which gives the log likelihood first for a Poisson model with the same predictors, then for a negative binomial model with no predictors (the constant-only model), and finally for the full model. The last value in the log is the final log likelihood for the full model and is repeated in the header. On the right-hand side of the header are the number of observations used (316) and the likelihood-ratio chi-squared statistic with three degrees of freedom (20.74), which compares the full model with the constant-only model, followed by its p-value (0.0001). The model as a whole is statistically significant. The header also includes a pseudo-R2, which is 0.0116 in this example.
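The header statistics follow directly from the log likelihoods in the iteration log. The Python sketch below is not part of the Stata session. It is a calculator that plugs in the printed values, assumed to be copied exactly, and reproduces the LR chi2(3), its p-value, and McFadden's pseudo-R2, which is the pseudo-R2 Stata reports. For three degrees of freedom the chi-squared tail probability has a closed form, so no statistics library is needed.

```python
import math

# Log likelihoods copied from the nbreg iteration log above.
ll_null = -891.24271  # constant-only negative binomial model
ll_full = -880.87312  # full model with math, langarts and male

# Likelihood-ratio statistic: twice the gain in log likelihood (3 added parameters).
lr_chi2 = 2 * (ll_full - ll_null)

# Upper-tail probability of a chi-squared(3) variable, closed form:
# P(X > x) = erfc(sqrt(x / 2)) + sqrt(2 x / pi) * exp(-x / 2)
p_value = (math.erfc(math.sqrt(lr_chi2 / 2))
           + math.sqrt(2 * lr_chi2 / math.pi) * math.exp(-lr_chi2 / 2))

# McFadden's pseudo-R2: 1 - LL(full) / LL(constant-only).
pseudo_r2 = 1 - ll_full / ll_null

print(f"LR chi2(3)  = {lr_chi2:.2f}")    # 20.74
print(f"Prob > chi2 = {p_value:.4f}")    # 0.0001
print(f"Pseudo R2   = {pseudo_r2:.4f}")  # 0.0116
```

Because the pseudo-R2 is a ratio of log likelihoods, it is small here (about 0.01) even though the LR test is highly significant. The two statistics answer different questions.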
Below the header you will find the negative binomial regression coefficients for each of the variables, along with standard errors, z-scores, p-values, and 95% confidence intervals for the coefficients. There is also an estimate of the natural log of the overdispersion parameter, alpha (/lnalpha), along with the transformed value, alpha = exp(lnalpha). If alpha is zero, the negative binomial model reduces to the Poisson model, and an ordinary Poisson regression would be adequate.
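To make the role of alpha concrete, here is a small Python sketch (an illustration, not Stata code). It converts the reported /lnalpha and its confidence limits back to the alpha scale and evaluates the variance function behind Stata's `Dispersion = mean` label, Var(y) = mu + alpha*mu^2. Evaluating that function at the sample mean of daysabs is an assumption made purely for illustration. In the fitted model, each observation has its own mean mu.

```python
import math

# /lnalpha and its 95% CI limits, copied from the nbreg output above.
lnalpha, ln_lo, ln_hi = 0.2533877, 0.0661402, 0.4406351

# Stata reports alpha = exp(lnalpha); the CI limits transform the same way.
alpha = math.exp(lnalpha)
alpha_ci = (math.exp(ln_lo), math.exp(ln_hi))

def nb2_variance(mu, alpha):
    """Negative binomial (mean-dispersion) variance: mu + alpha * mu**2.
    With alpha = 0 this collapses to the Poisson variance, mu."""
    return mu + alpha * mu ** 2

mean_daysabs = 5.810127  # sample mean of daysabs (illustrative evaluation point)
print(f"alpha = {alpha:.6f}")  # 1.288383
print(f"95% CI for alpha = ({alpha_ci[0]:.6f}, {alpha_ci[1]:.6f})")
print(f"Poisson variance at the mean: {nb2_variance(mean_daysabs, 0.0):.2f}")
print(f"NB variance at the mean:      {nb2_variance(mean_daysabs, alpha):.2f}")
```

At the mean, the negative binomial variance is several times the Poisson variance. This is consistent with the sample variance of daysabs (55.49) being far larger than its mean (5.81).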
Below the coefficients you will find a likelihood-ratio test that alpha equals zero. In this example the test statistic, chibar2(01), is 1334.20. Because alpha = 0 lies on the boundary of the parameter space, the statistic is compared with a 50:50 mixture of chi-squared distributions with zero and one degrees of freedom rather than with an ordinary chi-squared(1). These results strongly suggest that the negative binomial model fits better than the Poisson regression model. Now, just to be on the safe side, let's rerun the nbreg command with the robust option in order to obtain robust standard errors for the negative binomial regression coefficients.
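The boundary test can be reproduced from two numbers in the iteration log: the Poisson log likelihood and the final negative binomial log likelihood. In this Python sketch, which is a calculator and not Stata code, the p-value uses the 50:50 mixture described above. For a positive statistic, only the chi-squared(1) half contributes, and P(chi2(1) > x) = erfc(sqrt(x/2)).

```python
import math

ll_poisson = -1547.9709   # "Fitting Poisson model" in the log above
ll_nbreg = -880.87312     # final negative binomial log likelihood

# LR statistic for H0: alpha = 0.
chibar2 = 2 * (ll_nbreg - ll_poisson)

# 50:50 mixture of chi2(0) (a point mass at zero) and chi2(1):
# for chibar2 > 0 only the chi2(1) half adds tail probability.
p_value = 0.5 * math.erfc(math.sqrt(chibar2 / 2))

print(f"chibar2(01) = {chibar2:.2f}")       # 1334.20
print(f"Prob >= chibar2 = {p_value:.3f}")   # 0.000
```

Halving the chi-squared(1) tail makes the boundary test less conservative than a naive chi-squared(1) test. With a statistic this large, the conclusion is the same either way.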
nbreg daysabs math langarts male, robust
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -897.78991
Iteration 1: log pseudolikelihood = -891.24455
Iteration 2: log pseudolikelihood = -891.24271
Iteration 3: log pseudolikelihood = -891.24271
Fitting full model:
Iteration 0: log pseudolikelihood = -881.57337
Iteration 1: log pseudolikelihood = -880.87788
Iteration 2: log pseudolikelihood = -880.87312
Iteration 3: log pseudolikelihood = -880.87312
Negative binomial regression                      Number of obs   =        316
Dispersion     = mean                             Wald chi2(3)    =      25.78
Log pseudolikelihood = -880.87312                 Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |               Robust
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   -.001601   .0062675    -0.26   0.798    -.0138851    .0106832
    langarts |  -.0143475   .0053859    -2.66   0.008    -.0249038   -.0037913
        male |  -.4311844   .1403611    -3.07   0.002    -.7062871   -.1560817
       _cons |   2.716069   .2138534    12.70   0.000     2.296924    3.135214
-------------+----------------------------------------------------------------
    /lnalpha |   .2533877    .092161                      .0727553      .43402
-------------+----------------------------------------------------------------
       alpha |   1.288383   .1187387                      1.075467     1.54345
------------------------------------------------------------------------------
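Using the robust option has changed the model chi-square from 20.74 to 25.78, but the two statistics are not directly comparable. The header now reports a Wald chi-square based on the robust variance estimates instead of a likelihood-ratio chi-square, because a likelihood-ratio test is not appropriate once the log likelihood is treated as a pseudolikelihood. In the main body of the output are the negative binomial coefficients, robust standard errors, z-scores, p-values, and 95% confidence intervals for the coefficients. The coefficients are identical to those in the previous model. Only the standard errors have changed, and with them the z-scores, p-values, and confidence intervals. The variable math was not significant without the robust option (p = 0.741) and is even further from significance with it (p = 0.798). Robust (Huber/White sandwich) standard errors are intended to protect against misspecification of the variance, for example heterogeneity that the model does not capture.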
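Since only the standard error changes, the z-score, p-value, and confidence interval for math can be recomputed from the coefficient and either standard error. The following Python calculator sketch mirrors the columns of Stata's table: z = b/se, a two-sided normal p-value, and b ± 1.96·se.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def wald_summary(coef, se):
    """z, two-sided p-value and 95% CI, computed the way Stata's table does."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p, (coef - Z_975 * se, coef + Z_975 * se)

b_math = -0.001601  # same point estimate in both fits
results = {
    "model-based": wald_summary(b_math, 0.00485),
    "robust": wald_summary(b_math, 0.0062675),
}
for label, (z, p, (lo, hi)) in results.items():
    print(f"{label:<12} z = {z:5.2f}  p = {p:.3f}  95% CI = [{lo:.7f}, {hi:.7f}]")
```

Both p-values (about 0.74 and 0.80) are far above 0.05. The robust standard error only widens an interval that already comfortably includes zero.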
Please note that with the robust standard errors option there is no likelihood ratio test for alpha equal to zero.
Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.
nbreg daysabs langarts male, robust
Fitting Poisson model:
Iteration 0: log pseudolikelihood = -1549.8567
Iteration 1: log pseudolikelihood = -1549.8567
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -897.78991
Iteration 1: log pseudolikelihood = -891.24455
Iteration 2: log pseudolikelihood = -891.24271
Iteration 3: log pseudolikelihood = -891.24271
Fitting full model:
Iteration 0: log pseudolikelihood = -881.55269
Iteration 1: log pseudolikelihood = -880.9306
Iteration 2: log pseudolikelihood = -880.9274
Iteration 3: log pseudolikelihood = -880.9274
Negative binomial regression                      Number of obs   =        316
Dispersion     = mean                             Wald chi2(2)    =      25.52
Log pseudolikelihood = -880.9274                  Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |               Robust
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    langarts |  -.0156493   .0036558    -4.28   0.000    -.0228145   -.0084841
        male |  -.4312069   .1402156    -3.08   0.002    -.7060245   -.1563893
       _cons |    2.70344    .201678    13.40   0.000     2.308158    3.098722
-------------+----------------------------------------------------------------
    /lnalpha |     .25394   .0915708                      .0744646    .4334154
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1180434                      1.077307    1.542517
------------------------------------------------------------------------------
Finally, we will use the prchange command (findit prchange) by J. Scott Long and Jeremy Freese to get the predicted change in the expected number of days absent as each predictor changes, with the other predictor held at its mean.
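The prchange columns are differences in the expected count exp(xb). The Python sketch below reproduces them from the final-model coefficients and the summarize statistics. It assumes prchange's default of holding the other predictor at its mean and assumes these column definitions: min->max and 0->1 are discrete changes; -+1/2 and -+sd/2 are changes centered on the mean; MargEfct is the derivative b·exp(xb). These definitions are inferred rather than taken from the SPost source. The check is that running the sketch matches the printed table to rounding.

```python
import math

# Final-model coefficients (nbreg daysabs langarts male, robust).
cons = 2.70344
b = {"langarts": -0.0156493, "male": -0.4312069}

# Sample means, SDs and observed ranges from summarize.
mean = {"langarts": 50.06379, "male": 0.4873418}
sd = {"langarts": 17.93921, "male": 0.5006325}
rng = {"langarts": (1.007114, 98.99289), "male": (0.0, 1.0)}

def rate(**x):
    """Expected count exp(xb); predictors not given are held at their means."""
    vals = {**mean, **x}
    return math.exp(cons + sum(b[k] * vals[k] for k in b))

def changes(var):
    """Change in the expected count as `var` moves, the other predictor at its mean."""
    m, (lo, hi) = mean[var], rng[var]
    return {
        "min->max": rate(**{var: hi}) - rate(**{var: lo}),
        "0->1": rate(**{var: 1.0}) - rate(**{var: 0.0}),
        "-+1/2": rate(**{var: m + 0.5}) - rate(**{var: m - 0.5}),
        "-+sd/2": rate(**{var: m + sd[var] / 2}) - rate(**{var: m - sd[var] / 2}),
        "MargEfct": b[var] * rate(),  # derivative of exp(xb) at the means
    }

print(f"exp(xb) = {rate():.4f}")
for var in b:
    print(var, {k: round(v, 4) for k, v in changes(var).items()})
```

For male, min->max and 0->1 coincide because the observed range is 0 to 1. For langarts, 0->1 is an extrapolation, since the lowest observed score is about 1.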
prchange
nbreg: Changes in Rate for daysabs
          min->max      0->1     -+1/2    -+sd/2  MargEfct
langarts   -9.3413   -0.1879   -0.0865   -1.5570   -0.0865
    male   -2.3892   -2.3892   -2.4022   -1.1957   -2.3837
exp(xb):  5.5280
        langarts      male
    x=   50.0638   .487342
sd(x)=   17.9392   .500633
estout, cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) stats(ll chi2, fmt(%8.2f))
             .
             b/se
daysabs
langarts     -0.02***
             (0.00)
male         -0.43**
             (0.14)
_cons        2.70***
             (0.20)
lnalpha
_cons        0.25**
             (0.09)
ll           -880.93
chi2         25.52
With a little bit of manual editing, we can produce an acceptable table of the output.
                     model
language arts        -0.02***
                     (0.00)
male                 -0.43**
                     (0.14)
constant             2.70***
                     (0.20)
log pseudolikelihood -880.93
chi-squared          25.52
legend: coefficient/(standard error)  ** p<0.01  *** p<0.001
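The stars in the table come from the two-sided p-values of z = b/se. This Python sketch rebuilds the starred entries from the coefficients and robust standard errors of the final model. The thresholds (* p<0.05, ** p<0.01, *** p<0.001) are assumed to match estout's defaults. The formatting function is hypothetical and is not part of estout.

```python
import math

def starred(coef, se, levels=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Format a coefficient with significance stars and its SE in parentheses.

    The p-value is the two-sided normal p-value of z = coef / se; levels are
    checked tightest first, so the most stars that apply are used."""
    p = math.erfc(abs(coef / se) / math.sqrt(2))
    mark = next((s for cut, s in levels if p < cut), "")
    return f"{coef:.2f}{mark}", f"({se:.2f})"

# Coefficients and robust standard errors from the final nbreg model.
rows = [
    ("language arts", -0.0156493, 0.0036558),
    ("male", -0.4312069, 0.1402156),
    ("constant", 2.70344, 0.201678),
]
for label, coef, se in rows:
    est, paren = starred(coef, se)
    print(f"{label:<21}{est}")
    print(f"{'':<21}{paren}")
```

Rounding to two decimals hides detail. The language arts coefficient, -0.0156, prints as -0.02 with a standard error of (0.00), even though its z-score is -4.28.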
The negative binomial regression model predicting days absent from school from language arts and gender was statistically significant (Wald chi-squared = 25.52, df = 2, p < .0001). The predictors langarts and male were each statistically significant. For these data, the expected log count decreased by about 0.016 (shown as -0.02 in the table) for each one-unit increase in language arts. This translates to a decrease of about 1.56 days absent for a one standard deviation increase in language arts, centered on its mean, with gender held at its mean. Male students had an expected log count 0.43 lower than female students, which amounts to about 2.39 fewer days absent than females while holding language arts at its mean.
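Exponentiating a log-count coefficient gives an incidence-rate ratio (IRR), so the same results can also be read multiplicatively. Stata's nbreg accepts an irr option that reports these directly. The Python lines below are only a calculator sketch using the coefficients and robust confidence limits printed above. They also check the p < .0001 claim with the closed-form chi-squared(2) tail, exp(-x/2).

```python
import math

# Final-model coefficients with their robust 95% CI limits (log-count scale).
final = {
    "langarts": (-0.0156493, -0.0228145, -0.0084841),
    "male": (-0.4312069, -0.7060245, -0.1563893),
}

# Exponentiating each quantity gives the incidence-rate ratio and its CI.
irrs = {name: tuple(math.exp(v) for v in vals) for name, vals in final.items()}
for name, (irr, lo, hi) in irrs.items():
    print(f"{name:<9} IRR = {irr:.4f}  95% CI [{lo:.4f}, {hi:.4f}]")

# Rate ratio for a one-SD (17.93921-point) increase in language arts.
sd_irr = math.exp(final["langarts"][0] * 17.93921)
print(f"langarts, per SD: IRR = {sd_irr:.4f}")

# Overall Wald test: for 2 df the chi-squared tail probability is exp(-x / 2).
p_overall = math.exp(-25.52 / 2)
print(f"Wald chi2(2) = 25.52  ->  p = {p_overall:.2e}")
```

An IRR of about 0.65 for male means the expected number of days absent for males is about 65% of that for females, roughly 35% lower, at any fixed language arts score. A one-SD increase in language arts multiplies the expected count by about 0.76.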
These pages are Copyrighted (c) by UCLA Academic Technology Services