Example 1. A study of the length of hospital stay, in days, as a function of age, kind of health insurance and whether or not the patient died while in the hospital. Length of hospital stay is recorded with a minimum of one day.
Example 2. A study of the number of journal articles published by tenured faculty as a function of discipline (fine arts, science, social science, humanities, medical, etc.). To get tenure, faculty must publish; i.e., there are no tenured faculty with zero publications.
Example 3. A study by the county traffic court on the number of tickets received by teenagers as predicted by school performance, amount of driver training and gender. Only individuals who have received at least one citation are in the traffic court files.
We have a hypothetical data file, ztp.dta, with 1,493 observations.
The length of hospital stay variable is stay.
Let's look at the data.
Next comes the header information. On the right-hand side, the number of
observations used (1493) is given, along with the likelihood ratio chi-squared with three degrees of
freedom for the full model, followed by the p-value for the chi-squared.
The model, as a whole, is statistically significant. The header also includes
a pseudo-R2, which is very low in this example (0.0033).
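The header statistics can be reproduced by hand from the log likelihoods in the iteration log. The lines below are a minimal sketch using Stata's display command, with the values copied from the output on this page (constant-only model: -4770.848; full model: -4755.2796). It assumes the pseudo-R2 is McFadden's, that is, one minus the ratio of the two log likelihoods.

```stata
* Likelihood ratio chi-squared: twice the gain in log likelihood from the
* constant-only model to the full model (compare with LR chi2(3) = 31.14)
display 2*(-4755.2796 - (-4770.848))

* p-value for a chi-squared with 3 degrees of freedom (one per predictor)
display chi2tail(3, 31.14)

* McFadden's pseudo-R2 (assumed definition): 1 - ll(full)/ll(constant-only)
display 1 - (-4755.2796)/(-4770.848)
```

The tail probability is far below 0.0001, which is why the header prints it as 0.0000.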
Below the header you will find the zero-truncated negative binomial
coefficients for each of the variables,
along with standard errors, z-scores, p-values and 95% confidence intervals for the
coefficients.
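As an illustration of how the columns of the coefficient table relate to each other, the sketch below recomputes the z-score, p-value and 95% confidence interval for hmo from its coefficient (-.1470576) and standard error (.0592161) as printed in the non-robust output. The scalar name z is just a convenience introduced here.

```stata
* Wald z for hmo: coefficient divided by its standard error
scalar z = -.1470576/.0592161
display z

* Two-sided p-value from the standard normal distribution
display 2*normal(-abs(z))

* 95% confidence limits: coefficient minus/plus about 1.96 standard errors
display -.1470576 - invnormal(.975)*.0592161
display -.1470576 + invnormal(.975)*.0592161
```

The results should agree with the hmo row of the table up to rounding.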
In addition to the negative binomial coefficients, Stata also estimates the log of the
overdispersion parameter alpha. Below lnalpha, Stata displays alpha. If alpha equals
zero then there is no overdispersion. The likelihood ratio chi-square at the bottom of the
output tests whether alpha equals zero.
If the test is significant, as in this case, then zero-truncated negative binomial
is preferred over zero-truncated poisson.
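The quantities in this part of the output can also be checked by hand. The sketch below uses values copied from the output on this page: the estimate of lnalpha (-.5686389), the final log likelihood of the zero-truncated poisson model (-6908.7991) and that of the full negative binomial model (-4755.2796). The halving of the p-value reflects the chibar2(01) label in the output, which indicates that alpha = 0 lies on the boundary of the parameter space.

```stata
* alpha is the exponential of the estimated /lnalpha
display exp(-.5686389)

* LR statistic for alpha = 0: twice the difference between the
* negative binomial and poisson log likelihoods
display 2*(-4755.2796 - (-6908.7991))

* Boundary test: the p-value is half the usual chi-squared(1) tail probability
display 0.5*chi2tail(1, 4307.04)
```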
Now, just to be on the safe side, let's rerun the ztnb command with the robust
option in order to obtain robust standard errors for the zero-truncated negative binomial coefficients.
In the main body of the output are the zero-truncated negative binomial coefficients,
robust standard errors, z-scores, p-values and 95% confidence intervals for the
coefficients. The robust
standard errors attempt to adjust for heterogeneity in the model.
We again see the log alpha and alpha, but there is no likelihood ratio test of alpha because
we are using log pseudolikelihoods. That's okay because we can use the likelihood ratio test
from the model without the robust option.
Zero-truncated negative binomial models can also display results as incidence rate ratios using the irr option.
use http://www.ats.ucla.edu/stat/stata/dae/ztp, clear
summarize stay
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        stay |      1493    9.728734    8.132908          1         74
histogram stay, discrete
tab1 age hmo died
-> tabulation of age

  Age Group |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          6        0.40        0.40
          2 |         60        4.02        4.42
          3 |        163       10.92       15.34
          4 |        291       19.49       34.83
          5 |        317       21.23       56.06
          6 |        327       21.90       77.96
          7 |        190       12.73       90.69
          8 |         93        6.23       96.92
          9 |         46        3.08      100.00
------------+-----------------------------------
      Total |      1,493      100.00

-> tabulation of hmo

        hmo |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      1,254       83.99       83.99
          1 |        239       16.01      100.00
------------+-----------------------------------
      Total |      1,493      100.00

-> tabulation of died

       died |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        981       65.71       65.71
          1 |        512       34.29      100.00
------------+-----------------------------------
      Total |      1,493      100.00

Some Strategies You Might Be Tempted To Try
Before we show how you can analyze these data with a zero-truncated negative binomial analysis, let's
consider some other methods that you might use.
Stata Zero-Truncated Negative Binomial Analysis
ztnb stay age hmo died
Fitting Zero-truncated poisson model:
Iteration 0: log likelihood = -6908.7992
Iteration 1: log likelihood = -6908.7991
Fitting constant-only model:
Iteration 0: log likelihood = -4817.852
Iteration 1: log likelihood = -4778.7604
Iteration 2: log likelihood = -4770.8734
Iteration 3: log likelihood = -4770.848
Iteration 4: log likelihood = -4770.848
Fitting full model:
Iteration 0: log likelihood = -4755.5912
Iteration 1: log likelihood = -4755.2798
Iteration 2: log likelihood = -4755.2796
Zero-truncated negative binomial regression       Number of obs   =       1493
                                                  LR chi2(3)      =      31.14
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -4755.2796                       Pseudo R2       =     0.0033
------------------------------------------------------------------------------
        stay |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0156929    .013107    -1.20   0.231    -.0413822    .0099964
         hmo |  -.1470576   .0592161    -2.48   0.013     -.263119   -.0309962
        died |  -.2177714   .0461605    -4.72   0.000    -.3082442   -.1272985
       _cons |   2.408328    .071982    33.46   0.000     2.267245     2.54941
-------------+----------------------------------------------------------------
    /lnalpha |  -.5686389   .0551506                     -.6767321   -.4605457
-------------+----------------------------------------------------------------
       alpha |   .5662957   .0312316                      .5082753    .6309393
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 4307.04 Prob>=chibar2 = 0.000
The output looks very much like the output from an OLS regression. The output begins
with the iteration log, giving the values of the log likelihoods starting
with a model that has no predictors. The last value in the log is the final value
of the log likelihood for the full model and is repeated below.
ztnb stay age hmo died, robust
Fitting Zero-truncated poisson model:
Iteration 0: log pseudolikelihood = -6908.7992
Iteration 1: log pseudolikelihood = -6908.7991
Fitting constant-only model:
Iteration 0: log pseudolikelihood = -4817.852
Iteration 1: log pseudolikelihood = -4778.7604
Iteration 2: log pseudolikelihood = -4770.8734
Iteration 3: log pseudolikelihood = -4770.848
Iteration 4: log pseudolikelihood = -4770.848
Fitting full model:
Iteration 0: log pseudolikelihood = -4755.5912
Iteration 1: log pseudolikelihood = -4755.2798
Iteration 2: log pseudolikelihood = -4755.2796
Zero-truncated negative binomial regression       Number of obs   =       1493
Dispersion     = mean                             Wald chi2(3)    =      25.29
Log likelihood = -4755.2796                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |               Robust
        stay |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0156929   .0131522    -1.19   0.233    -.0414707    .0100849
         hmo |  -.1470576   .0572278    -2.57   0.010    -.2592221   -.0348931
        died |  -.2177714   .0525957    -4.14   0.000     -.320857   -.1146857
       _cons |   2.408328   .0751937    32.03   0.000     2.260951    2.555705
-------------+----------------------------------------------------------------
    /lnalpha |  -.5686389    .065313                       -.69665   -.4406277
-------------+----------------------------------------------------------------
       alpha |   .5662957   .0369865                      .4982516    .6436323
------------------------------------------------------------------------------
Using the robust option has resulted in a fairly large change in the model chi-square,
which is now a Wald chi-square, based on log pseudolikelihoods, instead of a likelihood ratio
chi-square.
ztnb, irr
Zero-truncated negative binomial regression       Number of obs   =       1493
Dispersion     = mean                             Wald chi2(3)    =      25.29
Log likelihood = -4755.2796                       Prob > chi2     =     0.0000
------------------------------------------------------------------------------
             |               Robust
        stay |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9844296   .0129474    -1.19   0.233     .9593774    1.010136
         hmo |   .8632442   .0494016    -2.57   0.010     .7716516    .9657086
        died |   .8043093   .0423032    -4.14   0.000      .725527    .8916463
-------------+----------------------------------------------------------------
    /lnalpha |  -.5686389    .065313                       -.69665   -.4406277
-------------+----------------------------------------------------------------
       alpha |   .5662957   .0369865                      .4982516    .6436323
------------------------------------------------------------------------------
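The incidence rate ratios are simply the exponentiated coefficients from the robust model. The sketch below, using values copied from the robust output on this page, recomputes them with display. It also shows one common way to obtain the standard error and confidence limits on the ratio scale (the delta method for the standard error, exponentiated coefficient limits for the interval); that this is how the table was produced is an assumption, although the numbers agree.

```stata
* Incidence rate ratios: exp(coefficient) for age, hmo and died
display exp(-.0156929)
display exp(-.1470576)
display exp(-.2177714)

* Delta-method standard error for hmo: IRR times the robust standard
* error of the coefficient
display exp(-.1470576)*.0572278

* Confidence limits for hmo: exponentiate the limits of the coefficient interval
display exp(-.2592221)
display exp(-.0348931)
```

Note that the z-scores and p-values are unchanged from the coefficient table, since the test is still carried out on the coefficient scale.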
Sample Write-Up of the Analysis
Before we begin the sample write-up we need to get the output into a form more acceptable for
publication. In order to go back to the non-exponentiated version of the coefficients
we will quietly rerun the ztnb command.
The estout command (findit estout), by Ben Jann of ETH Zurich, will get
us close to what we want.
quietly ztnb
estout, cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) stats(ll chi2, fmt(%8.2f))
                     b/se
stay
age                 -0.02
                   (0.01)
hmo                 -0.15*
                   (0.06)
died                -0.22***
                   (0.05)
_cons                2.41***
                   (0.08)
lnalpha
_cons               -0.57***
                   (0.07)
ll               -4755.28
chi2                25.29
With a little bit of manual editing
we can produce an acceptable table of the output. I also manually added the likelihood ratio test
for alpha from the non-robust ztnb.
                             model
stay
  age                        -0.02
                            (0.01)
  hmo                        -0.15*
                            (0.06)
  died                       -0.22***
                            (0.05)
  constant                    2.41***
                            (0.08)
likelihood ratio
  test for alpha           4307.04
log pseudolikelihood      -4755.28
chi-squared                  25.29
legend: coefficient/(standard error)   * p<0.05   *** p<0.001
The zero-truncated negative binomial regression model predicting length of hospital
stay from
age, hmo membership and death during the hospital stay
was statistically significant (Wald chi-squared = 25.29, df = 3, p<.001).
The likelihood ratio test for alpha, the
overdispersion parameter, was significant (chi-squared = 4307.04, df = 1, p<.001),
indicating that the zero-truncated negative binomial model is preferred over a zero-truncated
poisson model.
The predictors
hmo and died were each statistically significant. The effect of age
was not significant at the .05 level.
For these data, the expected log count for those enrolled in an hmo was 0.15 lower than that of
those not so enrolled. This amounts to a difference of about 1.25 days. Patients
who
died during the hospital stay had an expected log count 0.22 lower than those who did not, a difference of almost two days.
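Differences on the log scale can also be reported as percent changes, which do not depend on the values of the other predictors. The sketch below converts the hmo and died coefficients from the output on this page. It assumes the usual reading of these coefficients as effects on the log of the expected count of the underlying, untruncated distribution, so the percentages are approximate for the observed, truncated stays. It does not reproduce the differences in days quoted above, which depend on where the other predictors are held.

```stata
* Percent change in the expected count for hmo = 1 versus hmo = 0
display 100*(exp(-.1470576) - 1)

* Percent change for patients who died versus those who did not
display 100*(exp(-.2177714) - 1)
```

These work out to roughly a 14 percent shorter expected stay for hmo members and roughly a 20 percent shorter expected stay for patients who died.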
Cautions, Flies in the Ointment
See Also
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications.
These pages are Copyrighted (c) by UCLA Academic Technology Services