Example 1. A study of length of hospital stay, in days, as a function of age, kind of health insurance and whether or not the patient died while in the hospital. Length of hospital stay is recorded with a minimum of one day.
Example 2. A study of the number of journal articles published by tenured faculty as a function of discipline (fine arts, science, social science, humanities, medical, etc.). To get tenure, faculty must publish; that is, there are no tenured faculty with zero publications.
Example 3. A study by the county traffic court on the number of tickets received by teenagers as predicted by school performance, amount of driver training and gender. Only individuals who have received at least one citation are in the traffic court files.
We have a hypothetical data file, ztp.dta, with 1,493 observations.
The length of hospital stay variable is stay.
Let's look at the data.
Next comes the header information. On the right-hand side, the number of
observations used (1493) is given along with the likelihood ratio chi-square with three degrees of
freedom for the full model, followed by the p-value for the chi-square.
The model, as a whole, is statistically significant. The header also includes
a pseudo-R2, which is very low in this example (0.0129).
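The header figures are tied together by two standard identities: the likelihood ratio chi-square is twice the gap between the full-model and intercept-only log likelihoods, and the pseudo-R2 that Stata reports is McFadden's, one minus the ratio of those two log likelihoods. As a rough check, the following sketch uses only the numbers printed in the output for this example; the scalar names are ours, chosen for illustration.

```stata
* Values copied from the ztp header for this example
scalar ll_full = -6908.7991
scalar lr      = 181.13

* LR chi2 = 2*(ll_full - ll_null), so ll_null = ll_full - LR/2
scalar ll_null = ll_full - lr/2
display "intercept-only log likelihood = " ll_null

* McFadden pseudo-R2 = 1 - ll_full/ll_null
display "pseudo-R2 = " 1 - ll_full/ll_null
```

The second display should come out at roughly 0.0129, in line with the value in the header. A pseudo-R2 this small says only that the predictors move the log likelihood a little relative to the intercept-only model; it is not a proportion of variance explained.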
Below the header you will find the zero-truncated Poisson coefficients for each of the variables,
along with standard errors, z-scores, p-values and 95% confidence intervals for the
coefficients.
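The coefficients are on the log scale of the rate of the underlying (untruncated) Poisson distribution, so turning them into an expected length of stay takes two steps: exponentiate the linear predictor, then divide by the probability of a nonzero count. A minimal sketch for a hypothetical patient in age group 5, not in an HMO, who survived; the profile and the scalar names are our assumptions, and the coefficients are the ones reported in the output.

```stata
* Linear predictor for age = 5, hmo = 0, died = 0 (hypothetical profile)
scalar xb = 2.435808 + (-.014442)*5 + (-.1359033)*0 + (-.2037709)*0

* Rate of the underlying, untruncated Poisson distribution
scalar lambda = exp(xb)
display "lambda           = " lambda

* Expected stay given stay > 0: lambda / (1 - Pr(y = 0)), with Pr(y = 0) = exp(-lambda)
display "E(stay | stay>0) = " lambda/(1 - exp(-lambda))
```

With a rate this large the truncation correction is tiny, because exp(-lambda) is close to zero; the correction matters much more when the predicted rate is near one or below.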
Now, just to be on the safe side, let's rerun the ztp command with the robust
option in order to obtain robust standard errors for the zero-truncated Poisson coefficients.
In the main body of the output are the zero-truncated Poisson coefficients, robust standard errors,
z-scores, p-values and 95% confidence intervals for the coefficients. The coefficients themselves
are unchanged; only the standard errors, and the tests based on them, differ. The variable age
was significant without the robust option and is not significant with it. The robust
standard errors attempt to adjust for heterogeneity in the model.
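Readers on more recent versions of Stata may find that ztp is no longer documented. To our knowledge it was superseded by tpoisson starting with Stata 11, so the following is a sketch of the equivalent call under that assumption rather than the command used to produce the output on this page. The dataset URL is the one given on this page and may have moved.

```stata
* Assumes Stata 11 or later, where tpoisson supersedes ztp
use http://www.ats.ucla.edu/stat/stata/dae/ztp, clear

* ll(0) sets the truncation point at zero, which is also the default;
* vce(robust) requests the same robust standard errors as the robust option
tpoisson stay age hmo died, ll(0) vce(robust)
```

The point estimates should agree with the ztp results, since both commands fit the same zero-truncated likelihood.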
Zero-truncated Poisson models can also display results as incidence rate ratios using the irr option.
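An incidence rate ratio is simply the exponentiated coefficient, and its standard error follows from the delta method (the ratio times the standard error of the coefficient). As a quick check by hand, using the hmo coefficient and robust standard error reported in the output:

```stata
* IRR for hmo: exponentiate the coefficient
display exp(-.1359033)

* Delta-method standard error: IRR times the coefficient's robust standard error
display exp(-.1359033) * .0520484
```

These should reproduce, up to rounding, the IRR and standard error shown for hmo in the irr output. The z-score and p-value are carried over from the coefficient scale, which is why they are identical in the two tables.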
use http://www.ats.ucla.edu/stat/stata/dae/ztp, clear
summarize stay
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
stay | 1493 9.728734 8.132908 1 74
histogram stay, discrete
tab1 age hmo died
-> tabulation of age
Age Group | Freq. Percent Cum.
------------+-----------------------------------
1 | 6 0.40 0.40
2 | 60 4.02 4.42
3 | 163 10.92 15.34
4 | 291 19.49 34.83
5 | 317 21.23 56.06
6 | 327 21.90 77.96
7 | 190 12.73 90.69
8 | 93 6.23 96.92
9 | 46 3.08 100.00
------------+-----------------------------------
Total | 1,493 100.00
-> tabulation of hmo
hmo | Freq. Percent Cum.
------------+-----------------------------------
0 | 1,254 83.99 83.99
1 | 239 16.01 100.00
------------+-----------------------------------
Total | 1,493 100.00
-> tabulation of died
died | Freq. Percent Cum.
------------+-----------------------------------
0 | 981 65.71 65.71
1 | 512 34.29 100.00
------------+-----------------------------------
Total | 1,493 100.00
Some Strategies You Might Be Tempted To Try
Before we show how you can analyze this with a zero-truncated Poisson analysis, let's
consider some other methods that you might use.
Stata Zero-Truncated Poisson Analysis
ztp stay age hmo died
Iteration 0: log likelihood = -6908.7992
Iteration 1: log likelihood = -6908.7991
Zero-truncated Poisson regression Number of obs = 1493
LR chi2(3) = 181.13
Prob > chi2 = 0.0000
Log likelihood = -6908.7991 Pseudo R2 = 0.0129
------------------------------------------------------------------------------
stay | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.014442 .0050347 -2.87 0.004 -.0243099 -.0045742
hmo | -.1359033 .0237419 -5.72 0.000 -.1824365 -.0893701
died | -.2037709 .0183728 -11.09 0.000 -.239781 -.1677608
_cons | 2.435808 .0273324 89.12 0.000 2.382238 2.489379
------------------------------------------------------------------------------
The output looks very much like the output from an OLS regression. It begins with
the iteration log, which gives the value of the log likelihood at each iteration.
The last value in the log is the final value
of the log likelihood for the full model and is repeated below.
ztp stay age hmo died, robust
Iteration 0: log pseudolikelihood = -6908.7992
Iteration 1: log pseudolikelihood = -6908.7991
Zero-truncated Poisson regression Number of obs = 1493
Wald chi2(3) = 25.65
Prob > chi2 = 0.0000
Log pseudolikelihood = -6908.7991 Pseudo R2 = 0.0129
------------------------------------------------------------------------------
| Robust
stay | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.014442 .0121867 -1.19 0.236 -.0383276 .0094436
hmo | -.1359033 .0520484 -2.61 0.009 -.2379163 -.0338902
died | -.2037709 .0491608 -4.14 0.000 -.3001242 -.1074175
_cons | 2.435808 .0708745 34.37 0.000 2.296897 2.57472
------------------------------------------------------------------------------
Using the robust option has resulted in a fairly large change in the model chi-square
(from 181.13 to 25.65), which is now a Wald chi-square computed from the robust variance
estimates instead of a likelihood ratio chi-square.
ztp, irr
Zero-truncated Poisson regression Number of obs = 1493
Wald chi2(3) = 25.65
Prob > chi2 = 0.0000
Log pseudolikelihood = -6908.7991 Pseudo R2 = 0.0129
------------------------------------------------------------------------------
| Robust
stay | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .9856618 .012012 -1.19 0.236 .9623976 1.009488
hmo | .8729271 .0454345 -2.61 0.009 .7882687 .9666776
died | .8156492 .0400979 -4.14 0.000 .7407262 .8981506
------------------------------------------------------------------------------
Sample Write-Up of the Analysis
Before we begin the sample write-up we need to get the output into a form more acceptable for
publication. In order to go back to the non-exponentiated version of the coefficients
we will quietly rerun the ztp command.
The estout command (findit estout), written by Ben Jann of ETH Zurich, will get
us close to what we want.
quietly ztp
estout, cells(b(star fmt(%8.2f)) se(par fmt(%8.2f))) stats(ll chi2, fmt(%8.2f))
b/se
stay
age -0.01
(0.01)
hmo -0.14**
(0.05)
died -0.20***
(0.05)
_cons 2.44***
(0.07)
ll -6908.80
chi2 25.65
With a little bit of manual editing
we can produce an acceptable table of the output.
                  model
age               -0.01
                 (0.01)
hmo               -0.14**
                 (0.05)
died              -0.20***
                 (0.05)
constant           2.44***
                 (0.07)
log pseudo-
likelihood     -6908.80
chi-squared       25.65
legend: coefficient/(standard error)  ** p<0.01  *** p<0.001
The zero-truncated Poisson regression model predicting length of hospital stay from
age, HMO membership and death during the hospital stay
was statistically significant (chi-squared = 25.65, df = 3, p<.001).
The predictors
hmo and died were each statistically significant. The effect of age
was not significant at the .05 level.
For these data, the expected log count for those enrolled in an HMO was 0.14 lower than for
those not so enrolled. This amounts to a difference of about 1.25 days. Patients
who
died during the hospital stay had an expected log count 0.20 lower than those who did not, a difference of about 1.9 days.
Cautions, Flies in the Ointment
See Also
Long, J. S. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: Sage Publications.
These pages are Copyrighted (c) by UCLA Academic Technology Services