Statistical Consulting Group
UCLA Academic Technology Services
April 2007
About This Presentation
Regression Models with Count Data Outline
Poisson Distribution
The poisson probability distribution.
Likelihood function for the poisson model.


Some History of the Poisson Distribution

There is a famous example from a 1898 book by Ladislaus von Bortkiewicz showing that the
number of soldiers killed by mule-kicks each year in the Prussian cavalry followed
a poisson distribution. The title of the book was The Law of Small Numbers.

Poisson Examples

As a comparison, here is a normal distribution with the same mean and variance as the poisson distribution above.



One formulation of the negative binomial distribution can be used to model count
data with overdispersion.

Likelihood function for the negative binomial model.


Negative Binomial Examples



Maximum Likelihood Estimation
Count models are estimated using maximum likelihood
Log likelihood:
Number of goals scored for each game in a season (mean= 2.16 variance= 3.00)




Count models need some sort of mechanism to deal with the fact that counts can be made over different observation periods. For example, the number of accidents are recorded for 50 different intersections. However, the number of vehicles that pass through the intersections can vary greatly. Fifteen accidents for 30,000 vehicles is very different from 15 accidents for 1,500 vehicles. Count models account for these differences by including the log of the exposure variable in model with coefficient constrained to be one.
The use of exposure is superior in many instances to analyzing rates as response variables because it makes use of the correct probability distributions. It should be noted that exposure is used to adjust counts on the response variable and that it is possible to various kinds of rates, indexes or per capita measures as predictors.
The response variable is the number of deaths recorded at each of five different age-group and two smoker categories. The difference in the number of patient years will be accounted for with an exposure variable pyears. Below, note that rows 1 and 10 have almost identical numbers of deaths but have very different values for patient years.
agecat smokes deaths pyears
1 1 32 52,407
2 1 104 43,248
3 1 206 28,612
4 1 186 12,663
5 1 102 5,317
1 0 2 18,790
2 0 12 10,673
3 0 28 5,710
4 0 28 2,585
5 0 31 1,462
variable | N mean variance min max
-------------+--------------------------------------------------
deaths | 10 73.1 5390.767 2 206
pyears | 10 18146.7 3.16e+08 1462 52407
----------------------------------------------------------------
The predictor variables are four age-group dummy variables and a dummy variable to indicate smokers.
These data can be analyzed with either a poisson regression model or a negative binomial regression model.
ll df BIC model -33.60015 6 81.0158 poisson -33.60014 7 83.3184 negative binomial
There is not much difference between the two models based on the log-likelihood and the BIC but the poisson model has a slightly better BIC.
Poisson regression Number of obs = 10
LR chi2(5) = 922.93
Prob > chi2 = 0.0000
Log likelihood = -33.600153 Pseudo R2 = 0.9321
------------------------------------------------------------------------------
deaths | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smokes | .3545356 .1073741 3.30 0.001 .1440862 .564985
a2 | 1.484007 .1951034 7.61 0.000 1.101611 1.866403
a3 | 2.627505 .1837273 14.30 0.000 2.267406 2.987604
a4 | 3.350493 .1847992 18.13 0.000 2.988293 3.712693
a5 | 3.700096 .1922195 19.25 0.000 3.323353 4.07684
_cons | -7.919326 .1917618 -41.30 0.000 -8.295172 -7.543479
pyears | (exposure)
------------------------------------------------------------------------------
Zero-inflated models attempt to account for excess zeros, i.e., there is thought to be two kinds of zeros, "true zeros" and excess zeros. Zero-inflated models estimate two equations, one for the count model and one for the excess zero's.
Somehow, it seems very appropriate to discuss fish and poisson in the same example.
Response variable:
count -- the number of fish caught by visitors to a state park
Predictors:
child -- number of children in group
camper -- camping one or more nights during stay (0/1)
persons -- number of people in group
Child is also used as the predictor of the inflated-zeros
In our example "true zeros" are obtained from people who fish but do not catch anything. The excess zeros come from those who do not fish at all. Separate equations are used to predict each kind of zero.

We will look at two count models and two zero-inflated models and an OLS regression for comparison.
ll df BIC model -749.3503 4 1520.754 ols -645.2568 4 1312.567 poisson -391.0271 5 809.621 negative binomial -561.5176 6 1156.116 zero-inflated poisson -384.8586 7 808.311 zero-inflated negative binomial
The results for the zero-inflated negative binomial are given below. Note this model is only marginally better than the ordinary negative binomial model and the Vuong test is not significant but the data were clearly generated by a process that leads to inflated zeros.
Zero-inflated negative binomial regression Number of obs = 248
Nonzero obs = 106
Zero obs = 142
Inflation model = logit LR chi2(3) = 69.88
Log likelihood = -384.8586 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
count |
child | -1.017093 .2734393 -3.72 0.000 -1.553025 -.4811623
camper | .4953317 .2320177 2.13 0.033 .0405854 .950078
persons | .9412857 .1142745 8.24 0.000 .7173118 1.16526
_cons | -1.440209 .3237543 -4.45 0.000 -2.074756 -.8056624
-------------+----------------------------------------------------------------
inflate |
child | 2.857134 .7912044 3.61 0.000 1.306402 4.407866
_cons | -4.155515 1.358571 -3.06 0.002 -6.818266 -1.492765
-------------+----------------------------------------------------------------
/lnalpha | .5080559 .1945664 2.61 0.009 .1267128 .889399
-------------+----------------------------------------------------------------
alpha | 1.662057 .3233804 1.135091 2.433667
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 353.32 Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z = 1.29 Pr>z = 0.0981
OLS regression
Source | SS df MS Number of obs = 248
-------------+------------------------------ F( 3, 244) = 31.67
Model | 2381.37932 3 793.793107 Prob > F = 0.0000
Residual | 6116.21745 244 25.066465 R-squared = 0.2802
-------------+------------------------------ Adj R-squared = 0.2714
Total | 8497.59677 247 34.4032258 Root MSE = 5.0066
------------------------------------------------------------------------------
count | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
child | -3.66316 .4517096 -8.11 0.000 -4.552908 -2.773412
camper | 1.803715 .6462559 2.79 0.006 .5307624 3.076667
persons | 2.958617 .3471834 8.52 0.000 2.274758 3.642476
_cons | -3.513368 .911917 -3.85 0.000 -5.309602 -1.717134
------------------------------------------------------------------------------
Poisson regression
Poisson regression Number of obs = 248
LR chi2(3) = 825.76
Prob > chi2 = 0.0000
Log likelihood = -645.25676 Pseudo R2 = 0.3902
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
child | -1.435991 .0817147 -17.57 0.000 -1.596149 -1.275834
camper | .6581591 .0928267 7.09 0.000 .4762221 .840096
persons | .8921543 .0412052 21.65 0.000 .8113936 .972915
_cons | -1.406989 .1512168 -9.30 0.000 -1.703369 -1.11061
------------------------------------------------------------------------------
Negative binomial regression
Negative binomial regression Number of obs = 248
LR chi2(3) = 94.97
Dispersion = mean Prob > chi2 = 0.0000
Log likelihood = -391.02712 Pseudo R2 = 0.1083
------------------------------------------------------------------------------
count | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
child | -1.663023 .1952636 -8.52 0.000 -2.045732 -1.280313
camper | .5581623 .2323451 2.40 0.016 .1027743 1.01355
persons | .9721464 .1223069 7.95 0.000 .7324293 1.211864
_cons | -1.462463 .334303 -4.37 0.000 -2.117685 -.8072411
-------------+----------------------------------------------------------------
/lnalpha | .7456885 .1585742 .4348888 1.056488
-------------+----------------------------------------------------------------
alpha | 2.107892 .3342573 1.544791 2.876253
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 508.46 Prob>=chibar2 = 0.000

Days absent from school during one school year (mean = 5.81 variance = 55.49)
Predictors of the count variable are female, school and reading.Several variables were tried as predictors of excess zeros for zero-inflated models. The large number of zeros in the data might seem to suggest some type of zero-inflated model. However, the zero-inflated models were not used for both empirical reasons and because there did not seem to be a reasonable way explain what kind of a process would generate excess zeros in days absent.

Histogram of days absent by school.

We will look at five different models, two count models, two zero-inflated count model and an ols regression thrown in for good measure.
ll df BIC model -1060.365 4 2143.75 ols regression -1435.846 3 2894.72 poisson -867.240 4 1763.26 negative binomial -1278.182 5 2590.90 zero-inflated poisson -867.200 6 1774.69 zero-inflated negative binomial
Both the negative binomial and the zero-inflated negative binomial are very close in log likelihoods and BIC's, there's a slight edge to the straight negative binomial due to having fewer degrees of freedom.
Negative binomial regression Number of obs = 316
LR chi2(2) = 44.02
Dispersion = mean Prob > chi2 = 0.0000
Log likelihood = -869.23382 Pseudo R2 = 0.0247
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .3366034 .132156 2.55 0.011 .0775823 .5956244
school | -.8628592 .1323017 -6.52 0.000 -1.122166 -.6035526
_cons | 2.776001 .2142417 12.96 0.000 2.356094 3.195907
-------------+----------------------------------------------------------------
/lnalpha | .1568369 .0986426 -.036499 .3501728
-------------+----------------------------------------------------------------
alpha | 1.169805 .1153925 .9641591 1.419313
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 1152.21 Prob>=chibar2 = 0.000
There are a number of interesting situations in which the count variable cannot take on the value zero.
Take for example the response variable length of hospital stay (mean = 5.37 variance = 49.24). The predictor variables are age, hmo and died (died before discharge). Note that there are no zero counts in the data.


Note that both poisson and negative binomial predict a probability for zero length of hospital stay. The negative binomial provides a closer fit to the observed than does the poisson.
ll df BIC model -6908.799 4 13846.83 zero-truncated poisson -4755.28 5 9547.10 zero-truncated negative binomial
The results for the zero-truncated negative binomial are given below.
Zero-truncated negative binomial regression Number of obs = 1493
LR chi2(3) = 31.14
Dispersion = mean Prob > chi2 = 0.0000
Log likelihood = -4755.2796 Pseudo R2 = 0.0033
------------------------------------------------------------------------------
stay | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0156929 .013107 -1.20 0.231 -.0413822 .0099964
hmo | -.1470576 .0592161 -2.48 0.013 -.263119 -.0309962
died | -.2177714 .0461605 -4.72 0.000 -.3082442 -.1272985
_cons | 2.408328 .071982 33.46 0.000 2.267245 2.54941
-------------+----------------------------------------------------------------
/lnalpha | -.5686389 .0551506 -.6767321 -.4605457
-------------+----------------------------------------------------------------
alpha | .5662957 .0312316 .5082753 .6309393
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 4307.04 Prob>=chibar2 = 0.000A hurdle model is a modified count model in which there are two processes, one generating the zeros and one generating the positive values. The two models are not constrained to be the same. The concept underlying the hurdle model is that a binomial probability model governs the binary outcome of whether a count variable has a zero or a positive value. If the value is positive, the "hurdle is crossed," and the conditional distribution of the positive values is governed by a zero-truncated count model.
We will illustrate this model using artificially generated data with response variable y and predictors x1 x2.
y | Freq.
------------+---------------
0 | 315
1 | 123
2 | 138
3 | 117
4 | 88
5 | 60
6 | 35
7 | 23
8 | 24
9 | 19
10 | 16
11 | 11
12 | 10
13 | 5
14 | 1
15 | 4
16 | 1
19 | 2
21 | 2
22 | 2
26 | 1
32 | 2
34 | 1

Since the count part of the model was generated using a random poisson generator, we will look at three different poisson based models.
ll df BIC model -2133.079 3 4286.881 poisson -1748.863 6 3539.173 zero-inflated poisson -1730.438 6 3502.322 poisson-logit hurdle
The poisson-logit hurdle model is clearly the best choice here. The results for this model are given below
Poisson-Logit Hurdle Regression Number of obs = 1000
Wald chi2(2) = 187.13
Log likelihood = -1730.4377 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
logit |
x1 | 1.494141 .1092238 13.68 0.000 1.280067 1.708216
x2 | .0637035 .086461 0.74 0.461 -.105757 .233164
_cons | .9951839 .0855003 11.64 0.000 .8276064 1.162761
-------------+----------------------------------------------------------------
poisson |
x1 | .1732047 .0226329 7.65 0.000 .128845 .2175644
x2 | .7216143 .0198055 36.44 0.000 .6827963 .7604324
_cons | 1.009172 .0284237 35.50 0.000 .9534629 1.064882
------------------------------------------------------------------------------

We have a number of options for analyzing these data. We can use standard poisson with cluster robust standard errors to account for within school association. Or, we can use one of the random-effects models for poisson or negative binomial. These are random intercept only models.
ll df BIC model -2938.653 3 5896.653 poisson with cluster robust standard errors -1718.321 4 3462.437 negative binomial with cluster robust standard errors -2718.767 4 5463.33 random-effects poisson w/ gaussian random effects -2718.864 4 5463.523 random-effects poisson w/ gamma random effects -1696.797 5 3425.838 random-effects negative binomial w/ beta random effectsThe random-effects negative binomial has the best log likelihood and BIC. The results of the model are given below.
Random-effects negative binomial regression Number of obs = 632
Group variable (i): sid Number of groups = 12
Random effects u_i ~ Beta Obs per group: min = 51
avg = 52.7
max = 53
Wald chi2(2) = 36.14
Log likelihood = -1696.7966 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | .3587537 .0788313 4.55 0.000 .2042472 .5132602
reading | -.0102389 .0022909 -4.47 0.000 -.0147289 -.0057489
_cons | .1944149 .131354 1.48 0.139 -.0630343 .451864
-------------+----------------------------------------------------------------
/ln_r | 2.260673 .4859689 1.308192 3.213155
/ln_s | 3.922436 .5105199 2.921835 4.923037
-------------+----------------------------------------------------------------
r | 9.589543 4.66022 3.699478 24.85738
s | 50.52336 25.79318 18.57535 137.4193
------------------------------------------------------------------------------
Likelihood-ratio test vs. pooled: chibar2(01) = 32.25 Prob>=chibar2 = 0.000

Summary of Count Regression Models
poisson regression with exposure, deaths
(also see DAE page): . poisson deaths smokes a2 a3 a4 a5, exposure(pyears) zero-inflated negative binomial, fish
(also see DAE page): . zinb count child camper persons, inflate(child) vuong zip negative binomial regression, days absent
(also see DAE page): . nbreg daysabs female school Zero-truncated negative binomial regression, length of hosptial stay
(also see DAE page): . ztnb stay age hmo died Poisson-Logit Hurdle Regression, artifical data . hplogit y x1 x2 negative binomial regression with cluster robust standard errors, days absent: . nbreg daysabs female reading, cluster(sid) random effects negative binomial, days absent: . xtnbreg daysabs female reading, i(sid)
Here are some places to read more about regression models with count data.
Agresti, A. (2001) Categorical Data Analysis (2nd ed). New York: Wiley.
Agresti, A. (1996) An Introduction to Categorical Data Analysis. New York: Wiley.
Long, S. J. (1997) Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: SAGE Publications, Inc.
Long, S. J. & Freese, J. (2001) Regression Models for Categorical Dependent Variables using Stata. College Station, TX: Stata Press.