Poisson Regression

This page shows an example of Poisson regression analysis with footnotes
explaining the output in SPSS. The data collected were academic information on
316 students. The response variable is days absent during the school year (**daysabs**).
We explore its relationship with math standardized test scores (**mathnce**),
language standardized test scores (**langnce**) and gender (**female**).

As assumed for a Poisson model, our response variable is a count variable, and each subject has the same length of observation time. Had the observation time varied across subjects (e.g., some subjects were followed for half a year, some for a year and the rest for two years) and had we neglected these differences in exposure time, our Poisson regression estimates would be biased, since our model assumes all subjects had the same follow-up time. We also assume that the Poisson model, rather than another count model (e.g., a negative binomial or zero-inflated model), is the appropriate model. In other words, we assume that the response variable is not over-dispersed and does not have an excessive number of zeros.
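To make the role of exposure time concrete, here is a minimal Python sketch, not part of the SPSS analysis, of how unequal follow-up is usually handled: the log of the exposure enters the model as an offset, so the expected count scales in proportion to the time observed. The function name `expected_count` and the linear predictor value are hypothetical and chosen only for illustration.

```python
import math

def expected_count(linear_predictor, exposure=1.0):
    """Expected count under a log link with log(exposure) as an offset.

    log(mu) = log(exposure) + x'b   is the same as   mu = exposure * exp(x'b)
    """
    return exposure * math.exp(linear_predictor)

# Hypothetical linear predictor x'b (not taken from the SPSS output).
xb = 1.5

# Subjects followed for half a year, one year and two years.
for years in (0.5, 1.0, 2.0):
    print(years, round(expected_count(xb, years), 3))
```

If every subject has the same exposure, as in the example on this page, the offset is a constant and is absorbed into the intercept.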

The dataset can be downloaded here.

get file 'D:\lahigh.sav'.
recode gender (1=1) (else = 0) into female.
exe.

In SPSS, Poisson models are treated as a subset of generalized linear models. This is reflected in the syntax. A generalized linear model is Poisson if the specified distribution is Poisson and the link function is log.

genlin daysabs with female mathnce langnce
  /model female mathnce langnce distribution = poisson link = log
  /print cps history solution fit.

a. **Included** - This is the number of observations from the dataset
included in the model. An observation is included if the outcome variable and
all predictor variables have valid, non-missing values.

b. **Excluded** - This is the number of observations from the dataset not
included in the model due to missing data in any of the outcome or predictor
variables.

c. **Total** - This is the sum of the included and excluded records. It
is equal to the total number of observations in the dataset.

d. **Iteration History** - This is a listing of the log likelihoods at each iteration. Remember that Poisson
regression, like binary and ordered logistic regression, uses maximum likelihood
estimation, which is an iterative procedure. The first iteration (called
iteration 0) is the log likelihood of the "null" model. At each
iteration, the log likelihood increases because the goal is to maximize the log
likelihood.
When the difference between successive iterations is very small, the model is
said to have "converged", the iterating stops, and the results are displayed.
For more information on this process for binary outcomes, see
*Regression Models for Categorical and Limited Dependent Variables* by J.
Scott Long (pages 52-61).

e. **Gradient Vector and Hessian Matrix** -
In our model, we are
estimating k+1 parameters where k is the number of predictors: one for each of
our predictors and one intercept parameter. The log likelihood of our model is
calculated based on these estimated parameters.
The gradient vector is the vector of partial
derivatives of the log likelihood function with respect to the estimated
parameters and the Hessian matrix is the square matrix of second
derivatives of this log likelihood with respect to the estimated parameters. The
variance-covariance matrix of the model parameters is the negative of the
inverse of the Hessian. The values in the Hessian can suggest convergence
problems in the model, but the iteration history and possible error messages
provided by SPSS are more useful tools in diagnosing problems with the model.
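The iteration history, gradient vector and Hessian matrix described above can be illustrated with a small Newton-Raphson sketch in Python. This is not SPSS's algorithm or its output: it assumes a model with an intercept and a single predictor, and the data are hypothetical.

```python
import math

def fit_poisson(x, y, max_iter=25, tol=1e-8):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i.

    Returns (b0, b1, history), where history lists the log likelihood
    evaluated at the start of each iteration.
    """
    # Start from the intercept-only ("null") solution.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    history = []
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        loglik = sum(yi * math.log(m) - m - math.lgamma(yi + 1)
                     for yi, m in zip(y, mu))
        history.append(loglik)

        # Gradient vector: first partial derivatives of the log likelihood.
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * xi for xi, yi, m in zip(x, y, mu))

        # Hessian matrix: second partial derivatives (2 x 2, symmetric).
        h00 = -sum(mu)
        h01 = -sum(m * xi for xi, m in zip(x, mu))
        h11 = -sum(m * xi * xi for xi, m in zip(x, mu))

        # Newton step: beta_new = beta - H^{-1} g, with H inverted by hand.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 - step0, b1 - step1

        # Stop once the parameter updates are negligible ("converged").
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1, history

# Hypothetical data, for illustration only.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 4, 6, 9]
b0, b1, history = fit_poisson(x, y)
print(b0, b1)
print(history)
```

At convergence the gradient is essentially zero, and the negative of the inverse of the final Hessian would give the variance-covariance matrix of the two estimates.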

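f. **Deviance** - Deviance is usually defined as the log likelihood of the
final model, multiplied by (-2). However, for Poisson regression, SPSS
calculates the deviance as

D = 2 Σ_{i} [ y_{i} ln(y_{i} / μ_{i}) - (y_{i} - μ_{i}) ],

where y_{i} is the observed count and μ_{i} is the predicted count for observation i, and the term y_{i} ln(y_{i} / μ_{i}) is taken to be zero when y_{i} = 0.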

Note that the log likelihood of the model is -1547.971. The usual formulation of the deviance would yield (-2)(-1547.971) = 3095.942, which is greater than the deviance calculated using the above formula.
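The gap between the two quantities can be sketched in Python. The example below assumes the standard Poisson deviance, twice the sum over observations of y ln(y/μ) - (y - μ), and compares it with (-2) times the log likelihood. The observed and predicted counts are hypothetical, not values from the SPSS output.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum[ y*ln(y/mu) - (y - mu) ]."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * ln(y / mu) is taken to be 0 when y == 0.
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def poisson_loglik(y, mu):
    """Full Poisson log likelihood, including the -ln(y!) term."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

# Hypothetical observed and predicted counts, for illustration only.
y = [0, 2, 3, 7]
mu = [0.8, 1.9, 3.5, 5.8]

print(poisson_deviance(y, mu))
print(-2.0 * poisson_loglik(y, mu))
```

The deviance measures twice the difference between the log likelihood of a saturated model (one that predicts every count exactly) and that of the fitted model. Because the saturated log likelihood of a count model cannot exceed zero, (-2) times the model log likelihood is never smaller than the deviance.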

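g. **Pearson Chi-Square** - This is a goodness-of-fit measure that
compares the predicted values of the outcome variable with the actual values.
It is calculated as

χ^{2} = Σ_{i} (y_{i} - μ_{i})^{2} / μ_{i},

where y_{i} is the observed count and μ_{i} is the predicted count for observation i.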

There is no scaling in this model, so we see that the Scaled Pearson Chi-Square is equal to the Pearson Chi-Square.
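As a rough sketch of the calculation, the Pearson chi-square for a Poisson model sums the squared differences between observed and predicted counts, each divided by the predicted count (the Poisson variance). The Python below uses hypothetical data and a hypothetical parameter count; it does not reproduce the SPSS output.

```python
def pearson_chi_square(y, mu):
    """Pearson chi-square for a Poisson model: sum (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

# Hypothetical observed and predicted counts, for illustration only.
y = [0, 2, 3, 7, 4, 1]
mu = [0.8, 1.9, 3.5, 5.8, 3.6, 1.4]

chi2 = pearson_chi_square(y, mu)

# Residual degrees of freedom: observations minus estimated parameters.
# Two parameters are assumed here purely for the example.
df = len(y) - 2
print(chi2, chi2 / df)
```

The ratio of the statistic to its degrees of freedom is often used as an informal check: values well above 1 suggest over-dispersion relative to the Poisson assumption.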

h. **Log Likelihood** - This is the log likelihood of the final model.

i. **AIC** - This is the Akaike information criterion, a goodness-of-fit
measure defined as (-2 ln *L* + 2*k*), where *k* is the
number of parameters in the model and *L* is the likelihood function of the
final model.

j. **BIC** - This is the Bayesian information criterion, a goodness-of-fit
measure defined as

(-2 ln *L* + *k* ln(*n*)),

where *n* is the total number of observations, *k*
is the number of model parameters, and *L* is the likelihood function of
the final model.
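Both criteria can be computed directly from the log likelihood. The Python sketch below uses the log likelihood reported on this page (-1547.971) and the standard definitions, AIC as -2 ln L + 2k and BIC as -2 ln L + k ln(n). It assumes k = 4 (the intercept plus three predictors) and n = 316, which holds only if no observations were excluded for missing data.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 ln L + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2 ln L + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)

loglik = -1547.971  # log likelihood of the final model, as reported on this page
k = 4               # assumption: intercept + female + mathnce + langnce
n = 316             # assumption: all 316 students are included in the model

print(round(aic(loglik, k), 3))
print(round(bic(loglik, k, n), 3))
```

Because ln(n) exceeds 2 for any n above 7, BIC penalizes each additional parameter more heavily than AIC in a sample of this size.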

k. **B** - These are the estimated Poisson regression coefficients for the
model. Recall that the response variable is a count variable, and Poisson
regression models the log of the expected count as a function of the predictor
variables. We can interpret a Poisson regression coefficient as follows: for a
one-unit change in the predictor variable, the log of the expected count is
expected to change by the respective regression coefficient,
given that the other predictor variables in the model are held constant.

**(Intercept)** - This is the Poisson regression estimate when all
variables in the model are evaluated at zero. For males (the variable **female**
evaluated at zero) with zero **mathnce** and **langnce** test scores, the
log of the expected count for **daysabs** is 2.287. Note that
evaluating **mathnce** and **langnce** at zero is out of the range of
plausible test scores. If the test scores were mean-centered, the intercept
would have a natural interpretation: the log of the expected count for males
with average **mathnce** and **langnce** test scores.

**female** - This is the estimated Poisson regression coefficient
comparing females to males, given the other variables are held constant in the
model. The log of the expected count is expected to be 0.401
higher for females than for males, while holding the other variables
constant in the model. So if we consider two students, one male and one female,
with identical math and language test scores, the female student will have a
higher predicted value of *log(# days absent)* than the male student.
Thus, we would expect the female student to have more days absent than her male
counterpart.

**mathnce** - This is the Poisson regression estimate for a one-unit
increase in math standardized test score, given the other variables are held
constant in the model. If a student were to increase her **mathnce** test
score by one point, the log of the expected count would be
expected to decrease by 0.004, while holding the other variables in the
model constant. If we consider two students of the same sex who have the same
language score, we would expect the student with the higher math score of the
two to have fewer days absent than the other student.

**langnce** - This is the Poisson regression estimate for a one-unit
increase in language standardized test score, given the other variables are held
constant in the model. If a student were to increase her **langnce** test
score by one point, the log of the expected count would be
expected to decrease by 0.012, while holding the other variables in the
model constant. If we consider two students of the same sex who have the same
math score, we would expect the student with the higher language score of the
two to have fewer days absent than the other student.
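Because the model is linear in the log of the expected count, exponentiating a coefficient turns an additive change on the log scale into a multiplicative change in the expected count. The Python sketch below uses the coefficient estimates quoted on this page. The test scores of 50 are hypothetical, and the helper name `predicted_days_absent` is illustrative.

```python
import math

# Coefficient estimates as reported on this page.
coef = {
    "intercept": 2.286745,
    "female": 0.4009209,
    "mathnce": -0.0035232,
    "langnce": -0.0121521,
}

def predicted_days_absent(female, mathnce, langnce):
    """Expected count: exp of the linear predictor on the log scale."""
    log_mu = (coef["intercept"]
              + coef["female"] * female
              + coef["mathnce"] * mathnce
              + coef["langnce"] * langnce)
    return math.exp(log_mu)

# exp(B): multiplicative change in the expected count per one-unit increase.
rate_ratios = {name: math.exp(b) for name, b in coef.items()
               if name != "intercept"}
print(rate_ratios)

# Two students with identical (hypothetical) scores of 50 on both tests.
male = predicted_days_absent(female=0, mathnce=50, langnce=50)
female = predicted_days_absent(female=1, mathnce=50, langnce=50)
print(male, female, female / male)
```

The ratio of the two predictions equals exp(0.4009209), roughly 1.49, whatever scores are chosen, since the scores cancel when both students share them.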

l. **Std. Error** - These are the standard errors of the individual
regression coefficients. They are used both in the calculation of the **Wald
Chi-Square** test statistic, superscript n, and the confidence interval of the
regression coefficient, superscript m.

m. **95% Wald Confidence Interval** - This is the confidence interval (CI)
of an individual Poisson regression coefficient, given the other predictors are
in the model. For a given predictor variable with a level of 95% confidence,
we'd say that, upon repeated sampling, 95% of the CIs constructed this way
would include the "true" population Poisson regression coefficient. It is
calculated as **B** ± (z_{α/2})*(**Std. Error**), where z_{α/2}
is a critical value on the standard normal distribution. The CI is equivalent to
the corresponding z test: if the CI includes zero, we'd fail to reject the null
hypothesis that a particular regression coefficient is zero, given the other
predictors are in the model. An advantage of a CI is that it is illustrative; it
provides information on where the "true" parameter may lie and the precision of
the point estimate.
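The interval formula can be checked with a few lines of Python. The sketch below uses the estimates and standard errors for **female** and **mathnce** quoted on this page. The helper name `wald_ci` is illustrative, and the critical value comes from Python's standard normal distribution rather than from SPSS.

```python
from statistics import NormalDist

def wald_ci(b, se, level=0.95):
    """Wald confidence interval: b +/- z_{alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return b - z * se, b + z * se

# Estimates and standard errors as reported on this page.
female_ci = wald_ci(0.4009209, 0.0484122)
mathnce_ci = wald_ci(-0.0035232, 0.0018213)

print(female_ci)
print(mathnce_ci)
```

The interval for **female** lies entirely above zero, while the interval for **mathnce** just includes zero, in line with a p-value slightly above 0.05 for that coefficient.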

n. **Wald Chi-Square** - These are the test statistics for the individual
regression coefficients. The test statistic is the squared ratio of the
coefficient **B** to the **Std. Error** of the respective predictor. Under the
null hypothesis, the test statistic follows a Chi-Square distribution with one
degree of freedom, which is used to test against a
two-sided alternative hypothesis that **B** is not equal to zero.

o. **df** - This column lists the degrees of freedom for each of the
variables included in the model. For each of these variables, the degree of
freedom is 1.

p. **Sig.** - These are the p-values of the coefficients. Each p-value is
the probability, under the null hypothesis that a particular predictor's
regression coefficient is zero given that the rest of the predictors are in the
model, of observing a **Wald Chi-Square** test statistic as extreme as, or more
extreme than, the one actually observed. By looking at the
estimates and their standard errors to a greater degree of precision, we can
calculate the test statistics and see that they match those produced in SPSS. To
view the estimates with more decimal places displayed, click on the Parameter
Estimates table in your SPSS output, then double-click on the number of
interest.

**(Intercept)** - The **Wald Chi-Square** test statistic testing that
**(Intercept)** is zero, given the other variables are in the model and
evaluated at zero, is (2.286745/0.0699539)^{2} = 1068.590, with an
associated p-value of <0.0001. If we set our alpha level at 0.05, we would
reject the null hypothesis and conclude that **(Intercept)** is
statistically different from zero given **mathnce**,
**langnce** and **female** are in the model and evaluated at zero.

**female** - The **Wald Chi-Square** test statistic testing that the
difference in the log of expected counts of **daysabs** between males and
females is zero, given the other variables are in the model, is
(0.4009209/0.0484122)^{2} = 68.582, with an associated p-value of
<0.0001. If we set our alpha level at 0.05, we would reject the null hypothesis
and conclude that the coefficient for **female** is statistically different
from zero given **mathnce** and **langnce** are in the model.

**mathnce** - The **Wald Chi-Square** test statistic testing that the
slope for **mathnce** on **daysabs** is zero, given the other variables
are in the model, is (-0.0035232/0.0018213)^{2} = 3.742, with an
associated p-value of 0.053. If we set our alpha level at 0.05, we would fail
to reject the null hypothesis and conclude that the Poisson regression coefficient
for **mathnce** is not statistically different from zero given **langnce**
and **female** are in the model.

**langnce** - The **Wald Chi-Square** test statistic testing that the
slope for **langnce** on **daysabs** is zero, given the other variables
are in the model, is (-0.0121521/0.0018348)^{2} = 43.865, with an
associated p-value of <0.0001. If we set our alpha level at 0.05, we would
reject the null hypothesis and conclude that the Poisson regression coefficient for
**langnce** is statistically different from zero given **mathnce** and
**female** are in the model.
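The test statistics and p-values above can be reproduced with a short Python sketch. It relies on the fact that, for a Chi-Square distribution with one degree of freedom, the upper-tail probability of a value x equals erfc(sqrt(x/2)). The helper name `wald_test` is illustrative, and the inputs are the estimates and standard errors quoted above.

```python
import math

def wald_test(b, se):
    """Return (Wald chi-square, p-value) for H0: coefficient = 0, 1 df."""
    chi2 = (b / se) ** 2
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# (estimate, standard error) pairs as reported on this page.
estimates = {
    "(Intercept)": (2.286745, 0.0699539),
    "female": (0.4009209, 0.0484122),
    "mathnce": (-0.0035232, 0.0018213),
    "langnce": (-0.0121521, 0.0018348),
}

for name, (b, se) in estimates.items():
    chi2, p = wald_test(b, se)
    print(name, round(chi2, 3), p)
```

Small differences in the last decimal place relative to the SPSS table are possible, because the estimates quoted here are themselves rounded.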
