UCLA Academic Technology Services

SAS Data Analysis Examples
Poisson Regression

Examples of Poisson Regression

Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.

Description of the Data

Let's pursue Example 1 from above.

We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.csv. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.

Let's look at the data.

data poissonreg;
  /* Read the comma-delimited file, skipping the header row */
  infile "d:\work\data\raw\poissonreg.csv" delimiter="," firstobs=2;
  input id school male math langarts daysatt daysabs;
run;
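The data step above only reads the file. One way to actually look at the data is a quick summary of the count outcome and the predictors; this is a minimal sketch, and the choice of statistics is an assumption rather than part of the original analysis.

```sas
/* Summary statistics for the count outcome and the test scores. */
/* Comparing the mean and variance of daysabs gives a first look  */
/* at whether the Poisson mean = variance assumption is plausible. */
proc means data=poissonreg n mean var min max;
  var daysabs math langarts;
run;

/* Frequencies for the categorical variables */
proc freq data=poissonreg;
  tables male school;
run;
```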

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze these data with Poisson regression, let's consider some other methods that you might use.

SAS Poisson Regression Analysis

The output looks somewhat like the output from an OLS regression. It begins with a summary of the data set and model, followed by several goodness-of-fit statistics, which are likelihood based. Below the fit statistics are the Poisson regression coefficients for each variable, along with their standard errors, Wald 95% confidence intervals, Wald chi-square statistics, and p-values.
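The model code that produces this output is not shown here. A minimal sketch, assuming all three predictors from Example 1 enter the model, would be the following; link=log is already the default for dist=poisson and is written out only for clarity.

```sas
/* Poisson regression of days absent on gender and both test scores */
proc genmod data = poissonreg;
  model daysabs = male math langarts / dist=poisson link=log;
run;
```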

Now, just to be on the safe side, let's rerun proc genmod with the repeated statement in order to obtain robust standard errors for the Poisson regression coefficients.
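A minimal sketch of that rerun, assuming the same three predictors and following the class/repeated convention this page uses for the null model further down. Because each student contributes a single record, every cluster has size one and the working correlation (type=cs) has no effect on the estimates; what changes is that GENMOD reports empirical (robust) standard errors.

```sas
/* Same Poisson model, now with robust (empirical) standard errors */
proc genmod data = poissonreg;
  class id;                          /* each student is its own cluster */
  model daysabs = male math langarts / dist=poisson;
  repeated subject=id / type=cs;     /* requests the GEE/sandwich output */
run;
```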

The robust (empirical, or sandwich) standard errors are meant to remain valid when the Poisson variance assumption does not hold, for example when the counts are overdispersed. Using them has changed the standard errors considerably, and they should be more appropriate here. The z-tests lead to largely similar conclusions, but with more realistic p-values.

The main body of the output contains the Poisson coefficients, robust standard errors, z-scores, p-values and 95% confidence intervals for the coefficients. The variable math was borderline significant without the repeated statement and is clearly not significant with it.

Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.
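A minimal sketch of the reduced model, assuming it keeps the repeated statement so that robust standard errors are still reported:

```sas
/* Reduced Poisson model: math dropped, robust SEs retained */
proc genmod data = poissonreg;
  class id;
  model daysabs = male langarts / dist=poisson;
  repeated subject=id / type=cs;
run;
```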

This model fits the data significantly better than the null (intercept-only) model. To show this, we can fit the null model and compare it with the current model using a likelihood ratio chi-square test, i.e., twice the difference in their log likelihoods.

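/* Null (intercept-only) Poisson model, fit with the same */
/* class/repeated setup as the model above                */
proc genmod data = poissonreg;
  class id;
  model daysabs = / type3 dist=poisson;
  repeated subject=id / type=cs;
run;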
                   Criteria For Assessing Goodness Of Fit

                   Criterion                 DF           Value        Value/DF

                   Deviance                 315       2409.8204          7.6502
                   Scaled Deviance          315       2409.8204          7.6502
                   Pearson Chi-Square       315       3008.3006          9.5502
                   Scaled Pearson X2        315       3008.3006          9.5502
                   Log Likelihood                     1394.6299

The log likelihood is 1480.3813 for the full model (with male and langarts) and 1394.6299 for the null model. The likelihood ratio chi-square is 2*(1480.3813 - 1394.6299) = 171.5028. Since the full model has two predictor variables, the test has 2 degrees of freedom, which gives p < .0001.
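The same arithmetic can be done in a data step, which also returns the exact upper-tail p-value. This is a sketch that simply hard-codes the two log likelihoods reported above; the data set name lrtest is arbitrary.

```sas
/* Likelihood ratio test: full model (male, langarts) vs. null model */
data lrtest;
  ll_full = 1480.3813;                    /* log likelihood, full model */
  ll_null = 1394.6299;                    /* log likelihood, null model */
  chisq   = 2*(ll_full - ll_null);        /* LR chi-square statistic    */
  df      = 2;                            /* two predictors tested      */
  p       = sdf('CHISQUARE', chisq, df);  /* upper-tail probability     */
run;

proc print data=lrtest;
run;
```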

Finally, we will use estimate statements to get the predicted number of days absent for males and for females, with langarts held at its mean.
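A minimal sketch of those estimate statements, assuming the reduced model with robust standard errors. The mean of langarts is not given on this page, so it is computed first and stored in a macro variable; the name mlang is hypothetical. The exp option also reports exp(estimate), which for the first two lines is the expected number of days absent and for the third is the male-to-female ratio of expected counts.

```sas
/* Store the sample mean of langarts in a (hypothetical) macro variable */
proc sql noprint;
  select mean(langarts) into :mlang
  from poissonreg;
quit;

proc genmod data = poissonreg;
  class id;
  model daysabs = male langarts / dist=poisson;
  repeated subject=id / type=cs;
  /* Predicted log counts at mean langarts; exp gives expected days absent */
  estimate 'female, mean langarts' intercept 1 male 0 langarts &mlang / exp;
  estimate 'male, mean langarts'   intercept 1 male 1 langarts &mlang / exp;
  /* Male vs. female contrast; exp gives the ratio of expected counts */
  estimate 'male vs female' male 1 / exp;
run;
```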

Sample Write-Up of the Analysis

The Poisson regression model predicting days absent from school from language arts score and gender was statistically significant (likelihood ratio chi-square = 171.503, df = 2, p < .0001). The predictors langarts and male were each statistically significant. For these data, the expected change in log count for a one-unit increase in language arts was -0.0146. Male students had an expected log count 0.41 lower than female students.
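Log-count coefficients are often easier to report after exponentiating them into incidence rate ratios. Using the two coefficients from the write-up, each additional point in langarts multiplies the expected days absent by about 0.985 (roughly 1.5% fewer), and males' expected count is about 0.66 times that of females. A sketch of the arithmetic, with coefficients hard-coded from the text:

```sas
/* Convert log-count coefficients to incidence rate ratios */
data irr;
  b_langarts   = -0.0146;           /* change in log count per langarts point */
  b_male       = -0.41;             /* male minus female, in log counts       */
  irr_langarts = exp(b_langarts);   /* multiplicative change per point        */
  irr_male     = exp(b_male);       /* ratio of expected counts, male/female  */
run;

proc print data=irr;
run;
```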

Cautions, Flies in the Ointment

  • It is not recommended that Poisson models be applied to small samples. What constitutes a small sample does not seem to be clearly defined in the literature.

These pages are Copyrighted (c) by UCLA Academic Technology Services


    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.