Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.csv. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts respectively. The variable male is a binary indicator of student gender.
Let's look at the data.
data poissonreg;
  infile "d:\work\data\raw\poissonreg.csv" delimiter="," firstobs=2;
  input id school male math langarts daysatt daysabs;
run;

proc means data = poissonreg mean std min max var;
  var daysabs math langarts male;
run;
The MEANS Procedure

Variable          Mean       Std Dev       Minimum       Maximum      Variance
------------------------------------------------------------------------------
daysabs      5.8101266     7.4490028             0    45.0000000    55.4876432
math        48.7510115    17.8807562     1.0071140    98.9928900   319.7214429
langarts    50.0637938    17.9392106     1.0071140    98.9928900   321.8152757
male         0.4873418     0.5006325             0     1.0000000     0.2506329
------------------------------------------------------------------------------
proc univariate data = poissonreg noprint;
  histogram daysabs / midpoints = 0 to 50 by 1 vscale = count;
run;

proc freq data = poissonreg;
  tables male;
run;

The FREQ Procedure

                             Cumulative    Cumulative
male    Frequency     Percent     Frequency       Percent
---------------------------------------------------------
   0          162       51.27           162         51.27
   1          154       48.73           316        100.00
proc genmod data = poissonreg;
  model daysabs = male math langarts / dist=negbin;
run;
The GENMOD Procedure

Model Information

Data Set              WORK.POISSONREG
Distribution          Negative Binomial
Link Function         Log
Dependent Variable    daysabs

Number of Observations Read    316
Number of Observations Used    316

Criteria For Assessing Goodness Of Fit

Criterion                    DF        Value     Value/DF
Deviance                    312     356.9348       1.1440
Scaled Deviance             312     356.9348       1.1440
Pearson Chi-Square          312     337.0888       1.0804
Scaled Pearson X2           312     337.0888       1.0804
Log Likelihood                     2149.3649
Full Log Likelihood                -880.8731
AIC (smaller is better)            1771.7462
AICC (smaller is better)           1771.9398
BIC (smaller is better)            1790.5250

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates

                               Standard     Wald 95% Confidence          Wald
Parameter     DF    Estimate      Error            Limits          Chi-Square    Pr > ChiSq
Intercept      1      2.7161     0.2326      2.2602      3.1719        136.38        <.0001
male           1     -0.4312     0.1397     -0.7049     -0.1574          9.53        0.0020
math           1     -0.0016     0.0048     -0.0111      0.0079          0.11        0.7413
langarts       1     -0.0143     0.0056     -0.0253     -0.0034          6.61        0.0102
Dispersion     1      1.2884     0.1231      1.0471      1.5296

NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.

The output looks very much like the output from an OLS regression. It begins with a summary of the dataset and model, followed by a list of goodness-of-fit statistics, which are likelihood based. Below the fit statistics, you will find the negative binomial regression coefficients for each of the variables, along with the corresponding standard errors, Wald 95% confidence intervals, Wald chi-square statistics, and p-values. After the coefficients for the predictors, there is an estimate of the dispersion parameter. If the dispersion were 0, a Poisson model would be more appropriate for the data. The 95% confidence limits for our dispersion parameter (1.0471 to 1.5296) exclude 0, so the dispersion is significantly different from 0 and the negative binomial model is justified.
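To make the role of the dispersion parameter more concrete, the following data step is a rough sketch that compares the variance implied by a Poisson model (equal to the mean) with the variance implied by the negative binomial variance function used by proc genmod, mu + k*mu**2. It assumes, purely for illustration, that the overall sample mean of daysabs from proc means can stand in for the fitted mean; the variable names mu, k, var_poisson, and var_negbin are our own.

```sas
/* Illustrative sketch only: plug the sample mean into both variance functions. */
data _null_;
  mu = 5.8101266;                  /* sample mean of daysabs (proc means output) */
  k  = 1.2884;                     /* estimated dispersion (proc genmod output)  */
  var_poisson = mu;                /* Poisson: variance equals the mean          */
  var_negbin  = mu + k * mu**2;    /* negative binomial: mu + k*mu^2             */
  put var_poisson= var_negbin=;    /* write both values to the SAS log           */
run;
```

With these inputs the negative binomial variance is roughly 49, much closer to the observed variance of 55.49 than the Poisson value of about 5.8.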
Now, just to be on the safe side, let's rerun proc genmod with the repeated statement in order to obtain robust standard errors for the negative binomial regression coefficients.
proc genmod data = poissonreg;
  class id;
  model daysabs = male math langarts / dist=negbin;
  repeated subject=id / type=cs;
run;

GEE Model Information

Correlation Structure           Exchangeable
Subject Effect                  id (316 levels)
Number of Clusters              316
Correlation Matrix Dimension    1
Maximum Cluster Size            1
Minimum Cluster Size            1

Algorithm converged.

The GENMOD Procedure

GEE Fit Criteria

QIC     -3969.2402
QICu    -3970.7849

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

                         Standard       95% Confidence
Parameter    Estimate       Error           Limits              Z    Pr > |Z|
Intercept      2.7161      0.2323      2.2608      3.1714   11.69      <.0001
male          -0.4312      0.1446     -0.7146     -0.1478   -2.98      0.0029
math          -0.0016      0.0079     -0.0170      0.0138   -0.20      0.8388
langarts      -0.0143      0.0053     -0.0248     -0.0039   -2.69      0.0071
The robust standard errors attempt to adjust for heterogeneity in the model. Using them changes the standard errors only modestly, and the z-tests lead to the same conclusions about significance.
The variable math is not significant with or without the repeated statement. Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.
proc genmod data = poissonreg;
  model daysabs = male langarts / dist=negbin;
run;

The GENMOD Procedure

Model Information

Data Set              WORK.POISSONREG
Distribution          Negative Binomial
Link Function         Log
Dependent Variable    daysabs

Number of Observations Read    316
Number of Observations Used    316

Criteria For Assessing Goodness Of Fit

Criterion                    DF        Value     Value/DF
Deviance                    313     356.9042       1.1403
Scaled Deviance             313     356.9042       1.1403
Pearson Chi-Square          313     334.4317       1.0685
Scaled Pearson X2           313     334.4317       1.0685
Log Likelihood                     2149.3106
Full Log Likelihood                -880.9274
AIC (smaller is better)            1769.8548
AICC (smaller is better)           1769.9834
BIC (smaller is better)            1784.8778

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates

                               Standard     Wald 95% Confidence          Wald
Parameter     DF    Estimate      Error            Limits          Chi-Square    Pr > ChiSq
Intercept      1      2.7034     0.2293      2.2541      3.1528        139.03        <.0001
male           1     -0.4312     0.1397     -0.7050     -0.1574          9.53        0.0020
langarts       1     -0.0156     0.0039     -0.0234     -0.0079         15.71        <.0001
Dispersion     1      1.2891     0.1231      1.0478      1.5304

NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
This model fits the data significantly better than the null model, i.e., the intercept-only model. To show this, we can run the null model and compare it with the current model using a likelihood ratio chi-squared test, which is based on twice the difference in log likelihoods.
proc genmod data = poissonreg;
model daysabs = / dist=negbin;
run;
quit;
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 315 356.9918 1.1333
Scaled Deviance 315 356.9918 1.1333
Pearson Chi-Square 315 329.9199 1.0474
Scaled Pearson X2 315 329.9199 1.0474
Log Likelihood 2138.9953
Full Log Likelihood -891.2427
AIC (smaller is better) 1786.4854
AICC (smaller is better) 1786.5238
BIC (smaller is better) 1793.9969
The log likelihood for the full model is -880.9274 and is -891.2427 for the null model. The chi-squared value is 2*( -880.9274 - -891.2427) = 20.6306. Since we have two predictor variables in the full model, the degrees of freedom for the chi-squared test is 2. This yields a p-value <.0001. Thus, our overall model is statistically significant.
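The arithmetic above can be reproduced in a short data step. This is a minimal sketch that takes the two full log likelihoods reported in the output as given; the variable names ll_full, ll_null, chisq, and p are our own, and probchi is the SAS chi-square cumulative distribution function.

```sas
/* Likelihood ratio test of the two-predictor model against the null model. */
data _null_;
  ll_full = -880.9274;                /* full log likelihood, male + langarts model */
  ll_null = -891.2427;                /* full log likelihood, intercept-only model  */
  df      = 2;                        /* two predictors dropped in the null model   */
  chisq   = 2 * (ll_full - ll_null);  /* likelihood ratio statistic                 */
  p       = 1 - probchi(chisq, df);   /* upper-tail chi-square probability          */
  put chisq= p=;                      /* write both values to the SAS log           */
run;
```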
Finally, we will use the estimate statement to get the predicted number of days absent for male and female students when langarts is held at its mean.

proc genmod data = poissonreg;
  class id;
  model daysabs = male langarts / dist=negbin;
  repeated subject=id / type=cs;
  estimate "male" langarts 50.0637938 male 1 intercept 1 / exp;
  estimate "female" langarts 50.0637938 male 0 intercept 1 / exp;
run;
Contrast Estimate Results

                            Standard                                       Chi-
Label          Estimate       Error    Alpha     Confidence Limits       Square    Pr > ChiSq
male             1.5032      0.0966     0.05      1.3138      1.6926     241.98        <.0001
Exp(male)        4.4960      0.4345     0.05      3.7202      5.4335
female           1.9125      0.0992     0.05      1.7182      2.1069     372.06        <.0001
Exp(female)      6.7703      0.6713     0.05      5.5745      8.2225
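The Exp(...) rows are the exponentiated versions of the rows above them, which puts the estimates on the scale of days absent rather than log counts. As a small check under that reading, the data step below exponentiates the two reported log-scale estimates; the variable names are our own.

```sas
/* Back-transform the log-scale estimates to predicted days absent. */
data _null_;
  log_male   = 1.5032;          /* estimate for males at mean langarts   */
  log_female = 1.9125;          /* estimate for females at mean langarts */
  days_male   = exp(log_male);
  days_female = exp(log_female);
  put days_male= days_female=;  /* compare with Exp(male) and Exp(female) */
run;
```

The results should agree with the Exp(male) and Exp(female) rows up to rounding, about 4.5 and 6.8 days respectively.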
If you are using SAS version 9.2 or higher, you could run a negative binomial regression using proc countreg. This procedure allows a few more options specific to count outcomes than proc genmod. The proc countreg code for the original model run on this page appears below.
proc countreg data = poissonreg;
  model daysabs = male math langarts / dist=negbin (p=2);
run;
In the negative binomial regression model predicting days absent from language arts score and gender, the predictors langarts and male were each statistically significant. For these data, the expected change in log count for a one-unit increase in langarts was -0.0156. Male students had an expected log count 0.4312 lower than female students.
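Because the model uses a log link, the coefficients can also be read as rate ratios by exponentiating them. The following is a minimal sketch using the rounded coefficients from the final model; the names irr_male and irr_langarts are our own labels, not proc genmod output.

```sas
/* Exponentiate the log-count coefficients to obtain rate ratios. */
data _null_;
  irr_male     = exp(-0.4312);  /* males relative to females             */
  irr_langarts = exp(-0.0156);  /* per one-point increase in langarts    */
  put irr_male= irr_langarts=;  /* write both ratios to the SAS log      */
run;
```

With these rounded values, the ratio for male is about 0.65, meaning that male students are expected to have roughly 35% fewer days absent than female students with the same language arts score.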