Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.csv. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts respectively. The variable male is a binary indicator of student gender.
Let's look at the data.
data poissonreg;
  /* change the path to wherever poissonreg.csv is saved;
     firstobs=2 skips the header row of the file */
  infile "d:\work\data\raw\poissonreg.csv" delimiter="," firstobs=2;
  input id school male math langarts daysatt daysabs;
run;
proc means data = poissonreg mean std min max var;
  var daysabs math langarts male;
run;
The MEANS Procedure
Variable            Mean       Std Dev       Minimum       Maximum      Variance
------------------------------------------------------------------------------
daysabs        5.8101266     7.4490028             0    45.0000000    55.4876432
math          48.7510115    17.8807562     1.0071140    98.9928900   319.7214429
langarts      50.0637938    17.9392106     1.0071140    98.9928900   321.8152757
male           0.4873418     0.5006325             0     1.0000000     0.2506329
------------------------------------------------------------------------------
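Here is a quick check on the table above. Under a Poisson model the variance of a count equals its mean. The ratio of the two for daysabs therefore gives a rough, unconditional indication of overdispersion. Some of the excess may be explained by the predictors once they are in the model.

$$\frac{\widehat{\operatorname{Var}}(\mathrm{daysabs})}{\overline{\mathrm{daysabs}}} = \frac{55.4876432}{5.8101266} \approx 9.55$$

A ratio this far above 1 is one reason to consider the negative binomial model fitted below rather than a Poisson model.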
proc univariate data = poissonreg noprint;
  histogram daysabs / midpoints = 0 to 50 by 1 vscale = count;
run;
proc freq data = poissonreg;
  tables male;
run;
The FREQ Procedure
                                Cumulative    Cumulative
male    Frequency     Percent    Frequency       Percent
---------------------------------------------------------
0             162       51.27          162         51.27
1             154       48.73          316        100.00
proc genmod data = poissonreg;
  model daysabs = male math langarts /dist=negbin;
run;
The GENMOD Procedure
Model Information
Data Set WORK.POISSONREG
Distribution Negative Binomial
Link Function Log
Dependent Variable daysabs
Number of Observations Read 316
Number of Observations Used 316
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 312 356.9348 1.1440
Scaled Deviance 312 356.9348 1.1440
Pearson Chi-Square 312 337.0888 1.0804
Scaled Pearson X2 312 337.0888 1.0804
Log Likelihood 2149.3649
Full Log Likelihood -880.8731
AIC (smaller is better) 1771.7462
AICC (smaller is better) 1771.9398
BIC (smaller is better) 1790.5250
Algorithm converged.
Analysis Of Maximum Likelihood Parameter Estimates
                            Standard   Wald 95% Confidence        Wald
Parameter   DF   Estimate      Error        Limits          Chi-Square  Pr > ChiSq
Intercept    1     2.7161     0.2326     2.2602     3.1719      136.38      <.0001
male         1    -0.4312     0.1397    -0.7049    -0.1574        9.53      0.0020
math         1    -0.0016     0.0048    -0.0111     0.0079        0.11      0.7413
langarts     1    -0.0143     0.0056    -0.0253    -0.0034        6.61      0.0102
Dispersion   1     1.2884     0.1231     1.0471     1.5296
NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
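The model fitted above can be written as follows. The symbols μ_i, β and k are our own notation, not SAS output labels. Proc genmod fits a log-linear model for the expected count μ_i. For dist=negbin it uses the variance function μ + kμ², where k is the Dispersion row above. Setting k = 0 recovers the Poisson variance. The last line is for illustration only: it plugs in roughly the sample mean of daysabs.

$$\log \mu_i = \beta_0 + \beta_1\,\mathrm{male}_i + \beta_2\,\mathrm{math}_i + \beta_3\,\mathrm{langarts}_i, \qquad \operatorname{Var}(Y_i) = \mu_i + k\,\mu_i^2$$

$$\log \hat\mu_i = 2.7161 - 0.4312\,\mathrm{male}_i - 0.0016\,\mathrm{math}_i - 0.0143\,\mathrm{langarts}_i, \qquad \hat k = 1.2884$$

$$\mu = 5.81:\quad 5.81 + 1.2884 \times 5.81^2 \approx 49.3 \quad \text{(versus } 5.81 \text{ under a Poisson model, } k = 0\text{)}$$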
The output looks very much like the output from an OLS regression. It begins with a summary of the dataset and model, followed by a list of goodness-of-fit statistics, all of which are likelihood based.
Below the fit statistics are the negative binomial regression coefficients for each variable. Each is reported with its standard error, Wald 95% confidence interval, Wald chi-square statistic, and p-value. After the coefficients for the predictors comes an estimate of the dispersion parameter. If the dispersion were 0, the negative binomial model would reduce to a Poisson model, and a Poisson model would be more appropriate for the data. The 95% confidence limits for the dispersion parameter (1.0471 to 1.5296) exclude 0. We can therefore say that the dispersion is significantly different from 0 and that we are justified in using a negative binomial model. Now, just to be on the safe side, let's rerun proc genmod with the repeated statement to obtain robust standard errors for the negative binomial regression coefficients.
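proc genmod data = poissonreg;
  class id;
  model daysabs = male math langarts /dist=negbin;
  /* each id is its own cluster (size 1), so type=cs imposes no correlation;
     the statement is used here only to get empirical (robust) standard errors */
  repeated subject=id /type=cs;
run;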
GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (316 levels)
Number of Clusters 316
Correlation Matrix Dimension 1
Maximum Cluster Size 1
Minimum Cluster Size 1
Algorithm converged.
The GENMOD Procedure
GEE Fit Criteria
QIC -3969.2402
QICu -3970.7849
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                         Standard    95% Confidence
Parameter     Estimate      Error        Limits               Z  Pr > |Z|
Intercept       2.7161     0.2323     2.2608     3.1714   11.69    <.0001
male           -0.4312     0.1446    -0.7146    -0.1478   -2.98    0.0029
math           -0.0016     0.0079    -0.0170     0.0138   -0.20    0.8388
langarts       -0.0143     0.0053    -0.0248    -0.0039   -2.69    0.0071
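The robust (empirical) standard errors attempt to adjust for heterogeneity in the model. For the intercept, male, and langarts they differ only slightly from the model-based standard errors. The standard error for math rises from 0.0048 to 0.0079. In every case the z-tests lead to the same conclusions about significance as before.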
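In both tables the test statistics and confidence limits are Wald quantities built from the estimate and its standard error. Only the standard error changes between the model-based and robust versions. The male row of the GEE table serves as an example; 1.96 is the 97.5th percentile of the standard normal distribution.

$$z = \frac{\hat\beta_{\mathrm{male}}}{\widehat{\mathrm{SE}}} = \frac{-0.4312}{0.1446} \approx -2.98, \qquad \hat\beta_{\mathrm{male}} \pm 1.96\,\widehat{\mathrm{SE}} = -0.4312 \pm 1.96 \times 0.1446 = (-0.7146,\ -0.1478)$$

These values match the Z and confidence-limit entries for male. In the model-based table, the Wald chi-square is the square of the same kind of ratio: (-0.4312/0.1397)² ≈ 9.53.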
The variable math is not significant with or without the repeated statement (p = 0.7413 and p = 0.8388, respectively), so we rerun the model without it.
proc genmod data = poissonreg;
model daysabs = male langarts /dist=negbin;
run;
The GENMOD Procedure
Model Information
Data Set WORK.POISSONREG
Distribution Negative Binomial
Link Function Log
Dependent Variable daysabs
Number of Observations Read 316
Number of Observations Used 316
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 313 356.9042 1.1403
Scaled Deviance 313 356.9042 1.1403
Pearson Chi-Square 313 334.4317 1.0685
Scaled Pearson X2 313 334.4317 1.0685
Log Likelihood 2149.3106
Full Log Likelihood -880.9274
AIC (smaller is better) 1769.8548
AICC (smaller is better) 1769.9834
BIC (smaller is better) 1784.8778
Algorithm converged.
Analysis Of Maximum Likelihood Parameter Estimates
                            Standard   Wald 95% Confidence        Wald
Parameter   DF   Estimate      Error        Limits          Chi-Square  Pr > ChiSq
Intercept    1     2.7034     0.2293     2.2541     3.1528      139.03      <.0001
male         1    -0.4312     0.1397    -0.7050    -0.1574        9.53      0.0020
langarts     1    -0.0156     0.0039    -0.0234    -0.0079       15.71      <.0001
Dispersion   1     1.2891     0.1231     1.0478     1.5304
NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
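The information criteria in the fit table are computed from the Full Log Likelihood ℓ with n = 316. Here p counts every estimated parameter, including the dispersion. For this model p = 4: the intercept, male, langarts, and the dispersion. The reduced model's values can be checked by hand:

$$\mathrm{AIC} = -2\ell + 2p = -2(-880.9274) + 2(4) = 1769.8548, \qquad \mathrm{BIC} = -2\ell + p\ln n = 1761.8548 + 4\ln 316 \approx 1784.8778$$

The first model had one more parameter (p = 5) and an AIC of 1771.7462. The smaller AIC of the reduced model is consistent with dropping math.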
This model fits the data significantly better than the null model, that is, the intercept-only model. To show that this is the case, we can run the null model and compare it with the current model. The comparison uses a likelihood ratio chi-square test based on the difference in their log likelihoods.
proc genmod data = poissonreg;
model daysabs = / dist=negbin;
run;
quit;
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 315 356.9918 1.1333
Scaled Deviance 315 356.9918 1.1333
Pearson Chi-Square 315 329.9199 1.0474
Scaled Pearson X2 315 329.9199 1.0474
Log Likelihood 2138.9953
Full Log Likelihood -891.2427
AIC (smaller is better) 1786.4854
AICC (smaller is better) 1786.5238
BIC (smaller is better) 1793.9969
The full log likelihood is -880.9274 for the current model and -891.2427 for the null model, so the likelihood ratio chi-square is 2*(-880.9274 - (-891.2427)) = 20.6306. The current model has two more parameters than the null model (the coefficients for male and langarts), so the test has 2 degrees of freedom. This yields a p-value < .0001. Thus, our overall model is statistically significant.
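The same test can be written out as follows, where ℓ denotes the Full Log Likelihood. A chi-square distribution with 2 degrees of freedom is an exponential distribution with mean 2. Its upper-tail probability therefore has the closed form e^(-x/2), which lets us check the p-value by hand:

$$G^2 = 2(\ell_{\mathrm{full}} - \ell_{\mathrm{null}}) = 2\big(-880.9274 - (-891.2427)\big) = 20.6306, \qquad p = P(\chi^2_2 > 20.6306) = e^{-20.6306/2} = e^{-10.3153} \approx 3.3 \times 10^{-5}$$

Using the Log Likelihood rows instead (2149.3106 - 2138.9953 = 10.3153) gives the same statistic. In all three models above, the Log Likelihood and Full Log Likelihood rows differ by the same constant, 3030.2380.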
Finally, we will use estimate statements to get the predicted number of days absent for males and for females, with langarts held at its mean (50.0637938).
proc genmod data = poissonreg;
  class id;
  model daysabs = male langarts /dist=negbin;
  repeated subject=id /type=cs;
  /* linear predictor at the mean langarts score; /exp adds the exponentiated (count-scale) estimate */
  estimate "male"   langarts 50.0637938 male 1 intercept 1 /exp;
  estimate "female" langarts 50.0637938 male 0 intercept 1 /exp;
run;
Contrast Estimate Results
                        Standard                            Chi-
Label         Estimate     Error  Alpha Confidence Limits   Square  Pr > ChiSq
male            1.5032    0.0966   0.05   1.3138   1.6926   241.98      <.0001
Exp(male)       4.4960    0.4345   0.05   3.7202   5.4335
female          1.9125    0.0992   0.05   1.7182   2.1069   372.06      <.0001
Exp(female)     6.7703    0.6713   0.05   5.5745   8.2225
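With the /exp option, each Exp(...) row is the exponential of the row above it. The Estimate is the linear predictor at the specified covariate values. Exponentiating it and its confidence limits moves it to the days-absent scale. For males at the mean language arts score:

$$\hat\mu_{\mathrm{male}} = \exp\!\big(\hat\beta_0 + \hat\beta_{\mathrm{male}}(1) + \hat\beta_{\mathrm{langarts}}(50.0637938)\big) = e^{1.5032} \approx 4.496, \qquad \big(e^{1.3138},\ e^{1.6926}\big) \approx (3.72,\ 5.43)$$

Similarly, e^1.9125 ≈ 6.770 for females. The model therefore predicts about 4.5 days absent for males and about 6.8 for females at the mean language arts score.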
If you are using SAS version 9.2 or higher, you could run a negative binomial regression using proc countreg. This procedure allows a few more options specific to count outcomes than proc genmod. The proc countreg code for the original model run on this page appears below.
proc countreg data = poissonreg;
  model daysabs = male math langarts /dist=negbin(p=2);
run;
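The p= suboption selects the form of the negative binomial variance function. As we understand the PROC COUNTREG documentation, p=2 gives the quadratic (NB2) form, which matches proc genmod's dist=negbin. Its alpha estimate should therefore correspond to genmod's Dispersion. In contrast, p=1 gives a linear (NB1) form:

$$\text{p = 2}:\ \operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i^2, \qquad \text{p = 1}:\ \operatorname{Var}(Y_i) = \mu_i + \alpha\,\mu_i$$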
In the negative binomial regression model predicting days absent from school using language arts scores and gender, the predictors langarts and male were each statistically significant. For these data, the expected change in the log count of days absent for a one-unit increase in language arts score was -0.0156, holding gender constant. Male students had an expected log count 0.4312 lower than female students, holding langarts constant.
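The model uses a log link, so exponentiating a coefficient turns it into a multiplicative effect on the expected number of days absent (an incidence rate ratio). This is often easier to report than a change in the log count:

$$e^{\hat\beta_{\mathrm{male}}} = e^{-0.4312} \approx 0.650, \qquad e^{\hat\beta_{\mathrm{langarts}}} = e^{-0.0156} \approx 0.985, \qquad e^{10\,\hat\beta_{\mathrm{langarts}}} = e^{-0.156} \approx 0.856$$

Holding the other predictor constant, males are expected to have about 0.65 times as many days absent as females (about 35% fewer). Each additional point in language arts score is associated with about 1.5% fewer expected days absent, or about 14% fewer for a 10-point increase.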
These pages are Copyrighted (c) by UCLA Academic Technology Services