Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.csv. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.
Let's look at the data.
data poissonreg;
  infile "d:\work\data\raw\poissonreg.csv" delimiter="," firstobs=2;
  input id school male math langarts daysatt daysabs;
run;

proc means data = poissonreg mean std min max var;
  var daysabs math langarts male;
run;
The MEANS Procedure
Variable          Mean       Std Dev       Minimum       Maximum      Variance
------------------------------------------------------------------------------
daysabs      5.8101266     7.4490028             0    45.0000000    55.4876432
math        48.7510115    17.8807562     1.0071140    98.9928900   319.7214429
langarts    50.0637938    17.9392106     1.0071140    98.9928900   321.8152757
male         0.4873418     0.5006325             0     1.0000000     0.2506329
------------------------------------------------------------------------------
proc univariate data = poissonreg noprint;
  histogram daysabs / midpoints = 0 to 50 by 1 vscale = count;
run;

proc freq data = poissonreg;
  tables male;
run;
The FREQ Procedure
                                Cumulative    Cumulative
male    Frequency    Percent     Frequency       Percent
---------------------------------------------------------
   0          162      51.27           162         51.27
   1          154      48.73           316        100.00
proc genmod data = poissonreg;
  model daysabs = male math langarts / dist=poisson;
run;
The GENMOD Procedure
Model Information
Data Set WORK.POISSONREG
Distribution Poisson
Link Function Log
Dependent Variable daysabs
Number of Observations Read 316
Number of Observations Used 316
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 312 2234.5462 7.1620
Scaled Deviance 312 2234.5462 7.1620
Pearson Chi-Square 312 2774.4139 8.8924
Scaled Pearson X2 312 2774.4139 8.8924
Log Likelihood 1482.2670
Algorithm converged.
Analysis Of Parameter Estimates
                          Standard       Wald 95%          Chi-
Parameter   DF  Estimate     Error   Confidence Limits   Square  Pr > ChiSq
Intercept    1    2.6877    0.0727    2.5453    2.8301  1368.56      <.0001
male         1   -0.4009    0.0484   -0.4958   -0.3060    68.58      <.0001
math         1   -0.0035    0.0018   -0.0071    0.0000     3.74      0.0531
langarts     1   -0.0122    0.0018   -0.0157   -0.0086    43.86      <.0001
Scale        0    1.0000    0.0000    1.0000    1.0000
The output looks somewhat like the output from an OLS regression. It begins with a summary of the dataset and model, followed by a list of goodness-of-fit statistics, which are likelihood based. Below the fit statistics, you will find the Poisson regression coefficients for each of the variables along with the corresponding standard errors, Wald 95% confidence intervals, Wald chi-square statistics, and p-values.
Now, just to be on the safe side, let's rerun proc genmod with the repeated statement in order to obtain robust standard errors for the Poisson regression coefficients.
proc genmod data = poissonreg;
  class id;
  model daysabs = male math langarts / dist=poisson;
  repeated subject=id / type=cs;
run;
The GENMOD Procedure
Analysis Of Initial Parameter Estimates
                          Standard       Wald 95%          Chi-
Parameter   DF  Estimate     Error   Confidence Limits   Square  Pr > ChiSq
Intercept    1    2.6877    0.0727    2.5453    2.8301  1368.56      <.0001
male         1   -0.4009    0.0484   -0.4958   -0.3060    68.58      <.0001
math         1   -0.0035    0.0018   -0.0071    0.0000     3.74      0.0531
langarts     1   -0.0122    0.0018   -0.0157   -0.0086    43.86      <.0001
Scale        0    1.0000    0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.
GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (316 levels)
Number of Clusters 316
Correlation Matrix Dimension 1
Maximum Cluster Size 1
Minimum Cluster Size 1
Algorithm converged.
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                      Standard      95% Confidence
Parameter   Estimate     Error          Limits             Z  Pr > |Z|
Intercept     2.6877    0.2178    2.2608    3.1145     12.34    <.0001
male         -0.4009    0.1394   -0.6741   -0.1278     -2.88    0.0040
math         -0.0035    0.0076   -0.0185    0.0114     -0.46    0.6440
langarts     -0.0122    0.0053   -0.0225   -0.0018     -2.30    0.0215
The robust standard errors attempt to adjust for heterogeneity in the model. Using them has increased the standard errors considerably (for male, from 0.0484 to 0.1394), which should be more appropriate. The z-tests for male and langarts remain significant, but with more realistic p-values.
In the main body of the output are the Poisson coefficients, robust standard errors, 95% confidence intervals, z-scores, and p-values for the coefficients. The variable math was borderline significant without the repeated statement (p = 0.0531) and is clearly not significant with it (p = 0.6440).
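As a rough check on the table above, the z-score and two-sided p-value for math can be recomputed by hand from the printed estimate and robust standard error. This is only a sketch: the variable names are ours, and because the inputs are the rounded values from the listing, the results will agree with the GENMOD output only approximately.

```sas
/* Recompute the Wald z-test for math from the rounded values printed above. */
/* Variable names here are illustrative, not produced by proc genmod.        */
data _null_;
  estimate  = -0.0035;               /* math coefficient                     */
  robust_se =  0.0076;               /* empirical (robust) standard error    */
  z = estimate / robust_se;          /* Wald z statistic                     */
  p = 2 * (1 - probnorm(abs(z)));    /* two-sided p-value, standard normal   */
  put z= p=;                         /* write the results to the log         */
run;
```

Replacing robust_se with the model-based standard error (0.0018) shows how much of the apparent significance of math came from the smaller, model-based standard error.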
Since math is not significant in the model with robust standard errors, we will rerun the model dropping that variable.
proc genmod data = poissonreg;
  class id;
  model daysabs = male langarts / dist=poisson;
  repeated subject=id / type=cs;
run;
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 313 2238.3176 7.1512
Scaled Deviance 313 2238.3176 7.1512
Pearson Chi-Square 313 2752.9132 8.7952
Scaled Pearson X2 313 2752.9132 8.7952
Log Likelihood 1480.3813
Algorithm converged.
Analysis Of Initial Parameter Estimates
                          Standard  Wald 95% Confidence    Chi-
Parameter   DF  Estimate     Error         Limits        Square  Pr > ChiSq
Intercept    1    2.6470    0.0698    2.5102    2.7837  1439.07      <.0001
male         1   -0.4094    0.0482   -0.5039   -0.3148    72.07      <.0001
langarts     1   -0.0147    0.0013   -0.0172   -0.0121   128.64      <.0001
Scale        0    1.0000    0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.
GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (316 levels)
Number of Clusters 316
Correlation Matrix Dimension 1
Maximum Cluster Size 1
Minimum Cluster Size 1
Algorithm converged.
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
                      Standard      95% Confidence
Parameter   Estimate     Error          Limits             Z  Pr > |Z|
Intercept     2.6470    0.1823    2.2896    3.0044     14.52    <.0001
male         -0.4094    0.1352   -0.6744   -0.1443     -3.03    0.0025
langarts     -0.0147    0.0034   -0.0214   -0.0079     -4.27    <.0001
This model fits the data significantly better than the null model, i.e., the intercept-only model. To show that this is the case, we can run the null model and compare it with the current model using a chi-squared test on the difference in log likelihoods.
proc genmod data = poissonreg;
class id;
model daysabs = / type3 dist=poisson;
repeated subject=id /type=cs;
run;
quit;
Criteria For Assessing Goodness Of Fit
Criterion DF Value Value/DF
Deviance 315 2409.8204 7.6502
Scaled Deviance 315 2409.8204 7.6502
Pearson Chi-Square 315 3008.3006 9.5502
Scaled Pearson X2 315 3008.3006 9.5502
Log Likelihood 1394.6299
The log likelihood for the full model is 1480.3813 and is 1394.6299 for the null model. The chi-squared value is 2*( 1480.3813 - 1394.6299) = 171.5028. Since we have two predictor variables in the full model, the degrees of freedom for the chi-squared test is 2. This yields a p-value <.0001.
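The arithmetic above can be reproduced in a short data step. This is a minimal sketch, assuming the two log likelihoods are typed in by hand from the listings; the variable names are ours. We use the sdf function (upper-tail probability) rather than 1 - probchi so that a very small p-value is not lost to rounding.

```sas
/* Likelihood ratio test of the full model against the intercept-only model. */
/* The log likelihoods are copied from the GENMOD output shown above.        */
data _null_;
  ll_full = 1480.3813;                   /* model with male and langarts     */
  ll_null = 1394.6299;                   /* intercept-only model             */
  df      = 2;                           /* number of predictors dropped     */
  chisq   = 2 * (ll_full - ll_null);     /* likelihood ratio statistic       */
  p       = sdf('chisquare', chisq, df); /* upper-tail chi-square p-value    */
  put chisq= p=;                         /* write the results to the log     */
run;
```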
Finally, we will use the estimate statement to get the predicted number of days absent for male and female students when langarts is held at its mean. The exp option requests the exponentiated estimate, i.e., the predicted count rather than the log count.
proc genmod data = poissonreg;
  class id;
  model daysabs = male langarts / dist=poisson;
  repeated subject=id / type=cs;
  estimate "male" langarts 50.0637938 male 1 intercept 1 / exp;
  estimate "female" langarts 50.0637938 male 0 intercept 1 / exp;
run;
Contrast Estimate Results
                         Standard                                    Chi-
Label          Estimate     Error   Alpha   Confidence Limits      Square  Pr > ChiSq
male             1.5032    0.0966    0.05    1.3138    1.6926      241.98      <.0001
Exp(male)        4.4960    0.4345    0.05    3.7202    5.4335
female           1.9125    0.0992    0.05    1.7182    2.1069      372.06      <.0001
Exp(female)      6.7703    0.6713    0.05    5.5745    8.2225
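To see where these estimates come from, the linear predictor can be built by hand from the coefficients of the reduced model and then exponentiated. The sketch below uses the rounded coefficients printed earlier and variable names of our own choosing, so it should match the estimate results only to about two decimal places.

```sas
/* Hand calculation of the predicted log counts and counts at mean langarts. */
/* Coefficients are the rounded values from the reduced model above.         */
data _null_;
  b0        =  2.6470;                   /* intercept                        */
  b_male    = -0.4094;                   /* coefficient for male             */
  b_lang    = -0.0147;                   /* coefficient for langarts         */
  lang_mean = 50.0637938;                /* mean of langarts from proc means */

  xb_male   = b0 + b_male*1 + b_lang*lang_mean;  /* log count, male = 1      */
  xb_female = b0 + b_male*0 + b_lang*lang_mean;  /* log count, male = 0      */

  days_male   = exp(xb_male);            /* predicted days absent, males     */
  days_female = exp(xb_female);          /* predicted days absent, females   */

  put xb_male= days_male= / xb_female= days_female=;
run;
```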
The Poisson regression model predicting days absent from language arts score and gender was statistically significant, with likelihood ratio chi-square = 171.503, df = 2, yielding a p-value < .0001. The predictors langarts and male were each statistically significant. For these data, the expected change in log count for a one-unit increase in language arts score was -0.0147. Male students had an expected log count 0.41 lower than female students.
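Because the model uses a log link, differences in log counts can also be read as ratios of expected counts by exponentiating the coefficients. The data step below is a sketch of that conversion using the rounded coefficients from the reduced model; the variable names are ours, and this step is an optional addition to the analysis above.

```sas
/* Convert log-count coefficients to ratios of expected counts (rate ratios). */
data _null_;
  irr_male     = exp(-0.4094);   /* expected count for males relative to females */
  irr_langarts = exp(-0.0147);   /* multiplicative change per one-point increase */
  put irr_male= irr_langarts=;   /* write the results to the log                 */
run;
```

The ratio for male comes out at roughly 0.66, which agrees with the ratio of the two predicted counts from the estimate statements (4.4960 / 6.7703).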
These pages are Copyrighted (c) by UCLA Academic Technology Services