Example 1. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include gender of the student and standardized test scores in math and language arts.
Example 2. The state wildlife biologists want to model how many fish are being caught by fishermen at a state park. Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught. Some visitors do not fish, but there is no data on whether a person fished or not. Some visitors who did fish did not catch any fish, so there are excess zeros in the data because of the people who did not fish.
We have attendance data on 316 high school juniors from two urban high schools in the file poissonreg.dat. The response variable of interest is days absent, daysabs. The variables math and langarts give the standardized test scores for math and language arts, respectively. The variable male is a binary indicator of student gender.
In addition to predicting the number of days absent there is interest in predicting the existence of excess zeros, i.e., the probability that a student will have zero absences. We will use both male and school to investigate this.
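One way to make the idea of excess zeros concrete is the zero-inflated Poisson probability function, which mixes a point mass at zero with an ordinary Poisson count. The sketch below is a minimal illustration in Python and is not part of the Mplus analysis; the function names and the values of mu and pi are hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for an ordinary Poisson count with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def zip_pmf(y, mu, pi):
    """P(Y = y) under a zero-inflated Poisson model.

    pi is the probability of an "excess" (structural) zero;
    with probability 1 - pi the count follows Poisson(mu).
    """
    base = (1.0 - pi) * poisson_pmf(y, mu)
    return pi + base if y == 0 else base


# Hypothetical values, for illustration only.
mu_example = 5.0
pi_example = 0.2

print("P(Y = 0), Poisson:", round(poisson_pmf(0, mu_example), 4))
print("P(Y = 0), ZIP:    ", round(zip_pmf(0, mu_example, pi_example), 4))
```

With pi = 0 the function reduces to the ordinary Poisson probability, which is the sense in which a model without zero inflation is a special case.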
Let's look at the data.
Data:
File is d:\poissonreg.dat ;
Variable:
Names are
id school male math langarts daysatt daysabs;
Missing are all (-9999) ;
usevariables are male math langarts daysabs;
analysis:
type = basic;
plot: type is plot1;
SAMPLE STATISTICS
Means
MALE MATH LANGARTS DAYSABS
________ ________ ________ ________
1 0.487 48.751 50.064 5.810
Covariances
MALE MATH LANGARTS DAYSABS
________ ________ ________ ________
MALE 0.251
MATH -0.481 319.721
LANGARTS -1.507 220.942 321.815
DAYSABS -0.456 -21.672 -24.319 55.488
Correlations
MALE MATH LANGARTS DAYSABS
________ ________ ________ ________
MALE 1.000
MATH -0.054 1.000
LANGARTS -0.168 0.689 1.000
DAYSABS -0.122 -0.163 -0.182 1.000
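As a quick check on these summary statistics, the short Python sketch below (not part of the Mplus run) recovers one of the correlations from the covariance matrix and compares the variance of daysabs with its mean. A Poisson variable has variance equal to its mean, so a ratio well above 1 suggests more spread than a plain Poisson model would allow, which excess zeros are one possible source of. The variable names are ours; the numbers are copied from the output above.

```python
import math

# Values copied from the SAMPLE STATISTICS output.
mean_daysabs = 5.810
var_daysabs = 55.488
var_math = 319.721
var_langarts = 321.815
cov_math_langarts = 220.942

# Variance-to-mean ratio: equals 1 for a Poisson variable.
dispersion_ratio = var_daysabs / mean_daysabs

# Correlation = covariance / (sd_x * sd_y).
corr_math_langarts = cov_math_langarts / math.sqrt(var_math * var_langarts)

print(f"variance / mean for daysabs: {dispersion_ratio:.2f}")
print(f"corr(math, langarts): {corr_math_langarts:.3f}")
```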
Some Strategies You Might Be Tempted To Try
Before we show how you can analyze this with a zero-inflated Poisson analysis, let's consider some other methods that you might use.
NOTE: This example was done using Mplus version 4.21. The syntax may not work with earlier versions of Mplus.
In the syntax below, we indicate that daysabs is a count variable by using the count statement. The (i) option specifies a zero-inflated Poisson model; without it, we would be estimating a Poisson model without zero inflation. We also use the usevar statement to indicate that the current model does not use all of the variables in the data set. We have omitted the missing statement because this data set has no missing data.
The default estimation method is MLR, which according to the Mplus 4 manual gives maximum likelihood parameter estimates with standard errors and a chi-square test statistic that are robust to non-normality and, when used with type = complex, to non-independence of observations. The MLR standard errors are computed using a sandwich estimator; these are what we generally call robust standard errors. To get the "regular" standard errors, we specify estimator = ml on the analysis statement. (In the next example, we will omit the analysis statement and obtain the robust standard errors.)
TITLE: Mplus DAE for zero-inflated poisson regression
DATA: FILE IS D:\poissonreg.dat;
VARIABLE: NAMES ARE id school male math langarts daysatt daysabs;
COUNT IS daysabs (i);
usevar school male math langarts daysabs;
ANALYSIS: estimator = ml;
MODEL:
daysabs ON math langarts male;
daysabs#1 ON school male;
MODEL RESULTS
Estimates S.E. Est./S.E.
DAYSABS ON
MATH 0.000 0.002 -0.164
LANGARTS -0.009 0.002 -4.974
MALE -0.246 0.049 -5.055
DAYSABS#1 ON
SCHOOL 1.151 0.314 3.662
MALE 0.869 0.305 2.854
Intercepts
DAYSABS#1 -3.704 0.594 -6.233
DAYSABS 2.545 0.073 34.744
In the MODEL RESULTS section of the output you will find the Poisson regression coefficients (estimates) for each of the variables, their standard errors, and the ratio of each estimate to its standard error. This ratio can be used as a z test, where absolute values greater than about 2 are considered statistically significant at the 0.05 level. Following these are the logit coefficients for predicting excess zeros. In the above output, we see that math is not statistically significant, while langarts and male are. Both school and male are statistically significant predictors of the zero inflation.
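To see how the two sets of coefficients combine, the Python sketch below turns the estimates above into predicted values for one student. It assumes the usual zero-inflated Poisson setup: a log link for the count part and a logit link for the inflation part. The student's covariate values are hypothetical, and the coding school = 1 is an assumption, since the coding of school is not described here.

```python
import math

# Estimates copied from the MODEL RESULTS output above.
count_coef = {"intercept": 2.545, "math": 0.000, "langarts": -0.009, "male": -0.246}
infl_coef = {"intercept": -3.704, "school": 1.151, "male": 0.869}


def predict(math_score, langarts, male, school):
    """Return (Poisson mean, excess-zero probability, overall expected count)."""
    # Count part: log link, so the linear predictor is exponentiated.
    log_mu = (count_coef["intercept"]
              + count_coef["math"] * math_score
              + count_coef["langarts"] * langarts
              + count_coef["male"] * male)
    mu = math.exp(log_mu)
    # Inflation part: logit link gives the probability of an excess zero.
    logit = (infl_coef["intercept"]
             + infl_coef["school"] * school
             + infl_coef["male"] * male)
    pi = 1.0 / (1.0 + math.exp(-logit))
    # Overall expected days absent mixes the two parts.
    return mu, pi, (1.0 - pi) * mu


# Hypothetical student: female, average test scores, school coded 1 (assumed).
mu, pi, expected = predict(math_score=50, langarts=50, male=0, school=1)
print(f"Poisson mean: {mu:.2f}, P(excess zero): {pi:.3f}, expected days: {expected:.2f}")

# exp(coefficient) is the multiplicative change in the Poisson mean.
male_rate_ratio = math.exp(count_coef["male"])
print(f"rate ratio for male: {male_rate_ratio:.3f}")
```

Because the estimates are rounded to three decimals in the output, these predictions are approximate.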
Now let's rerun the analysis without the analysis statement in order to obtain robust standard errors.
TITLE: Mplus DAE for zero-inflated poisson regression
DATA: FILE IS D:\poissonreg.dat;
VARIABLE: NAMES ARE id school male math langarts daysatt daysabs;
COUNT IS daysabs (i);
usevar school male math langarts daysabs;
MODEL:
daysabs ON math langarts male;
daysabs#1 ON school male;
MODEL RESULTS
Estimates S.E. Est./S.E.
DAYSABS ON
MATH 0.000 0.007 -0.042
LANGARTS -0.009 0.005 -1.845
MALE -0.246 0.129 -1.914
DAYSABS#1 ON
SCHOOL 1.151 0.309 3.723
MALE 0.869 0.301 2.892
Intercepts
DAYSABS#1 -3.704 0.569 -6.516
DAYSABS 2.545 0.207 12.318
Using the robust standard errors, which are often larger than "regular" standard errors (here markedly so in the count part of the model), we see that none of math, langarts and male is statistically significant at the 0.05 level. The variables school and male are still statistically significant predictors of the zero inflation.
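The comparison can be made explicit by converting the reported Est./S.E. ratios into two-sided p-values from the standard normal distribution. The Python sketch below does this for the count part of the model under both sets of standard errors; the helper name two_sided_p is ours, and the ratios are copied from the two outputs.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Est./S.E. values for the count part, copied from the two MODEL RESULTS outputs.
z_ml = {"math": -0.164, "langarts": -4.974, "male": -5.055}
z_robust = {"math": -0.042, "langarts": -1.845, "male": -1.914}

for name in z_ml:
    print(f"{name:9s} ML p = {two_sided_p(z_ml[name]):.4f}   "
          f"robust p = {two_sided_p(z_robust[name]):.4f}")
```

The usual cutoff of about 2 corresponds to |z| = 1.96, where the two-sided p-value is 0.05.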
These pages are Copyrighted (c) by UCLA Academic Technology Services