|
|
|
||||
|
|
|||||
Sections noted by R
indicate differences between R and SPLUS.
R program
To input the hsb12.csv and the willett.csv
data file in R use the following code which must be executed before running the R program:
>hsb12 <- read.csv("d:/hsb12.csv", header=T)
>willett <- read.csv("d:/willett.csv", header=T)
The SPLUS data files used in this paper hsb12.sdd and willett.sdd
First set of examples using the hsb12 data file.
Page 329
In order to use the lme function the data object must already have the
multilevel structure which is imposed by the groupedData function.
hsb12.int <- groupedData( mathach ~ 1 | school, data=hsb12)
basic <- lme(mathach ~ 1, data=hsb12.int, random = ~ 1 | school)
summary(basic)
Linear mixed-effects model fit by REML
Data: hsb12.int
AIC BIC logLik
47122.79 47143.43 -23558.4
Random effects:
Formula: ~ 1 | school
(Intercept) Residual
StdDev: 2.934966 6.256862
Fixed effects: mathach ~ 1
Value Std.Error DF t-value p-value
(Intercept) 12.63697 0.2443936 7025 51.70747 <.0001
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.063125 -0.753874 0.02670132 0.7606217 2.742626
Number of Observations: 7185
Number of Groups: 160
Page 331 Table 2, random intercept, predictor = meanses
hsb12.meanses <- groupedData(mathach ~ 1 | school, data=hsb12, outer= ~ meanses)
fit1 <- lme(mathach ~ meanses, data=hsb12.meanses, random = ~ 1 | school)
summary(fit1)
Linear mixed-effects model fit by REML
Data: hsb12.meanses
AIC BIC logLik
46969.28 46996.8 -23480.64
Random effects:
Formula: ~ 1 | school
(Intercept) Residual
StdDev: 1.62441 6.257562
Fixed effects: mathach ~ meanses
Value Std.Error DF t-value p-value
(Intercept) 12.64944 0.1492801 7025 84.73622 <.0001
meanses 5.86354 0.3614580 158 16.22191 <.0001
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.134798 -0.7525634 0.02408657 0.7677326 2.785014
Number of Observations: 7185
Number of Groups: 160
Page 335
# compute cses
hsb12$cses <- hsb12$ses - hsb12$meanses
hsb12.cses <- groupedData( mathach ~ cses | school, data=hsb12)
# use unstructured correlation (default in lme function, corr = NULL)
fit2 <- lme(mathach ~ cses, data=hsb12.cses, random = ~ cses | school )
summary(fit2)
Linear mixed-effects model fit by REML
Data: hsb12.cses
AIC BIC logLik
46726.24 46767.52 -23357.12
Random effects:
Formula: ~ cses | school
Structure: General positive-definite
StdDev Corr
(Intercept) 2.9462163 (Inter
cses 0.8328983 0.011
Residual 6.0580812
Fixed effects: mathach ~ cses
Value Std.Error DF t-value p-value
(Intercept) 12.64925 0.2444948 7024 51.73627 <.0001
cses 2.19291 0.1282521 7024 17.09842 <.0001
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.096421 -0.7313269 0.01767674 0.7537266 2.90052
Number of Observations: 7185
Number of Groups: 160
Pages 337-338,
hsb12.big <- groupedData( mathach ~ meanses + sector + cses | school, data=hsb12)
fit3 <- lme(mathach ~ meanses + sector + cses + meanses:cses + sector:cses,
data=hsb12.big, random = ~ cses | school)
summary(fit3)
fit4 <- lme(mathach ~ meanses + sector + cses + meanses:cses + sector:cses,
data=hsb12.big, random = ~ 1 | school)
summary(fit4)
anova(fit3, fit4)
Linear mixed-effects model fit by REML
Data: hsb12.big
AIC BIC logLik
46523.67 46592.46 -23251.83
Random effects:
Formula: ~ cses | school
Structure: General positive-definite
StdDev Corr
(Intercept) 1.5432934 (Inter
cses 0.3195791 0.39
Residual 6.0597709
Fixed effects: mathach ~ meanses + sector + cses + meanses:cses + sector:cses
Value Std.Error DF t-value p-value
(Intercept) 12.11359 0.1988039 7022 60.93233 <.0001
meanses 5.33913 0.3692924 157 14.45772 <.0001
sector 1.21667 0.3063801 157 3.97110 0.0001
cses 2.93874 0.1551255 7022 18.94427 <.0001
meanses:cses 1.03879 0.2989595 7022 3.47467 0.0005
sector:cses -1.64255 0.2398410 7022 -6.84848 <.0001
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.159236 -0.7231732 0.01705645 0.7544342 2.958207
Number of Observations: 7185
Number of Groups: 160
Model df AIC BIC logLik Test L.Ratio p-value
fit3 1 10 46523.67 46592.46 -23251.83
fit4 2 8 46520.79 46575.83 -23252.40 1 vs 2 1.126403 0.5694
Second set of examples using willet data file.
Page 342
From looking at the graph we see that each group has not only a different
intercept but also a different slope suggesting that we should consider
including both the intercept and the slope as random components in our model.
Note: If the graph does not by default have colors that you like you change them
using the code on our SPLUS FAQ: How do I
change colors in a trellis graph?.
#grouping the data so that we can graph it willettg <- groupedData( y ~ time | id, data=willett) #Looking at the data: plot(willettg)
Page 342
Including both the intercept and the slope of time as random components.
Note: The intercept is included in the random argument even though it is not
explicitly listed in the random argument. In order to have only the
slope of time included as a random component the random argument would look
like this: random=~ time -1 and it would be the -1 which would indicate to SPLUS
that the intercept is not random.
wfit1 <- lme(y ~ time, data=willettg, random = ~ time | id)
summary(wfit1)
Linear mixed-effects model fit by REML
Data: willettg
AIC BIC logLik
1278.823 1296.386 -633.4114
Random effects:
Formula: ~ time | id
Structure: General positive-definite
StdDev Corr
(Intercept) 34.62336 (Inter
time 11.50654 -0.45
Residual 12.62843
Fixed effects: y ~ time
Value Std.Error DF t-value p-value
(Intercept) 164.3743 6.118849 104 26.86360 <.0001
time 26.9600 2.166604 104 12.44344 <.0001
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.272063 -0.6003579 0.02631387 0.6073068 1.584396
Number of Observations: 140
Number of Groups: 35
Looking at the box plots of the residual by id. The residuals are centered at zero but the size of the variance seems to vary greatly from group to group. However, since there are only four observations for each level of id we should not rely too much on the box plots for inference about the within group variance.
plot( wfit1, id ~ resid(.), abline = 0)
To check for any outliers we look at the standardized residual plot versus fitted values. The type="p" argument specifies that the standardized residual should be used. The id argument indicates that a critical value for identifying observations in the plot which in this case are the observations where the standardized residual value is greater then the 1 - id/2 standard normal quantile in absolute value. The default label for the observations is the group label. The adj argument controls the position of the label for the observations identified.
plot(wfit1, resid(., type="p") ~ fitted(.), id=0.05, adj=-1, col=8)
The fitted line for each id.
plot( augPred(wfit1), layout=c(4,3), col=8 )
Page 344
#centering the covar variable
willett$ccovar <- willett$covar - mean(willett$covar)
#grouping the data
willettg1 <- groupedData( y ~ time | id, data=willett, outer=~ccovar)
wfit2 <- lme(y ~ time*ccovar, data=willettg1, random = ~ time | id)
summary(wfit2)
Linear mixed-effects model fit by REML
Data: willettg1
AIC BIC logLik
1276.285 1299.586 -630.1424
Random effects:
Formula: ~ time | id
Structure: General positive-definite
StdDev Corr
(Intercept) 35.16266 (Inter
time 10.35612 -0.489
Residual 12.62843
Fixed effects: y ~ time * ccovar
Value Std.Error DF t-value p-value
(Intercept) 164.3743 6.206095 103 26.48594 <.0001
time 26.9600 1.993881 103 13.52137 <.0001
ccovar -0.1136 0.504012 33 -0.22530 0.8231
time:ccovar 0.4329 0.161928 103 2.67315 0.0087
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.248171 -0.6187254 0.004282326 0.6147228 1.556882
Number of Observations: 140
Number of Groups: 35
Bottom of page 346 and top of page 347
#compound symmetry
fit.gls.cs <- gls(y~time, data=willett, corr=corCompSymm(, form=~time | id))
summary(fit.gls.cs)
#autoregressive
fit.gls.ar1 <- gls(y~time, data=willett, corr=corAR1(, form=~time | id))
anova(fit.gls.cs, fit.gls.ar1)
Generalized least squares fit by REML
Model: y ~ time
Data: willett
AIC BIC logLik
1308.34 1320.049 -650.1698
Correlation Structure: Compound symmetry
Formula: ~ time | id
Parameter estimate(s):
Rho
0.7064953
Coefficients:
Value Std.Error t-value p-value
(Intercept) 164.3743 5.776690 28.45475 <.0001
time 26.9600 1.465603 18.39516 <.0001
Correlation:
(Intr)
time -0.381
Standardized residuals:
Min Q1 Med Q3 Max
-2.469437 -0.7149002 0.03313234 0.8082695 1.809186
Residual standard error: 35.78721
Degrees of freedom: 140 total; 138 residual
Model df AIC BIC logLik
fit.gls.cs 1 4 1308.340 1320.049 -650.1698
fit.gls.ar1 2 4 1281.465 1293.174 -636.7327
p. 348
wfit2.ar1 <- lme( y ~ time*ccovar, data=willettg1, random = ~ time | id,
corr=corAR1(, form =~ time|id))
summary(wfit2.ar1)
Linear mixed-effects model fit by REML
Data: willettg1
AIC BIC logLik
1278.045 1304.259 -630.0223
Random effects:
Formula: ~ time | id
Structure: General positive-definite
StdDev Corr
(Intercept) 35.46865 (Inter
time 10.53770 -0.488
Residual 11.88619
Correlation Structure: AR(1)
Formula: ~ time | id
Parameter estimate(s):
Phi
-0.1374558
Fixed effects: y ~ time * ccovar
Value Std.Error DF t-value p-value
(Intercept) 164.4229 6.198601 103 26.52581 <.0001
time 26.9080 1.978001 103 13.60363 <.0001
ccovar -0.1234 0.503403 33 -0.24517 0.8078
time:ccovar 0.4357 0.160638 103 2.71256 0.0078
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.318034 -0.6248881 -0.01432131 0.6548167 1.567537
Number of Observations: 140
Number of Groups: 35
plot( augPred(wfit2.ar1), layout=c(4,3), col=8 )
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services