UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

SPlus/R Paper Examples
Using SAS Proc Mixed to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models
by Judith Singer

This paper can be downloaded from Professor Singer's web site at http://gseweb.harvard.edu/%7Efaculty/singer/Papers/sasprocmixed.pdf

Sections noted by R indicate differences between R and SPLUS.

R program

To input the hsb12.csv and the willett.csv data file in R use the following code which must be executed before running the R program:

>hsb12 <- read.csv("d:/hsb12.csv", header=T)
>willett <- read.csv("d:/willett.csv", header=T)

The SPLUS data files used in this paper hsb12.sdd and willett.sdd


First set of examples using the hsb12 data file.

Page 329
In order to use the lme function the data object must already have the multilevel structure which is imposed by the groupedData function.

hsb12.int <- groupedData( mathach ~ 1 | school, data=hsb12) 
basic <- lme(mathach ~ 1, data=hsb12.int, random = ~ 1 | school) 
summary(basic)

Linear mixed-effects model fit by REML
 Data: hsb12.int 
       AIC      BIC   logLik 
  47122.79 47143.43 -23558.4

Random effects:
 Formula:  ~ 1 | school
        (Intercept) Residual 
StdDev:    2.934966 6.256862

Fixed effects: mathach ~ 1 
               Value Std.Error   DF  t-value p-value 
(Intercept) 12.63697 0.2443936 7025 51.70747  <.0001

Standardized Within-Group Residuals:
       Min        Q1        Med        Q3      Max 
 -3.063125 -0.753874 0.02670132 0.7606217 2.742626

Number of Observations: 7185
Number of Groups: 160 

Page 331  Table 2, random intercept, predictor = meanses


hsb12.meanses <- groupedData(mathach ~ 1 | school, data=hsb12, outer= ~ meanses)
fit1 <- lme(mathach ~ meanses, data=hsb12.meanses, random = ~ 1 | school)
summary(fit1)

Linear mixed-effects model fit by REML
 Data: hsb12.meanses 
       AIC     BIC    logLik 
  46969.28 46996.8 -23480.64

Random effects:
 Formula:  ~ 1 | school
        (Intercept) Residual 
StdDev:     1.62441 6.257562

Fixed effects: mathach ~ meanses 
               Value Std.Error   DF  t-value p-value 
(Intercept) 12.64944 0.1492801 7025 84.73622  <.0001
    meanses  5.86354 0.3614580  158 16.22191  <.0001

Standardized Within-Group Residuals:
       Min         Q1        Med        Q3      Max 
 -3.134798 -0.7525634 0.02408657 0.7677326 2.785014

Number of Observations: 7185
Number of Groups: 160 

Page 335

# compute cses
hsb12$cses <- hsb12$ses - hsb12$meanses
hsb12.cses <- groupedData( mathach ~ cses | school, data=hsb12)  
# use unstructured correlation (default in lme function, corr = NULL)
fit2 <- lme(mathach ~ cses, data=hsb12.cses, random = ~ cses | school )
summary(fit2)

Linear mixed-effects model fit by REML
 Data: hsb12.cses 
       AIC      BIC    logLik 
  46726.24 46767.52 -23357.12

Random effects:
 Formula:  ~ cses | school
 Structure: General positive-definite
               StdDev   Corr 
(Intercept) 2.9462163 (Inter
       cses 0.8328983 0.011 
   Residual 6.0580812       

Fixed effects: mathach ~ cses 
               Value Std.Error   DF  t-value p-value 
(Intercept) 12.64925 0.2444948 7024 51.73627  <.0001
       cses  2.19291 0.1282521 7024 17.09842  <.0001

Standardized Within-Group Residuals:
       Min         Q1        Med        Q3     Max 
 -3.096421 -0.7313269 0.01767674 0.7537266 2.90052

Number of Observations: 7185
Number of Groups: 160 

Pages 337-338,

hsb12.big <- groupedData( mathach ~ meanses + sector + cses | school, data=hsb12)  
fit3 <- lme(mathach ~ meanses + sector + cses + meanses:cses + sector:cses, 
             data=hsb12.big, random = ~ cses | school)
summary(fit3)

fit4 <- lme(mathach ~ meanses + sector + cses + meanses:cses + sector:cses, 
             data=hsb12.big, random = ~ 1 | school)
summary(fit4)
anova(fit3, fit4)

Linear mixed-effects model fit by REML
 Data: hsb12.big 
       AIC      BIC    logLik 
  46523.67 46592.46 -23251.83

Random effects:
 Formula:  ~ cses | school
 Structure: General positive-definite
               StdDev   Corr 
(Intercept) 1.5432934 (Inter
       cses 0.3195791 0.39  
   Residual 6.0597709       

Fixed effects: mathach ~ meanses + sector + cses + meanses:cses + sector:cses 
                 Value Std.Error   DF   t-value p-value 
 (Intercept)  12.11359 0.1988039 7022  60.93233  <.0001
     meanses   5.33913 0.3692924  157  14.45772  <.0001
      sector   1.21667 0.3063801  157   3.97110  0.0001
        cses   2.93874 0.1551255 7022  18.94427  <.0001
meanses:cses   1.03879 0.2989595 7022   3.47467  0.0005
 sector:cses  -1.64255 0.2398410 7022  -6.84848  <.0001

Standardized Within-Group Residuals:
       Min         Q1        Med        Q3      Max 
 -3.159236 -0.7231732 0.01705645 0.7544342 2.958207

Number of Observations: 7185
Number of Groups: 160 

     Model df      AIC      BIC    logLik   Test  L.Ratio p-value 
fit3     1 10 46523.67 46592.46 -23251.83                        
fit4     2  8 46520.79 46575.83 -23252.40 1 vs 2 1.126403  0.5694

Second set of examples using willet data file.

Page 342
From looking at the graph we see that each group has not only a different intercept but also a different slope suggesting that we should consider including both the intercept and the slope as random components in our model.
Note: If the graph does not by default have colors that you like you change them using the code on our SPLUS FAQ: How do I change colors in a trellis graph?.

#grouping the data so that we can graph it 
willettg <- groupedData( y ~ time | id, data=willett)
#Looking at the data:
plot(willettg)

Page 342
Including both the intercept and the slope of time as random components.
Note: The intercept is included in the random argument even though it is not explicitly listed in the random argument. In order to have only the slope of time included as a random component the random argument would look like this: random=~ time -1 and it would be the -1 which would indicate to SPLUS that the intercept is not random.

wfit1 <- lme(y ~ time, data=willettg, random = ~ time | id)
summary(wfit1)

Linear mixed-effects model fit by REML
 Data: willettg 
       AIC      BIC    logLik 
  1278.823 1296.386 -633.4114

Random effects:
 Formula:  ~ time | id
 Structure: General positive-definite
              StdDev   Corr 
(Intercept) 34.62336 (Inter
       time 11.50654 -0.45 
   Residual 12.62843       

Fixed effects: y ~ time 
               Value Std.Error  DF  t-value p-value 
(Intercept) 164.3743  6.118849 104 26.86360  <.0001
       time  26.9600  2.166604 104 12.44344  <.0001

Standardized Within-Group Residuals:
       Min         Q1        Med        Q3      Max 
 -2.272063 -0.6003579 0.02631387 0.6073068 1.584396

Number of Observations: 140
Number of Groups: 35

Looking at the box plots of the residual by id. The residuals are centered at zero but the size of the variance seems to vary greatly from group to group. However, since there are only four observations for each level of id we should not rely too much on the box plots for inference about the within group variance.

plot( wfit1, id ~ resid(.), abline = 0) 

To check for any outliers we look at the standardized residual plot versus fitted values. The type="p" argument specifies that the standardized residual should be used. The id argument indicates that a critical value for identifying observations in the plot which in this case are the observations where the standardized residual value is greater then the 1 - id/2 standard normal quantile in absolute value. The default label for the observations is the group label. The adj argument controls the position of the label for the observations identified.

plot(wfit1, resid(., type="p") ~ fitted(.), id=0.05, adj=-1, col=8)

The fitted line for each id.

plot( augPred(wfit1), layout=c(4,3), col=8  )
 

Page 344

#centering the covar variable
willett$ccovar <- willett$covar - mean(willett$covar)
#grouping the data  
willettg1 <- groupedData( y ~ time | id, data=willett, outer=~ccovar)
wfit2 <- lme(y ~ time*ccovar, data=willettg1, random = ~ time | id)
summary(wfit2)

Linear mixed-effects model fit by REML
 Data: willettg1 
       AIC      BIC    logLik 
  1276.285 1299.586 -630.1424

Random effects:
 Formula:  ~ time | id
 Structure: General positive-definite
              StdDev   Corr 
(Intercept) 35.16266 (Inter
       time 10.35612 -0.489
   Residual 12.62843       

Fixed effects: y ~ time * ccovar 
                Value Std.Error  DF   t-value p-value 
(Intercept)  164.3743  6.206095 103  26.48594  <.0001
       time   26.9600  1.993881 103  13.52137  <.0001
     ccovar   -0.1136  0.504012  33  -0.22530  0.8231
time:ccovar    0.4329  0.161928 103   2.67315  0.0087

Standardized Within-Group Residuals:
       Min         Q1         Med        Q3      Max 
 -2.248171 -0.6187254 0.004282326 0.6147228 1.556882

Number of Observations: 140
Number of Groups: 35 

Bottom of page 346 and top of page 347

#compound symmetry
fit.gls.cs <- gls(y~time, data=willett, corr=corCompSymm(, form=~time | id))
summary(fit.gls.cs)
#autoregressive
fit.gls.ar1 <- gls(y~time, data=willett, corr=corAR1(, form=~time | id))
anova(fit.gls.cs, fit.gls.ar1)

Generalized least squares fit by REML
  Model: y ~ time 
  Data: willett 
      AIC      BIC    logLik 
  1308.34 1320.049 -650.1698

Correlation Structure: Compound symmetry
 Formula:  ~ time | id 
 Parameter estimate(s):
       Rho 
 0.7064953

Coefficients:
               Value Std.Error  t-value p-value 
(Intercept) 164.3743  5.776690 28.45475  <.0001
       time  26.9600  1.465603 18.39516  <.0001

 Correlation: 
     (Intr) 
time -0.381

Standardized residuals:
       Min         Q1        Med        Q3      Max 
 -2.469437 -0.7149002 0.03313234 0.8082695 1.809186

Residual standard error: 35.78721 
Degrees of freedom: 140 total; 138 residual

            Model df      AIC      BIC    logLik 
 fit.gls.cs     1  4 1308.340 1320.049 -650.1698
fit.gls.ar1     2  4 1281.465 1293.174 -636.7327

p. 348

wfit2.ar1 <- lme( y ~ time*ccovar, data=willettg1, random = ~ time | id, 
              corr=corAR1(, form =~ time|id))
summary(wfit2.ar1)

Linear mixed-effects model fit by REML
 Data: willettg1 
       AIC      BIC    logLik 
  1278.045 1304.259 -630.0223

Random effects:
 Formula:  ~ time | id
 Structure: General positive-definite
              StdDev   Corr 
(Intercept) 35.46865 (Inter
       time 10.53770 -0.488
   Residual 11.88619       

Correlation Structure: AR(1)
 Formula:  ~ time | id 
 Parameter estimate(s):
        Phi 
 -0.1374558
Fixed effects: y ~ time * ccovar 
                Value Std.Error  DF   t-value p-value 
(Intercept)  164.4229  6.198601 103  26.52581  <.0001
       time   26.9080  1.978001 103  13.60363  <.0001
     ccovar   -0.1234  0.503403  33  -0.24517  0.8078
time:ccovar    0.4357  0.160638 103   2.71256  0.0078

Standardized Within-Group Residuals:
       Min         Q1         Med        Q3      Max 
 -2.318034 -0.6248881 -0.01432131 0.6548167 1.567537

Number of Observations: 140
Number of Groups: 35 

plot( augPred(wfit2.ar1), layout=c(4,3), col=8 )

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.