UCLA Academic Technology Services

SAS FAQ
How can I minimize loss of data due to missing observations in a repeated measures ANOVA?

Loss of subjects in a repeated measures ANOVA due to missing data can be a serious problem. If you use proc glm to perform your analysis, it will omit observations listwise, meaning that if any of the observations for a subject are missing, the entire subject will be omitted from the analysis. Consider the data file below, based on an example from Design and Analysis by G. Keppel, pages 414-416. This example contains 8 subjects (sub) with one between-subjects IV with 2 levels (group) and one within-subjects IV with 4 levels (the four trials, dv1 through dv4). We have inserted 4 missing values, one for each of four different subjects, to illustrate the impact of missing data in this kind of design.

DATA wide;
  INPUT sub group dv1 dv2 dv3 dv4;
CARDS;
1 1  3  4  7  3
2 1  6  . 12  9
3 1  7 13 11 11
4 1  0  3  .  6
5 2  5  6 11  7
6 2 10 12 18  . 
7 2 10 15 15 14
8 2  5  . 11  9
;
RUN;
 
PROC PRINT DATA=wide ;
RUN; 
OBS    SUB    GROUP    DV1    DV2    DV3    DV4
 1      1       1        3      4      7      3
 2      2       1        6      .     12      9
 3      3       1        7     13     11     11
 4      4       1        0      3      .      6
 5      5       2        5      6     11      7
 6      6       2       10     12     18      .
 7      7       2       10     15     15     14
 8      8       2        5      .     11      9
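
Before fitting any model, it can help to count how much data listwise deletion would discard. The following is a minimal sketch and not part of the original example; the data set name wide_check and the variables n_missing and complete are hypothetical names introduced here for illustration.

```sas
/* Sketch: flag the subjects that listwise deletion would drop.      */
/* wide_check, n_missing and complete are illustrative names only.   */
DATA wide_check;
  SET wide;
  n_missing = NMISS(OF dv1-dv4);  /* number of missing trials for this subject */
  complete  = (n_missing = 0);    /* 1 if all four trials are present, else 0  */
RUN;

/* How many subjects are complete versus incomplete? */
PROC FREQ DATA=wide_check;
  TABLES complete;
RUN;

/* List the subjects that have at least one missing trial. */
PROC PRINT DATA=wide_check;
  WHERE n_missing > 0;
  VAR sub group n_missing;
RUN;
```

With the data above, subjects 2, 4, 6, and 8 each have one missing value, so only four of the eight subjects are complete.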

We start by showing how to perform a standard 2 by 4 (between / within) ANOVA using proc glm.

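PROC GLM DATA=wide;
  CLASS group;
  MODEL dv1-dv4 = group / NOUNI ;  /* NOUNI suppresses the separate univariate tests for dv1-dv4 */
  REPEATED trial 4;                /* treat dv1-dv4 as 4 levels of the within-subjects factor trial */
RUN; 
QUIT;                              /* proc glm is interactive; QUIT ends the procedure */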

Note that only four of the eight subjects are available for analysis; the other four have been omitted because each has a missing value. The results of this analysis are shown below.

General Linear Models Procedure
Class Level Information

Class    Levels    Values

GROUP         2    1 2

Number of observations in data set = 8

NOTE: Observations with missing values will not be included in this analysis.
      Thus, only 4 observations can be used in this analysis.

General Linear Models Procedure
Repeated Measures Analysis of Variance
Repeated Measures Level Information

Dependent Variable        DV1      DV2      DV3      DV4
    Level of TRIAL          1        2        3        4

General Linear Models Procedure
Repeated Measures Analysis of Variance
Tests of Hypotheses for Between Subjects Effects

Source                  DF      Type III SS     Mean Square   F Value     Pr > F
GROUP                    1      36.00000000     36.00000000      0.46     0.5673
Error                    2     156.25000000     78.12500000

General Linear Models Procedure
Repeated Measures Analysis of Variance
Univariate Tests of Hypotheses for Within Subject Effects

Source: TRIAL
                                                                   Adj  Pr > F
     DF       Type III SS       Mean Square   F Value   Pr > F    G - G    H - F
      3       47.25000000       15.75000000      5.32   0.0397   0.1430   0.0629

Source: TRIAL*GROUP
                                                                   Adj  Pr > F
     DF       Type III SS       Mean Square   F Value   Pr > F    G - G    H - F
      3        2.50000000        0.83333333      0.28   0.8371   0.6556   0.7898

Source: Error(TRIAL)
     DF       Type III SS       Mean Square
      6       17.75000000        2.95833333

Greenhouse-Geisser Epsilon = 0.3474
       Huynh-Feldt Epsilon = 0.7547

Now, we will illustrate how you can perform this same analysis in proc mixed. First, we need to reshape the data into the form expected by proc mixed. Proc glm expects the data to be in a wide format, where each observation corresponds to a subject. By contrast, proc mixed expects the data to be in a long format, where each observation corresponds to one trial for one subject. In this case, proc mixed expects four observations per subject, one for each of the four trials. Below we show how you can reshape the data for analysis in proc mixed.

DATA long ;
  SET Wide;
  dv = dv1; trial = 1; OUTPUT;
  dv = dv2; trial = 2; OUTPUT;
  dv = dv3; trial = 3; OUTPUT;
  dv = dv4; trial = 4; OUTPUT;
  DROP dv1 - dv4 ;
RUN;
 
PROC PRINT DATA=long ;
RUN; 

You can compare the proc print for wide with the proc print for long to verify that the data were properly reshaped.

OBS    SUB    GROUP    DV    TRIAL

  1     1       1       3      1
  2     1       1       4      2
  3     1       1       7      3
  4     1       1       3      4
  5     2       1       6      1
  6     2       1       .      2
  7     2       1      12      3
  8     2       1       9      4
  9     3       1       7      1
 10     3       1      13      2
 11     3       1      11      3
 12     3       1      11      4
 13     4       1       0      1
 14     4       1       3      2
 15     4       1       .      3
 16     4       1       6      4
 17     5       2       5      1
 18     5       2       6      2
 19     5       2      11      3
 20     5       2       7      4
 21     6       2      10      1
 22     6       2      12      2
 23     6       2      18      3
 24     6       2       .      4
 25     7       2      10      1
 26     7       2      15      2
 27     7       2      15      3
 28     7       2      14      4
 29     8       2       5      1
 30     8       2       .      2
 31     8       2      11      3
 32     8       2       9      4
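
The same reshaping can also be written with an array and a DO loop, which scales better when there are many trials. This is a sketch of an alternative, not the method used in the rest of this page; the names long2 and dvs are hypothetical. The proc means step is a quick check that no values were lost in the reshape.

```sas
/* Sketch: array-based wide-to-long reshape (alternative to the step above). */
/* long2 and dvs are illustrative names only.                                 */
DATA long2;
  SET wide;
  ARRAY dvs{4} dv1-dv4;     /* refer to dv1-dv4 by position            */
  DO trial = 1 TO 4;
    dv = dvs{trial};        /* value for this subject on this trial    */
    OUTPUT;                 /* write one observation per trial         */
  END;
  DROP dv1-dv4;
RUN;

/* Check: count the non-missing (N) and missing (NMISS) values of dv. */
PROC MEANS DATA=long2 N NMISS;
  VAR dv;
RUN;
```

For these data the check should report 28 non-missing and 4 missing values of dv, matching the 4 missing values in the wide file.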

Now that the data are in the proper shape, we can analyze them with proc mixed. Proc mixed does not delete missing data listwise; it analyzes all of the data that are present. For the analysis to be valid, it is assumed that the data are missing at random. Rarely, however, are data truly missing at random. To the extent that there are systematic factors that led to the data being missing, the analysis will not be valid. In using this kind of analysis, we recommend that you assess and report the reasons for the missing data and the extent to which the missingness was non-random.

PROC MIXED DATA=long;
  CLASS sub group trial;
  MODEL dv = group trial group*trial;
  REPEATED trial / SUBJECT=sub TYPE=CS;
RUN; 

As you see below, proc mixed analyzed all eight of the subjects and used 28 of the 32 observations, excluding only the 4 missing values. By contrast, proc glm used only the four subjects with complete data.

                              The MIXED Procedure

                            Class Level Information

                       Class     Levels  Values
                       SUB            8  1 2 3 4 5 6 7 8
                       GROUP          2  1 2
                       TRIAL          4  1 2 3 4

                       REML Estimation Iteration History

               Iteration  Evaluations     Objective     Criterion
                       0            1   81.93159646
                       1            3   63.43970119    0.00138808
                       2            1   63.39025490    0.00006552
                       3            1   63.38810898    0.00000018
                       4            1   63.38810333    0.00000000

Convergence criteria met.

                     Covariance Parameter Estimates (REML)

                     Cov Parm   Subject      Estimate
                     CS         SUB       10.83244625
                     Residual              2.29522110

                        Model Fitting Information for DV

                    Description                        Value
                    Observations                     28.0000
                    Res Log Likelihood              -50.0728
                    Akaike's Information Criterion  -52.0728
                    Schwarz's Bayesian Criterion    -53.0686
                    -2 Res Log Likelihood           100.1456
                    Null Model LRT Chi-Square        18.5435
                    Null Model LRT DF                 1.0000
                    Null Model LRT P-Value            0.0000

                            Tests of Fixed Effects

                  Source        NDF   DDF  Type III F  Pr > F
                  GROUP           1     6        2.37  0.1748
                  TRIAL           3    14       17.04  0.0001
                  GROUP*TRIAL     3    14        0.40  0.7556
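
As recommended above, it is worth examining where the missing values fall before relying on these results. One simple way, sketched here under the assumption that a descriptive cross-tabulation is a reasonable first look, is to tabulate a missingness indicator against group and trial. The names long_miss and miss are hypothetical.

```sas
/* Sketch: describe the pattern of missing values in the long file. */
/* long_miss and miss are illustrative names only.                   */
DATA long_miss;
  SET long;
  miss = MISSING(dv);       /* 1 if dv is missing, 0 otherwise */
RUN;

/* Cross-tabulate missingness by group and by trial. */
PROC FREQ DATA=long_miss;
  TABLES group*miss trial*miss / NOCOL NOPERCENT;
RUN;
```

A table like this only describes the pattern of missing values. It cannot establish why values are missing, which still has to be argued from knowledge of how the data were collected.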

Proc mixed is much more powerful than proc glm. Because it is more powerful, it is more complex to use. This FAQ just scratches the surface in the use of proc mixed.
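
As one example of that added flexibility, the TYPE= option on the REPEATED statement lets you request covariance structures other than compound symmetry. The sketch below swaps in a first-order autoregressive structure, TYPE=AR(1). This choice is an assumption made for illustration and is not a recommendation for these data; with only eight subjects, more heavily parameterized structures may be poorly estimated.

```sas
/* Sketch: same model as above with an AR(1) covariance structure. */
/* TYPE=AR(1) is chosen here only to illustrate the option.         */
PROC MIXED DATA=long;
  CLASS sub group trial;
  MODEL dv = group trial group*trial;
  REPEATED trial / SUBJECT=sub TYPE=AR(1);
RUN;
```

You could then compare the Model Fitting Information from this run with that from the TYPE=CS run above to judge which structure describes the data better.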


These pages are Copyrighted (c) by UCLA Academic Technology Services

