UCLA Academic Technology Services

Stata Library
Panel Data Analysis using GEE

Introduction

Panel data analysis, also known as cross-sectional time-series analysis, looks at a group of people, the 'panel,' on more than one occasion. Panel studies are essentially equivalent to longitudinal studies, although there may be many response variables observed at each time point.

These data are from a 1996 study (Gregoire, Kumar, Everitt, Henderson & Studd) on the efficacy of estrogen patches in treating postnatal depression. Women were randomly assigned to either a placebo control group (group=0, n=27) or an estrogen patch group (group=1, n=34). Prior to the first treatment all patients took the Edinburgh Postnatal Depression Scale (EPDS). EPDS data were collected monthly for six months once the treatment began. Higher scores on the EPDS are indicative of higher levels of depression.

Before reading in the data we will need to change the size of the largest matrix that Stata can use. We need to do this because one of the analyses requires a large number of coded variables:

set matsize 160
use http://www.ats.ucla.edu/stat/stata/library/depress, clear 

Let the analyses begin

Note that the data are in the wide format. We will collect some descriptive information and perform two analyses while the data are in this format.

sort group

by group: summarize pre dep1 dep2 dep3 dep4 dep5 dep6

-> group=        0  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     pre |      27    20.77778   3.954874         15         28  
    dep1 |      27    16.48148   5.279644          7         26  
    dep2 |      22    15.88818   6.124177          4         27  
    dep3 |      17    14.12882   4.974648       4.19         22  
    dep4 |      17    12.27471   5.848791          2         23  
    dep5 |      17    11.40294   4.438702       3.03         18  
    dep6 |      17    10.89588    4.68157       3.45         20  

-> group=        1  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     pre |      34    21.24882   3.574432         15         28  
    dep1 |      34    13.36794   5.556373          1         27  
    dep2 |      31    11.73677   6.575079          1         27  
    dep3 |      29    9.134138   5.475564          1         24  
    dep4 |      28    8.827857   4.666653          0         22  
    dep5 |      28    7.309286   5.740988          0         24  
    dep6 |      28    6.590714   4.730158          1         23 

corr pre dep1 dep2 dep3 dep4 dep5 dep6

(obs=45)

         |      pre     dep1     dep2     dep3     dep4     dep5     dep6
---------+---------------------------------------------------------------
     pre |   1.0000
    dep1 |   0.1922   1.0000
    dep2 |   0.3904   0.4982   1.0000
    dep3 |   0.3958   0.5258   0.8672   1.0000
    dep4 |   0.1658   0.3933   0.7357   0.7831   1.0000
    dep5 |   0.2848   0.3674   0.7500   0.8520   0.8449   1.0000
    dep6 |   0.2688   0.2795   0.6900   0.7967   0.7894   0.9014   1.0000

graph matrix dep1 dep2 dep3 dep4 dep5 dep6, half



Let's check to see if the groups differ on the pretest depression score:

ttest pre, by(group)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      27    20.77778    .7611158    3.954874    19.21328    22.34227
       1 |      34    21.24882      .61301    3.574432    20.00165      22.496
---------+--------------------------------------------------------------------
combined |      61    21.04033     .476678    3.722975    20.08683    21.99383
---------+--------------------------------------------------------------------
    diff |           -.4710457    .9658499               -2.403707    1.461615
------------------------------------------------------------------------------
Degrees of freedom: 59

                      Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -0.4877                t =  -0.4877              t =  -0.4877
   P < t =   0.3138          P > |t| =   0.6276          P > t =   0.6862

The groups do not differ significantly on the pretest (t = -0.49, p = 0.63), so let's continue on to the panel data analysis.

GEE with Continuous Response Variable

In order to use these data for our panel data analysis, the data must be reorganized into the long form using the reshape command.

reshape long dep, i(subj) j(visit)

(note:  j = 1 2 3 4 5 6)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       61   ->     366
Number of variables                   9   ->       5
j variable (6 values)                     ->   visit
xij variables:
                     dep1 dep2 ... dep6   ->   dep
-----------------------------------------------------------------------------
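A quick look at the reshaped data can confirm the conversion. The following is a minimal sketch, assuming the reshape above has just been run. The tabulated counts should match the Obs column of the earlier summarize output, since subjects who dropped out have missing dep values in the long file.

```stata
* List the six long-form records for the first subject
list subj group visit dep in 1/6

* Count the non-missing depression scores at each visit, by group
tabulate visit group if !missing(dep)
```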

Before we begin the panel data analyses, let's look at some other analyses for comparison. We will begin with a repeated measures analysis of variance. This is the analysis that requires the larger matrix size.

anova dep group / subj|group visit group*visit /, repeated(visit)

                           Number of obs =     295     R-squared     =  0.7699
                           Root MSE      = 3.39594     Adj R-squared =  0.6980

                  Source |  Partial SS    df       MS           F     Prob > F
             ------------+----------------------------------------------------
                   Model |  8643.81572    70  123.483082      10.71     0.0000
                         |
                   group |  548.494938     1  548.494938       5.60     0.0212
              subj|group |  5775.54143    59  97.8905328   
             ------------+----------------------------------------------------
                   visit |  1050.05444     5  210.010889      18.21     0.0000
             group*visit |  19.3028953     5  3.86057906       0.33     0.8916
                         |
                Residual |  2583.26536   224  11.5324346   
             ------------+----------------------------------------------------
                   Total |  11227.0811   294  38.1873506   


Between-subjects error term:  subj|group
                     Levels:  61        (59 df)
     Lowest b.s.e. variable:  subj
     Covariance pooled over:  group     (for repeated variable)

Repeated variable: visit
                                          Huynh-Feldt epsilon        =  0.5930
                                          Greenhouse-Geisser epsilon =  0.5532
                                          Box's conservative epsilon =  0.2000

                                            ------------ Prob > F ------------
                  Source |     df      F    Regular    H-F      G-G      Box
             ------------+----------------------------------------------------
                   visit |      5    18.21   0.0000   0.0000   0.0000   0.0001
             group*visit |      5     0.33   0.8916   0.7979   0.7840   0.5658
                Residual |    224
             ------------+----------------------------------------------------
     
matrix list e(Srep)

symmetric e(Srep)[6,6]
           c1         c2         c3         c4         c5         c6
r1  31.361171
r2   15.71989  38.927914
r3  13.555927  28.365674   27.90249
r4  9.4625252   22.74371  20.519069  26.403025
r5  8.6149335  23.887935  23.161248   22.47211  28.026157
r6  4.6830378  19.242424  18.721233   18.46616  22.103924  22.204237
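Compound symmetry is easier to judge on the correlation scale. A minimal sketch, assuming the anova results above are still the active estimation results, converts the pooled covariance matrix with Stata's corr() matrix function (the matrix names S and C are arbitrary):

```stata
* Copy the pooled covariance matrix saved by anova
matrix S = e(Srep)
* Rescale the covariances to correlations
matrix C = corr(S)
matrix list C
```

Under compound symmetry the variances would be equal and the off-diagonal correlations would all be roughly the same.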

This analysis indicates that both group and visit are significant while the group*visit interaction is not. Some researchers are critical of this type of analysis since it is based on fixed effects adjusted for the repeated factor. Also, this repeated measures analysis assumes compound symmetry in the covariance matrix, which seems to be a stretch in this case: the pooled covariances in e(Srep) range from about 4.7 to 28.4. However, we can do worse. The next several analyses are not meant to answer the research question but to show relationships among several different commands in Stata.

regress dep pre group visit

  Source |       SS       df       MS                  Number of obs =     295
---------+------------------------------               F(  3,   291) =   48.05
   Model |  3719.12931     3  1239.70977               Prob > F      =  0.0000
Residual |  7507.95176   291  25.8005215               R-squared     =  0.3313
---------+------------------------------               Adj R-squared =  0.3244
   Total |  11227.0811   294  38.1873506               Root MSE      =  5.0794

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4769071   .0798565      5.972   0.000       .3197376    .6340767
   group |  -4.290664   .6072954     -7.065   0.000      -5.485912   -3.095416
   visit |  -1.307841    .169842     -7.700   0.000      -1.642116   -.9735667
   _cons |   8.233577   1.803945      4.564   0.000       4.683143    11.78401
------------------------------------------------------------------------------

glm dep pre group visit, fam(gaus) link(iden)

Iteration 1 : deviance = 7507.9518

Residual df  =       291                                No. of obs =       295
Pearson X2   =  7507.952                                Deviance   =  7507.952
Dispersion   =  25.80052                                Dispersion =  25.80052

Gaussian (normal) distribution, identity link
------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4769071   .0798565      5.972   0.000       .3197376    .6340767
   group |  -4.290664   .6072954     -7.065   0.000      -5.485912   -3.095416
   visit |  -1.307841    .169842     -7.700   0.000      -1.642116   -.9735667
   _cons |   8.233577   1.803945      4.564   0.000       4.683143    11.78401
------------------------------------------------------------------------------
(Model is ordinary regression, use regress instead)

We are finally ready to try the panel data analysis using Stata's xtgee command. xtgee allows us to specify various working correlation structures through the corr option. We will start with a working correlation structure of independence. We don't believe that this is the correct structure, but it allows us to compare results with the OLS regression and the glm results above. The estat wcorrelation command (which we will abbreviate as estat wcorr) allows us to view the working correlation matrix.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ind)

Iteration 1: tolerance = 3.270e-15

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.8
Correlation:                   independent                     max =         6
                                                Wald chi2(3)       =    146.13
Scale parameter:                  25.45068      Prob > chi2        =    0.0000

Pearson chi2(295):                 7507.95      Deviance           =   7507.95
Dispersion (Pearson):             25.45068      Dispersion         =  25.45068

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4769071   .0793133      6.013   0.000        .321456    .6323582
   group |  -4.290664   .6031641     -7.114   0.000      -5.472844   -3.108484
   visit |  -1.307841   .1686866     -7.753   0.000      -1.638461   -.9772215
   _cons |   8.233577   1.791673      4.595   0.000       4.721962    11.74519
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.0000  1.0000
r3  0.0000  0.0000  1.0000
r4  0.0000  0.0000  0.0000  1.0000
r5  0.0000  0.0000  0.0000  0.0000  1.0000
r6  0.0000  0.0000  0.0000  0.0000  0.0000  1.0000

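The three previous analyses yielded identical coefficients and nearly identical standard errors, but the results are probably incorrect. (The xtgee standard errors are slightly smaller only because its scale parameter, 25.45, divides the deviance by the 295 observations rather than by the 291 residual degrees of freedom.) The common thread among them is that they all assume that the observations within subjects are independent. This seems, on the face of it, to be highly unlikely. Scores on the depression scale are not likely to be independent from one visit to the next.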

We can also try analyzing these data using compound symmetry for the correlation structure. Compound symmetry is obtained using exchangeable for the corr option in xtgee.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(exc)

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.8
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(3)       =    135.08
Scale parameter:                  25.56569      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4599018   .1441533      3.190   0.001       .1773666     .742437
   group |  -4.024676   1.081131     -3.723   0.000      -6.143654   -1.905698
   visit |  -1.226764   .1175009    -10.440   0.000      -1.457062   -.9964666
   _cons |   8.432806   3.120987      2.702   0.007       2.315783    14.54983
-----------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.5554  1.0000
r3  0.5554  0.5554  1.0000
r4  0.5554  0.5554  0.5554  1.0000
r5  0.5554  0.5554  0.5554  0.5554  1.0000
r6  0.5554  0.5554  0.5554  0.5554  0.5554  1.0000

Note in particular the change in the standard errors between this analysis and the previous one. Next, what if we impose no preconceived notions about the correlations among the responses over time? In this next example, we will request an unstructured correlation matrix. This is equivalent to the assumptions made in a multivariate analysis.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(unstr)

GEE population-averaged model                   Number of obs      =       295
Group and time vars:            subj visit      Number of groups   =        61
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.8
Correlation:                  unstructured                     max =         6
                                                Wald chi2(3)       =     94.13
Scale parameter:                  25.87029      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .3399185   .1326684      2.562   0.010       .0798932    .5999437
   group |  -4.134413   .9986306     -4.140   0.000      -6.091693   -2.177133
   visit |  -1.228327   .1492831     -8.228   0.000      -1.520916   -.9357372
   _cons |   11.13045   2.892903      3.848   0.000       5.460464    16.80044
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.4955  1.0000
r3  0.3477  0.8622  1.0000
r4  0.3012  0.7359  0.6677  1.0000
r5  0.2328  0.7431  0.7394  0.7701  1.0000
r6  0.0943  0.5671  0.5625  0.6166  0.7179  1.0000

Now, let's try a different correlation structure, autoregressive with lag one. This is the correlation structure that is most likely to be correct considering the repeated measures over time.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(3)       =     64.55
Scale parameter:                  25.82413      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4268002   .1376156      3.101   0.002       .1570785    .6965219
   group |  -4.218194   1.053504     -4.004   0.000      -6.283023   -2.153364
   visit |  -1.181975   .1907298     -6.197   0.000      -1.555799   -.8081517
   _cons |   9.037864   3.036076      2.977   0.003       3.087264    14.98846
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.6812  1.0000
r3  0.4641  0.6812  1.0000
r4  0.3161  0.4641  0.6812  1.0000
r5  0.2154  0.3161  0.4641  0.6812  1.0000
r6  0.1467  0.2154  0.3161  0.4641  0.6812  1.000

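This analysis probably reflects more closely the correlations among the depression scores over the six visits that we observed in our descriptive analysis. Note that the ar1 structure requires at least two observations per subject, so the eight subjects with a single observation were omitted, leaving 287 observations from 53 subjects.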
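Under an AR(1) structure the correlation between two visits is the lag-one correlation raised to the power of the number of visits separating them. As a rough check, using the rounded lag-one value of 0.6812 from the matrix above, the remaining entries can be reproduced to within rounding:

```stata
* Lag-one correlation taken from the estat wcorr output (rounded)
scalar rho = 0.6812
* Implied correlations at lags 2 through 5
display rho^2
display rho^3
display rho^4
display rho^5
```

These come out near 0.464, 0.316, 0.215, and 0.147, in line with the corresponding entries of the matrix.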

Now, let's back up and reconsider the group by visit interaction. We will try a model with the interaction using the ar1 correlations.

generate gxv = group*visit

xtgee dep pre group visit gxv, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(4)       =     64.83
Scale parameter:                  25.81682      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4284649   .1377094      3.111   0.002       .1585595    .6983703
   group |   -3.55197   1.654127     -2.147   0.032         -6.794   -.3099395
   visit |  -1.057824   .3044115     -3.475   0.001      -1.654459   -.4611881
     gxv |  -.2040059   .3905217     -0.522   0.601      -.9694144    .5614026
   _cons |   8.606923   3.147897      2.734   0.006       2.437158    14.77669
------------------------------------------------------------------------------
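The product term gxv could also be specified with factor-variable notation instead of being generated by hand. This is a sketch that assumes Stata 11 or later, where the panel and time variables are declared once with xtset rather than through the i() and t() options:

```stata
* Declare the panel (subj) and time (visit) variables
xtset subj visit

* i.group##c.visit expands to group, visit, and their interaction,
* with visit treated as continuous
xtgee dep pre i.group##c.visit, family(gaussian) link(identity) corr(ar1)
```

The interaction row in that output should correspond to the gxv coefficient above.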

The group by visit interaction still is not significant even though this may be a better approach for testing it. So far we have been treating visit as a continuous variable. Is it possible that our analysis might change if we were to treat visit as a categorical variable, in the way that the anova did? Let's try one more analysis using xi to create dummy variables on-the-fly.

xi: xtgee dep pre group i.visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     66.85
Scale parameter:                  25.67071      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4264589   .1372194      3.108   0.002       .1575137    .6954041
   group |  -4.197096   1.050645     -3.995   0.000      -6.256323   -2.137869
Ivisit_2 |   -.964717   .5556079     -1.736   0.083      -2.053689    .1242546
Ivisit_3 |  -2.790063   .7474989     -3.733   0.000      -4.255134   -1.324992
Ivisit_4 |  -3.730425   .8528421     -4.374   0.000      -5.401964   -2.058885
Ivisit_5 |  -5.127078   .9147959     -5.605   0.000      -6.920045   -3.334111
Ivisit_6 |   -5.84916   .9534054     -6.135   0.000        -7.7178    -3.98052
   _cons |   7.896145   2.998003      2.634   0.008       2.020168    13.77212
------------------------------------------------------------------------------

test Ivisit_2 Ivisit_3 Ivisit_4 Ivisit_5 Ivisit_6 

 ( 1)  Ivisit_2 = 0.0
 ( 2)  Ivisit_3 = 0.0
 ( 3)  Ivisit_4 = 0.0
 ( 4)  Ivisit_5 = 0.0
 ( 5)  Ivisit_6 = 0.0

           chi2(  5) =   40.56
         Prob > chi2 =    0.0000

We can test whether the categorical version of visit accounts for more variability than the continuous version by including both in the model. Because visit itself is in the model, only k - 2 = 4 of the dummy variables for time can be estimated; Stata drops the fifth because of collinearity:

xi: xtgee dep pre group visit i.visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     66.85
Scale parameter:                  25.67071      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .4264589   .1372194     3.11   0.002     .1575137    .6954041
       group |  -4.197096   1.050645    -3.99   0.000    -6.256323   -2.137869
       visit |  -1.169832   .1906811    -6.14   0.000     -1.54356   -.7961039
   _Ivisit_2 |    .205115   .5196299     0.39   0.693    -.8133408    1.223571
   _Ivisit_3 |  -.4503992    .648481    -0.69   0.487    -1.721399    .8206003
   _Ivisit_4 |  -.2209286   .6602134    -0.33   0.738    -1.514923    1.073066
   _Ivisit_5 |  -.4477498   .5585628    -0.80   0.423    -1.542513    .6470131
       _cons |   9.065977   3.031614     2.99   0.003     3.124124    15.00783
------------------------------------------------------------------------------

test _Ivisit_2 _Ivisit_3 _Ivisit_4 _Ivisit_5

 ( 1)  _Ivisit_2 = 0
 ( 2)  _Ivisit_3 = 0
 ( 3)  _Ivisit_4 = 0
 ( 4)  _Ivisit_5 = 0

           chi2(  4) =    1.92
         Prob > chi2 =    0.7506

These results indicate that the categorical version of visit does not account for significantly more variability than the continuous version. Of all the analyses run so far, I prefer the following model: xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1). Those results were as follows:

------------------------------------------------------------------------------	
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4268002   .1376156      3.101   0.002       .1570785    .6965219
   group |  -4.218194   1.053504     -4.004   0.000      -6.283023   -2.153364
   visit |  -1.181975   .1907298     -6.197   0.000      -1.555799   -.8081517
   _cons |   9.037864   3.036076      2.977   0.003       3.087264    14.98846
------------------------------------------------------------------------------

The final interpretation of these results indicates that there is a significant effect for the pretest, i.e., for every one point increase in the pretest score there is about a 0.43 point increase in the depression score, when controlling for treatment and visit. There is also an effect for the estrogen patch when controlling for pretest depression and visit. Use of the estrogen patch reduces the depression score by about 4.2 points. Finally, there is also a significant visit effect when controlling for pretest depression and group membership. The depression score decreases on average by 1.18 points for each visit.
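To make the coefficients concrete, the fitted model can be evaluated by hand. The values below are illustrative choices, not figures from the study: a woman in the estrogen patch group (group = 1) with a pretest score of 21, at the sixth visit.

```stata
* _cons + pre*21 + group*1 + visit*6, using the coefficients reported above
display 9.037864 + .4268002*21 - 4.218194*1 - 1.181975*6

* The same combination, with a standard error, provided the ar1 model
* with pre, group and visit is the most recent estimation command
lincom _cons + 21*pre + group + 6*visit
```

The hand calculation gives a predicted depression score of about 6.7.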

GEE with Binary Response Variable

The binary response variable in these examples was created from the data from the 1996 Gregoire, Kumar, Everitt, Henderson & Studd study on the efficacy of estrogen patches in treating postnatal depression. Women were randomly assigned to either a placebo control group (group=0, n=27) or an estrogen patch group (group=1, n=34). Prior to the first treatment all patients took the Edinburgh Postnatal Depression Scale (EPDS). EPDS data were collected monthly for six months once the treatment began. Depression scores greater than or equal to 11 were coded as 1.

use http://www.ats.ucla.edu/stat/stata/library/depres01, clear 
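The depres01 file already contains the recoded variable, so the following is an illustration of the coding rule rather than a step to run after the use command. It assumes the long-form data from the previous section are in memory. Because Stata treats missing values as larger than any number, the if qualifier is needed to keep missing scores from being coded as 1:

```stata
* 1 if the EPDS score is 11 or higher, 0 otherwise, missing if dep is missing
generate depressd = (dep >= 11) if !missing(dep)
tabulate depressd group
```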

We will go through a series of analyses pretty much paralleling the models that were run above using the continuous response variable. To get a binary logit type model we will set family to binomial and link to logit. We will start with the independent correlation structure, followed by exchangeable (compound symmetry) and then unstructured.

xtgee depressd group visit, i(subj) fam(bin) link(logit) corr(ind)

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.8
Correlation:                   independent                     max =         6
                                                Wald chi2(2)       =     52.54
Scale parameter:                         1      Prob > chi2        =    0.0000

Pearson chi2(295):                  295.72      Deviance           =    338.95
Dispersion (Pearson):              1.00245      Dispersion         =  1.148974

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -1.606602    .277919    -5.78   0.000    -2.151313   -1.061891
       visit |  -.4402142   .0802387    -5.49   0.000    -.5974791   -.2829493
       _cons |    2.38366   .3675414     6.49   0.000     1.663292    3.104028
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

      |        c1         c2         c3         c4         c5         c6
------+------------------------------------------------------------------
   r1 |         1                                                       
   r2 |         0          1                                            
   r3 |         0          0          1                                 
   r4 |         0          0          0          1                      
   r5 |         0          0          0          0          1           
   r6 |         0          0          0          0          0          1

xtgee depressd group visit, i(subj) fam(bin) link(logit) corr(exc)

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.8
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(2)       =     45.64
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
depressd |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   group |  -1.616323   .4669082     -3.462   0.001      -2.531446   -.7011994
   visit |  -.3984038   .0613331     -6.496   0.000      -.5186145   -.2781931
   _cons |   2.409522   .4456646      5.407   0.000       1.536035    3.283008
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.4518  1.0000
r3  0.4518  0.4518  1.0000
r4  0.4518  0.4518  0.4518  1.0000
r5  0.4518  0.4518  0.4518  0.4518  1.0000
r6  0.4518  0.4518  0.4518  0.4518  0.4518  1.0000

xtgee depressd group visit, i(subj) t(visit) fam(bin) link(logit) corr(unstr)

GEE population-averaged model                   Number of obs      =       295
Group and time vars:            subj visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.8
Correlation:                  unstructured                     max =         6
                                                Wald chi2(2)       =     32.57
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |    -1.5933   .4553165    -3.50   0.000    -2.485704   -.7008963
       visit |  -.3897561   .0748284    -5.21   0.000    -.5364169   -.2430952
       _cons |   2.311344   .4521761     5.11   0.000     1.425095    3.197593
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

      |        c1         c2         c3         c4         c5         c6
------+------------------------------------------------------------------
   r1 |         1                                                       
   r2 |   .404501          1                                            
   r3 |  .1803076   .6315383          1                                 
   r4 |   .284646   .5602217   .5795466          1                               

With these data, just as with the continuous response variable, it might be more reasonable to hypothesize that the correlation structure is autoregressive, that is, that the correlation between visits weakens as they get farther apart in time.

xtgee depressd group visit, i(subj) t(visit) fam(bin) link(logit) corr(ar1)
note:  some groups have fewer than 2 observations
       not possible to estimate correlations for those groups
       8 groups omitted from estimation

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(2)       =     26.04
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
depressd |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   group |  -1.588712   .4391128     -3.618   0.000      -2.449358   -.7280672
   visit |  -.4036122   .0933711     -4.323   0.000      -.5866163   -.2206082
   _cons |   2.259702   .4961409      4.555   0.000       1.287284     3.23212
------------------------------------------------------------------------------
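
The note at the top of this output says that subjects with a single observation cannot contribute to the AR(1) correlation estimate. The 8 omitted groups account for the drop from 295 to 287 observations and from 61 to 53 groups relative to the earlier models. One way to see how many visits each subject has is sketched below; it assumes the data are in the long format used here, and the variable names nvis and onepersubj are made up for illustration.

egen nvis = count(depressd), by(subj)
egen onepersubj = tag(subj)
tabulate nvis if onepersubj

If the assumption holds, the row for nvis equal to 1 should list the subjects that were omitted.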

estat wcorr

Estimated within-subj correlation matrix R:

        c1      c2      c3      c4      c5      c6
r1  1.0000
r2  0.5643  1.0000
r3  0.3185  0.5643  1.0000
r4  0.1797  0.3185  0.5643  1.0000
r5  0.1014  0.1797  0.3185  0.5643  1.0000
r6  0.0572  0.1014  0.1797  0.3185  0.5643  1.0000
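
Under an AR(1) structure the correlation between two visits that are k steps apart is the lag-one correlation raised to the power k. As a quick check, here is a small sketch using display, with the lag-one value 0.5643 copied from the matrix above:

display 0.5643^2
display 0.5643^3
display 0.5643^4
display 0.5643^5

Up to rounding, these should reproduce the remaining entries in the first column of the matrix (0.3185, 0.1797, 0.1014 and 0.0572).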

If we want, we can also obtain the results in the odds ratio metric using the eform option. Typing xtgee without a variable list redisplays the most recently fitted model.

xtgee, eform
note:  some groups have fewer than 2 observations
       not possible to estimate correlations for those groups
       8 groups omitted from estimation


GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(2)       =     26.04
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
depressd | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   group |   .2041883   .0896617     -3.618   0.000        .086349    .4828413
   visit |   .6679031   .0623629     -4.323   0.000       .5562061    .8020309
------------------------------------------------------------------------------
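
The odds ratios are just the exponentiated coefficients from the AR(1) model, and the confidence limits are the exponentiated limits. A quick check with display, using the group and visit coefficients copied from the earlier output:

display exp(-1.588712)
display exp(-.4036122)

These should agree with the odds ratios for group and visit in the table above (about .204 and .668).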

Let's add in the pretest (pre) and a group by visit interaction (gxv).

xtgee depressd pre group visit gxv, i(subj) t(visit) fam(bin) link(logit) corr(ar1)

note:  some groups have fewer than 2 observations
       not possible to estimate correlations for those groups
       8 groups omitted from estimation

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(4)       =     29.71
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1231682   .0565583     2.18   0.029      .012316    .2340204
       group |  -1.278468   .7833482    -1.63   0.103    -2.813802    .2568666
       visit |  -.3504923   .1484459    -2.36   0.018    -.6414409   -.0595436
         gxv |  -.1279848   .1946883    -0.66   0.511    -.5095669    .2535973
       _cons |  -.4669354   1.271484    -0.37   0.713    -2.958999    2.025128
------------------------------------------------------------------------------
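
In Stata 11 and later the interaction can also be requested with factor-variable notation instead of a precomputed product variable. The following is only a sketch; it assumes that gxv was created as the product of group and visit, and it uses xtset to declare the panel and time variables in place of the i() and t() options.

xtset subj visit
xtgee depressd pre i.group##c.visit, fam(bin) link(logit) corr(ar1)

Under that assumption, the group#c.visit term in the resulting table should correspond to gxv above.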

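The group by visit interaction is not significant (z = -0.66, p = 0.511), so we drop it, but we'll stick with the pretest for the moment. Next let's try the categorical version of visit, followed by a model that contains both the categorical and continuous versions of visit. In the second model the joint test of the visit indicators is a test of departure from a linear trend across visits.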

xi: xtgee depressd pre group i.visit, i(subj) fam(bin) link(logit) t(visit) corr(ar1)

note:  some groups have fewer than 2 observations
       not possible to estimate correlations for those groups
       8 groups omitted from estimation

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     30.86
Scale parameter:                         1      Prob > chi2        =    0.0001

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1140311    .056433     2.02   0.043     .0034244    .2246378
       group |  -1.692654   .4377388    -3.87   0.000    -2.550607   -.8347021
   _Ivisit_2 |  -.1751772   .3106588    -0.56   0.573    -.7840573    .4337028
   _Ivisit_3 |  -1.015265   .3915632    -2.59   0.010    -1.782715   -.2478151
   _Ivisit_4 |  -1.108258   .4287682    -2.58   0.010    -1.948628   -.2678878
   _Ivisit_5 |  -1.489162   .4548596    -3.27   0.001    -2.380671    -.597654
   _Ivisit_6 |   -2.14973   .4951443    -4.34   0.000    -3.120195   -1.179265
       _cons |  -.4832614    1.18731    -0.41   0.684    -2.810346    1.843823
------------------------------------------------------------------------------

test _Ivisit_2 _Ivisit_3 _Ivisit_4 _Ivisit_5 _Ivisit_6

 ( 1)  _Ivisit_2 = 0
 ( 2)  _Ivisit_3 = 0
 ( 3)  _Ivisit_4 = 0
 ( 4)  _Ivisit_5 = 0
 ( 5)  _Ivisit_6 = 0

           chi2(  5) =   21.92
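
The p-value for this test is not shown above. It can be recovered from the chi-square statistic and its degrees of freedom with the chi2tail() function. The same joint test can also be requested more compactly with testparm, which accepts a wildcard variable list. A small sketch, assuming the model with the five visit indicators is still the most recent estimation:

display chi2tail(5, 21.92)
testparm _Ivisit_*

With 5 degrees of freedom the .01 critical value is about 15.09, so the visit indicators are jointly significant.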
          
xi: xtgee depressd pre group visit i.visit, i(subj) fam(bin) link(logit) t(visit) corr(ar1)

note: _Ivisit_6 dropped due to collinearity
note:  some groups have fewer than 2 observations
       not possible to estimate correlations for those groups
       8 groups omitted from estimation

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     30.86
Scale parameter:                         1      Prob > chi2        =    0.0001

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1140311    .056433     2.02   0.043     .0034244    .2246378
       group |  -1.692654   .4377388    -3.87   0.000    -2.550607   -.8347021
       visit |   -.429946   .0990289    -4.34   0.000     -.624039    -.235853
   _Ivisit_2 |   .2547688   .2901423     0.88   0.380    -.3138998    .8234373
   _Ivisit_3 |  -.1553729   .3440849    -0.45   0.652     -.829767    .5190212
   _Ivisit_4 |   .1815801   .3544878     0.51   0.608    -.5132033    .8763635
   _Ivisit_5 |   .2306217   .3201945     0.72   0.471    -.3969481    .8581914
       _cons |  -.0533153   1.201905    -0.04   0.965    -2.409005    2.302375
------------------------------------------------------------------------------

test _Ivisit_2 _Ivisit_3 _Ivisit_4 _Ivisit_5

 ( 1)  _Ivisit_2 = 0
 ( 2)  _Ivisit_3 = 0
 ( 3)  _Ivisit_4 = 0
 ( 4)  _Ivisit_5 = 0

           chi2(  4) =    3.04
         Prob > chi2 =    0.5507

This test is not significant, so once the linear effect of visit is in the model there is no evidence of departure from linearity, and the continuous version of visit appears adequate.
