
Stata FAQ
How can I perform mediation with multilevel data? (Method 1)

Mediator variables are variables that sit between the independent variable and dependent variable and mediate the effect of the IV on the DV. A model with one mediator is shown in the figure below.
The idea in mediation analysis is that some of the effect of the predictor variable, the IV, is transmitted to the DV through the mediator variable, the MV, while some of the effect of the IV passes directly to the DV. The portion of the effect of the IV that passes through the MV is the indirect effect. The program ml_mediation (findit ml_mediation) will compute direct and indirect effects for multilevel data. The approach used in ml_mediation was adapted from Krull & MacKinnon (2001).

When you have multilevel data, the variables may come from different levels of the model. The DV will always be a level 1 variable. Depending on your data, the IV and MV may be either level 1 or level 2 variables. According to Krull & MacKinnon (2001), a predictor variable may be mediated by a variable at the same level or lower. Thus a level 2 predictor may be mediated by a level 2 or level 1 variable, while a level 1 predictor may only be mediated by another level 1 variable. Logically, a level 1 predictor cannot affect a level 2 mediator.
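In practice, a level 2 variable takes a single value within each cluster, while a level 1 variable varies within at least some clusters. The following is a minimal sketch of that check on a tiny made-up dataset; the variable names x1 and x2 and the scalars level_x1 and level_x2 are hypothetical, and this is not necessarily how ml_mediation itself classifies variables.

```stata
* Toy data: x1 varies within clusters, x2 is constant within clusters
clear
input cid x1 x2
1 10 3
1 12 3
2  9 5
2  9 5
3  7 4
3  8 4
end

foreach v of varlist x1 x2 {
    * Sort within cluster so the first and last values are the min and max
    bysort cid (`v'): generate byte varies = (`v'[1] != `v'[_N])
    quietly count if varies
    * Any within-cluster variation means level 1, otherwise level 2
    scalar level_`v' = cond(r(N) > 0, 1, 2)
    display "`v' looks like a level " level_`v' " variable"
    drop varies
}
```

Here x1 is flagged as level 1 and x2 as level 2. Applied to the example data on this page, the same loop could be run over the candidate IV and MV with cid as the cluster identifier.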

ml_mediation computes the indirect effect as the product of coefficients, i.e., indirect effect = coef[a]*coef[b]. When the response variable is at level 1, ml_mediation uses the xtmixed, reml command by default, with xtmixed, mle available as an option. When the response variable is at level 2, i.e., the MV is level 2, ml_mediation uses the xtreg, be command. The ml_mediation program will detect which variables are level 1 and which are level 2.
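To make the product-of-coefficients idea concrete, here is a rough sketch that fits the a path and b path models by hand with xtmixed, reml when every variable is at level 1. The data are simulated here purely for illustration (the variables x, m, y, the seed, and the true coefficients 0.5, 0.4 and 0.2 are all assumptions), and this is not the code inside ml_mediation. In Stata 13 and later, xtmixed is the older name for mixed.

```stata
* Simulate 20 clusters of 10 observations each (illustrative values only)
clear
set seed 2001
set obs 20
generate cid = _n
generate u_m = rnormal()                 // cluster effect for the mediator
generate u_y = rnormal()                 // cluster effect for the outcome
expand 10
generate x = rnormal()                   // level 1 IV
generate m = 0.5*x + u_m + rnormal()     // level 1 MV
generate y = 0.4*m + 0.2*x + u_y + rnormal()   // level 1 DV

* a path: MV on IV, random intercept for cluster
xtmixed m x || cid:, reml
scalar a = _b[x]

* b path and c_prime: DV on MV and IV
xtmixed y m x || cid:, reml
scalar b      = _b[m]
scalar cprime = _b[x]

* Indirect effect as the product of coefficients
scalar ind_eff = a*b
display "a = " a "   b = " b "   indirect effect = " ind_eff
display "direct effect (c_prime) = " cprime
```

If the mediator were a level 2 variable, the a path model would instead be fit on cluster-level information, which is where xtreg, be comes in.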

The DV and MV must be continuous variables. The IV may be a continuous or binary predictor variable. The covariates (CVs) may be continuous, binary or factor variables.

We will illustrate the use of the ml_mediation command with a simulated multilevel dataset, ml_med.dta. Let's look at the data.

use http://www.ats.ucla.edu/stat/data/ml_med, clear

summarize, sep(0)   /* descriptive statistics */

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       200       100.5    57.87918          1        200
       write |       200      52.775    9.478586         31         67
       socst |       200      52.405    10.73579         26         71
         cid |       200       10.43    5.801152          1         20
        abil |       200     156.725    25.75063        104        215
   mean_abil |       200     156.725    25.21654   114.0909      205.7
    mean_ses |       200       2.055    .3142828   1.444444   2.727273
         hon |       200        .545    .4992205          0          1
The variables write, socst, abil and hon are all level 1 variables, and cid is the cluster (level 2) identifier. The variable hon is binary and indicates membership in the honor society, while abil is a composite measure of academic ability. Now, we are ready to try a multilevel mediation model in which all of the variables are at level 1.
ml_mediation, dv(write) iv(hon) mv(abil) l2id(cid)

Equation 1 (c_path): write = hon 

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -628.62552  
Iteration 1:   log restricted-likelihood = -628.62552  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =       200
Group variable: cid                             Number of groups   =        20

                                                Obs per group: min =         7
                                                               avg =      10.0
                                                               max =        12


                                                Wald chi2(1)       =     32.80
Log restricted-likelihood = -628.62552          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hon |   4.138289   .7225934     5.73   0.000     2.722032    5.554546
       _cons |   50.64367    1.84665    27.42   0.000      47.0243    54.26304
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cid: Identity                |
                   sd(_cons) |    7.91701   1.331807      5.693395    11.00908
-----------------------------+------------------------------------------------
                sd(Residual) |   4.823492   .2549056       4.34889    5.349889
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   191.99 Prob >= chibar2 = 0.0000

Equation 2 (a_path): abil = hon 

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -659.69204  
Iteration 1:   log restricted-likelihood = -659.69204  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =       200
Group variable: cid                             Number of groups   =        20

                                                Obs per group: min =         7
                                                               avg =      10.0
                                                               max =        12


                                                Wald chi2(1)       =     31.36
Log restricted-likelihood = -659.69204          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        abil |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         hon |  -4.265397   .7616216    -5.60   0.000    -5.758148   -2.772647
       _cons |   159.3095   5.751541    27.70   0.000     148.0367    170.5823
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cid: Identity                |
                   sd(_cons) |   25.60223   4.169551      18.60596    35.22926
-----------------------------+------------------------------------------------
                sd(Residual) |   5.074532   .2681952      4.575188    5.628375
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   537.80 Prob >= chibar2 = 0.0000

Equation 3 (b_path & c_prime): write = abil hon 

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -528.74216  
Iteration 1:   log restricted-likelihood = -528.74216  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =       200
Group variable: cid                             Number of groups   =        20

                                                Obs per group: min =         7
                                                               avg =      10.0
                                                               max =        12


                                                Wald chi2(2)       =    665.58
Log restricted-likelihood = -528.74216          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        abil |  -.8056925   .0348556   -23.12   0.000    -.8740083   -.7373768
         hon |    .671848   .3882241     1.73   0.084    -.0890572    1.432753
       _cons |   179.0213   8.446553    21.19   0.000     162.4664    195.5763
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
cid: Identity                |
                   sd(_cons) |   28.44004   4.705583      20.56333    39.33388
-----------------------------+------------------------------------------------
                sd(Residual) |    2.38897   .1268631      2.152825    2.651018
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =   247.90 Prob >= chibar2 = 0.0000

The mediator, abil, is a level 1 variable
c_path  = 4.1382892
a_path  = -4.2653975
b_path  = -.80569254
c_prime = .67184798  same as dir_eff
ind_eff = 3.4365989
dir_eff = .67184798
tot_eff = 4.1084469

proportion of total effect mediated = .83647154
ratio of indirect to direct effect  = 5.1151437
ratio of total to direct effect     = 6.1151437
The output includes the results of three equations: 1) the DV on the IV, 2) the MV on the IV, and 3) the DV on the MV and IV. The direct, indirect and total effects along with various proportions and ratios are shown below the results of the three equations.
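The summary quantities at the bottom of the output can be checked by hand from the path coefficients. The sketch below simply copies the printed values of a_path, b_path and c_prime into scalars (the scalar names are chosen here for illustration) and recomputes the rest. Note that tot_eff is the sum of the indirect and direct effects, which is close to but not identical to the c_path estimate of 4.1382892 from equation 1.

```stata
* Values copied from the ml_mediation output above
scalar a_path  = -4.2653975
scalar b_path  = -.80569254
scalar c_prime = .67184798

* Indirect effect is the product of the a and b paths
scalar ind_eff = a_path * b_path
* Total effect is indirect plus direct (c_prime)
scalar tot_eff = ind_eff + c_prime

display "ind_eff = " ind_eff
display "tot_eff = " tot_eff
display "proportion of total effect mediated = " ind_eff / tot_eff
display "ratio of indirect to direct effect  = " ind_eff / c_prime
display "ratio of total to direct effect     = " tot_eff / c_prime
```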

We see that hon is significant in equation 1 and is also a significant predictor of the mediator variable, abil, in equation 2. However, hon is not significant in equation 3 when the mediator is included in the model. This suggests that there is mediation. The output includes the indirect, direct and total effects. It does not however include standard errors or confidence intervals. To get these you need to bootstrap the results. You can bootstrap any of the effects found in the return list.

return list

scalars:
            r(tot_eff) =  4.108446903443488
            r(dir_eff) =  .6718479771360948
            r(ind_eff) =  3.436598926307393
             r(b_path) =  -.8056925398919483
             r(a_path) =  -4.265397476273364
             r(c_path) =  4.13828918116252
We will illustrate this by bootstrapping the ml_mediation command with 500 replications (you may want to do more than 500 reps, maybe a lot more).

bootstrap indeff=r(ind_eff) direff=r(dir_eff) toteff=r(tot_eff), ///
  reps(500) strata(cid): ml_mediation, dv(write) iv(hon) mv(abil) l2id(cid)
  
Bootstrap results

Number of strata   =        20                  Number of obs      =       200
                                                Replications       =       500

      command:  ml_mediation, dv(write) iv(hon) mv(abil) l2id(cid)
       indeff:  r(ind_eff)
       direff:  r(dir_eff)
       toteff:  r(tot_eff)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      indeff |   3.436599   .5851906     5.87   0.000     2.289647    4.583551
      direff |    .671848   .3754834     1.79   0.074     -.064086    1.407782
      toteff |   4.108447   .6628692     6.20   0.000     2.809247    5.407647
------------------------------------------------------------------------------
If you have concerns about the normal-based confidence intervals, you can obtain percentile or bias-corrected (BC) confidence intervals with the estat boot command.
estat boot, percentile bc

Bootstrap results
Number of strata   =        20                  Number of obs      =       200
                                                Replications       =       500

      command:  ml_mediation, dv(write) iv(hon) mv(abil) l2id(cid)
       indeff:  r(ind_eff)
       direff:  r(dir_eff)
       toteff:  r(tot_eff)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
      indeff |   3.4365989  -.0041102   .58519055    2.317477   4.625505   (P)
             |                                       2.285803   4.557065  (BC)
      direff |   .67184798   .0295377   .37548343   -.1037632   1.443162   (P)
             |                                      -.1594468   1.383974  (BC)
      toteff |   4.1084469   .0254276   .66286921    2.845113   5.427971   (P)
             |                                       2.723221   5.319005  (BC)
------------------------------------------------------------------------------
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
Based on the confidence intervals, it appears that both the indirect and total effects are statistically significant, while the direct effect is not, since each of its intervals includes zero.
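The same approach extends to quantities derived from the returned scalars. As a sketch, the proportion of the total effect mediated could be bootstrapped by passing an expression built from r(ind_eff) and r(tot_eff). The name propmed, the seed value and the choice of 1000 replications are assumptions made for illustration; the example also assumes that ml_mediation is installed and that the dataset URL used on this page is still reachable.

```stata
* Assumes ml_mediation is installed (findit ml_mediation)
use http://www.ats.ucla.edu/stat/data/ml_med, clear

* Bootstrap the proportion mediated; seed() makes the run reproducible
bootstrap propmed=(r(ind_eff)/r(tot_eff)), ///
  reps(1000) seed(12345) strata(cid): ///
  ml_mediation, dv(write) iv(hon) mv(abil) l2id(cid)

* Percentile and bias-corrected intervals for the proportion
estat boot, percentile bc
```

The observed value reported for propmed should match the proportion of total effect mediated printed by ml_mediation itself; only the standard error and intervals come from the resampling.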

References

Krull, J. L. & MacKinnon, D. P. (2001). Multilevel modeling of individual and group level mediated effects. Multivariate Behavioral Research, 36(2), 249-277.

