### Stata FAQ: How can I do mediation analysis with the sem command?

The sem command, introduced in Stata 12, makes the analysis of mediation models much easier, as long as both the dependent variable and the mediator variable are continuous.

We will illustrate using the sem command with the hsbdemo dataset. The examples will not demonstrate full mediation, i.e., the effect of the independent variable will not go from being significant to being not significant. Rather, the examples will show partial mediation in which there is a decrease in the direct effect.

If your model contains control variables, i.e., covariates, you must include these in each of the sem equations. Thus, your sem model will look something like this:
sem (MV <- IV CV1 CV2)(DV <- MV IV CV1 CV2)
where DV stands for the dependent variable; IV stands for the independent variable; MV stands for the mediator variable; and CVs stand for the covariates.
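
As a concrete illustration of the covariate form above, here is a minimal sketch that adds one control variable to both equations. The choice of socst as the covariate is an assumption made only for illustration (it is not part of the original examples), and the sketch assumes the hsbdemo file contains that variable.

```stata
* load the example data used throughout this page
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

* socst is a hypothetical covariate (CV1); it appears in BOTH equations
sem (read <- math socst)(science <- read math socst)

* direct, indirect and total effects, now adjusted for socst
estat teffects
```

Leaving the covariate out of either equation would change which paths are adjusted, so the same list of CVs is repeated in each set of parentheses.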

#### Simple mediation model

The simplest mediation model has one IV, one MV and one DV. Here is the symbolic version of the model.
sem (MV <- IV)(DV <- MV IV)
In our simple mediation example the independent variable is math, the mediator variable is read and the dependent variable is science.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

sem (read <- math)(science <- read math)

Endogenous variables

Observed:  read science

Exogenous variables

Observed:  math

Fitting target model:

Iteration 0:   log likelihood = -2098.5822
Iteration 1:   log likelihood = -2098.5822

Structural equation model                       Number of obs      =       200
Estimation method  = ml
Log likelihood     = -2098.5822

------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
_cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
-----------+----------------------------------------------------------------
science <- |
read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
_cons |    11.6155   3.031268     3.83   0.000     5.674324    17.55668
-------------+----------------------------------------------------------------
Variance     |
e.read |   58.71925   5.871925                      48.26811    71.43329
e.science |    50.8938    5.08938                      41.83548    61.91346
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
We follow up the sem command with estat teffects to get the direct and indirect effects.
estat teffects

Direct effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
-----------+----------------------------------------------------------------
science <- |
read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
------------------------------------------------------------------------------

Indirect effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |          0  (no path)
-----------+----------------------------------------------------------------
science <- |
read |          0  (no path)
math |   .2648593   .0522072     5.07   0.000     .1625351    .3671836
------------------------------------------------------------------------------

Total effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
-----------+----------------------------------------------------------------
science <- |
read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
math |     .66658     .05799    11.49   0.000     .5529217    .7802384
------------------------------------------------------------------------------
The total effect for math, .66658, is the effect we would find if there were no mediator in our model. It is significant, with z = 11.49. The direct effect for math is .4017207, which, while still significant (z = 5.58), is much smaller than the total effect. The indirect effect of math that passes through read is .2648593 and is also statistically significant (z = 5.07).
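
The three effects are tied together by simple arithmetic: the indirect effect is the product of the math-to-read and read-to-science coefficients, and the total effect is the direct effect plus that product. The sketch below only re-does this arithmetic with coefficients copied from the output above; the scalar names path_a, path_b and path_c are hypothetical labels chosen for this illustration.

```stata
* coefficients copied by hand from the sem output above
scalar path_a = .724807     // read <- math
scalar path_b = .3654205    // science <- read
scalar path_c = .4017207    // science <- math (direct effect)

* indirect effect = a*b; total effect = direct + indirect
display "indirect effect = " path_a*path_b
display "total effect    = " path_c + path_a*path_b
```

Up to rounding, the two displayed values should agree with the indirect and total effects for math reported by estat teffects.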

It is often easier to interpret these values by computing ratios and proportions as shown below.
proportion of total effect mediated = .2648593/.66658 = .3973406

ratio of indirect to direct effect = .2648593/.4017207 = .65931205

ratio of total to direct effect =  .66658/.4017207 =  1.6593121
We see above that the proportion of the total effect that is mediated is almost .40, which is a respectable amount. The ratio of the indirect effect to the direct effect is about .66, or almost 2/3 the size of the direct effect. Finally, the total effect is about 1.66 times the direct effect.
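
Rather than copying numbers by hand, one possible way to get these quantities is nlcom, which computes nonlinear combinations of the stored coefficients and attaches delta-method standard errors. This is a sketch, not part of the original FAQ: the labels indirect, prop_med and ind_dir are made up for this example, and it assumes the sem coefficients are stored as _b[read:math], _b[science:read] and _b[science:math].

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(science <- read math)

* indirect effect: (read <- math) times (science <- read)
nlcom (indirect: _b[read:math]*_b[science:read])

* proportion of the total effect mediated: indirect / (indirect + direct)
nlcom (prop_med: _b[read:math]*_b[science:read] / (_b[read:math]*_b[science:read] + _b[science:math]))

* ratio of the indirect effect to the direct effect
nlcom (ind_dir: _b[read:math]*_b[science:read] / _b[science:math])
```

The point estimates should match the hand calculations above; the standard errors for the ratios rest on the delta method and may behave poorly in small samples.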

#### Mediation with bootstrap standard errors and confidence intervals

If you are uncomfortable with the standard errors and confidence intervals produced directly by sem, you can obtain bootstrapped standard errors and confidence intervals by writing a small program that runs both the sem command and estat teffects, and then bootstrapping that program. Here is the program, which we are calling indireff.ado.
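program indireff, rclass
  // refit the mediation model on the current (bootstrap) sample
  sem (read <- math)(science <- read math)
  estat teffects
  // copy the effect matrices left behind by estat teffects
  mat bi = r(indirect)
  mat bd = r(direct)
  mat bt = r(total)
  // the third element of each matrix is the science <- math effect
  return scalar indir  = el(bi,1,3)
  return scalar direct = el(bd,1,3)
  return scalar total  = el(bt,1,3)
end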
So how do we know which elements of r(indirect), r(direct) and r(total) we need? We will quietly run the sem and estat teffects commands and then use matrix list to display the matrices of coefficients.
quietly sem (read <- math)(science <- read math)
quietly estat teffects

matrix list r(indirect)

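r(indirect)[1,3]
         read:    science:    science:
            o.          o.
          math        read        math
r1           0           0   .26485934

matrix list r(direct)

r(direct)[1,3]
         read:    science:    science:
          math        read        math
r1   .72480697   .36542052   .40172068

matrix list r(total)

r(total)[1,3]
         read:    science:    science:
          math        read        math
r1   .72480697   .36542052   .66658002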
We see that in each case the coefficient of interest is the third element.
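
If you would rather not rely on counting columns, the position can also be looked up by name. The following is a sketch under the assumption that the columns of r(indirect) carry full names of the form equation:variable (for example science:math) and that colnumb() accepts such a name; the scalar name col_ind is hypothetical.

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(science <- read math)
quietly estat teffects

* copy the r() matrix so that matrix functions can be applied to it
matrix bi = r(indirect)

* show the full column names (equation:variable)
local cn : colfullnames bi
display "`cn'"

* look up the column for the science <- math path by name
scalar col_ind = colnumb(bi, "science:math")
display "science:math is column " col_ind
display "indirect effect = " el(bi, 1, col_ind)
```

If the lookup returns a missing value, the column names differ from what is assumed here, and the display of the full names shows what to use instead.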

Now that we know the correct matrix elements, we will run indireff with 200 bootstrap replications. You may want to run more, say 2,000 to 5,000. We will then request the percentile and bias-corrected confidence intervals.
set seed 358395

bootstrap r(indir) r(direct) r(total), reps(200): indireff
(running indireff on estimation sample)

Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200

Bootstrap results                               Number of obs      =       200
Replications       =       200

command:  indireff
_bs_1:  r(indir)
_bs_2:  r(direct)
_bs_3:  r(total)

------------------------------------------------------------------------------
|   Observed   Bootstrap                         Normal-based
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_bs_1 |   .2648593   .0570312     4.64   0.000     .1530803    .3766384
_bs_2 |   .4017207   .0894059     4.49   0.000     .2264882    .5769531
_bs_3 |     .66658   .0645087    10.33   0.000     .5401452    .7930148
------------------------------------------------------------------------------

estat bootstrap, percentile bc

Bootstrap results                               Number of obs      =       200
Replications       =       200

command:  indireff
_bs_1:  r(indir)
_bs_2:  r(direct)
_bs_3:  r(total)

------------------------------------------------------------------------------
|    Observed               Bootstrap
|       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
_bs_1 |   .26485934   .0015946   .05703117    .1617092   .3759365   (P)
|                                       .1590527   .3756405  (BC)
_bs_2 |   .40172068  -.0062057   .08940595    .2342766   .5533719   (P)
|                                       .2376607   .5533719  (BC)
_bs_3 |   .66658002  -.0046111   .06450873    .5428961   .7722391   (P)
|                                       .5370216   .7705485  (BC)
------------------------------------------------------------------------------
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
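
The same approach can be extended to derived quantities. As a sketch only, the hypothetical program medprop below returns the proportion of the total effect that is mediated, so that bootstrap can produce confidence intervals for it. The program name, the returned scalar prop and the choice of 2,000 replications are assumptions for this illustration.

```stata
capture program drop medprop
program medprop, rclass
  // refit the model on each bootstrap sample
  sem (read <- math)(science <- read math)
  estat teffects
  matrix bi = r(indirect)
  matrix bt = r(total)
  // third element = science <- math, as established above
  return scalar prop = el(bi,1,3)/el(bt,1,3)
end

use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
set seed 358395

* bootstrap the proportion mediated with more replications
bootstrap r(prop), reps(2000): medprop
estat bootstrap, percentile bc
```

The observed coefficient should equal the proportion computed by hand earlier, and the percentile and bias-corrected intervals describe its sampling variability.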

#### Mediation with multiple IVs

What if you had multiple independent variables? You just need to have one equation for each IV predicting the mediator variable. Here is the symbolic model.
sem (MV <- IV1)(MV <- IV2)(DV <- MV IV1 IV2)
For our example, we will use math and ses as our independent variables. We will keep the same mediator and dependent variable as before.
sem (read <- math)(read <- ses)(science <- read math ses)

Endogenous variables

Observed:  read science

Exogenous variables

Observed:  math ses

Fitting target model:

Iteration 0:   log likelihood = -2306.1661
Iteration 1:   log likelihood = -2306.1661

Structural equation model                       Number of obs      =       200
Estimation method  = ml
Log likelihood     = -2306.1661

------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |     .68845    .059519    11.57   0.000     .5717949     .805105
ses |      1.726   .7698566     2.24   0.025     .2171093    3.234892
_cons |   12.43962   3.147394     3.95   0.000     6.270842     18.6084
-----------+----------------------------------------------------------------
science <- |
read |   .3507374   .0663219     5.29   0.000     .2207487     .480726
math |   .3905883   .0721193     5.42   0.000     .2492371    .5319395
ses |   1.033732    .731092     1.41   0.157    -.3991816    2.466647
_cons |   10.84415   3.065166     3.54   0.000     4.836532    16.85176
-------------+----------------------------------------------------------------
Variance     |
e.read |   57.27968   5.727968                      47.08476    69.68202
e.science |   50.39009   5.039009                      41.42142    61.30067
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

estat teffects

Direct effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |     .68845    .059519    11.57   0.000     .5717949     .805105
ses |      1.726   .7698566     2.24   0.025     .2171093    3.234892
-----------+----------------------------------------------------------------
science <- |
read |   .3507374   .0663219     5.29   0.000     .2207487     .480726
math |   .3905883   .0721193     5.42   0.000     .2492371    .5319395
ses |   1.033732    .731092     1.41   0.157    -.3991816    2.466647
------------------------------------------------------------------------------

Indirect effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |          0  (no path)
ses |          0  (no path)
-----------+----------------------------------------------------------------
science <- |
read |          0  (no path)
math |   .2414651   .0502052     4.81   0.000     .1430647    .3398655
ses |   .6053729   .2932801     2.06   0.039     .0305544    1.180191
------------------------------------------------------------------------------

Total effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |     .68845    .059519    11.57   0.000     .5717949     .805105
ses |      1.726   .7698566     2.24   0.025     .2171093    3.234892
-----------+----------------------------------------------------------------
science <- |
read |   .3507374   .0663219     5.29   0.000     .2207487     .480726
math |   .6320534   .0596004    10.60   0.000     .5152388     .748868
ses |   1.639105   .7709094     2.13   0.033     .1281507     3.15006
------------------------------------------------------------------------------
We note that the indirect effects of both math and ses are significant. Because we have multiple independent variables the computation of the ratios and proportions is a bit more complex.
proportion of total math effect mediated = .2414651/.6320534 = .38203275
proportion of total ses effect mediated = .6053729/1.639105 = .36933137

ratio of math indirect to direct effect = .2414651/.3905883 = .61820874
ratio of ses indirect to direct effect = .6053729/1.033732 = .58561881

ratio of total math to direct effect = .6320534/.3905883 =  1.6182087
ratio of total ses to direct effect = 1.639105/1.033732 =  1.5856189
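
These quantities can also be requested directly from the fitted model. The sketch below uses nlcom to compute the indirect effect and the proportion mediated for each IV; the labels math_ind, ses_ind, math_prop and ses_prop are hypothetical, and the /// line continuations assume the commands are run from a do-file.

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(read <- ses)(science <- read math ses)

* indirect effect of each IV: (read <- IV) times (science <- read)
nlcom (math_ind: _b[read:math]*_b[science:read]) ///
      (ses_ind:  _b[read:ses]*_b[science:read])

* proportion of each total effect that is mediated
nlcom (math_prop: _b[read:math]*_b[science:read] / (_b[read:math]*_b[science:read] + _b[science:math])) ///
      (ses_prop:  _b[read:ses]*_b[science:read] / (_b[read:ses]*_b[science:read] + _b[science:ses]))
```

The point estimates should reproduce the indirect effects and proportions shown above, with delta-method standard errors added.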

#### Mediation with multiple mediators

In this section we will consider the case in which there are multiple mediator variables. This time there will be one equation for each mediator variable. The symbolic form of the model looks like this.
sem (MV1 <- IV)(MV2 <- IV)(DV <- MV1 MV2 IV)
For our example we will use read and write as the mediators. We will go back to a single independent variable, math.
sem (read <- math)(write <- math)(science <- read write math)

Endogenous variables

Observed:  read write science

Exogenous variables

Observed:  math

Fitting target model:

Iteration 0:   log likelihood = -2779.4174
Iteration 1:   log likelihood = -2779.4174

Structural equation model                       Number of obs      =       200
Estimation method  = ml
Log likelihood     = -2779.4174

------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
_cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
-----------+----------------------------------------------------------------
write <-   |
math |   .6247082   .0562757    11.10   0.000     .5144099    .7350065
_cons |   19.88724   3.008947     6.61   0.000     13.98981    25.78467
-----------+----------------------------------------------------------------
science <- |
read |   .3015317   .0679912     4.43   0.000     .1682715     .434792
write |   .2065257   .0700532     2.95   0.003     .0692239    .3438274
math |   .3190094   .0759047     4.20   0.000      .170239    .4677798
_cons |   8.407353   3.160709     2.66   0.008     2.212476    14.60223
-------------+----------------------------------------------------------------
Variance     |
e.read |   58.71925   5.871925                      48.26811    71.43329
e.write |   55.31334   5.531334                      45.46841    67.28993
e.science |   48.77421   4.877421                      40.09314    59.33492
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(1)   =     21.43, Prob > chi2 = 0.0000

estat teffects

Direct effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
-----------+----------------------------------------------------------------
write <-   |
math |   .6247082   .0562757    11.10   0.000     .5144099    .7350065
-----------+----------------------------------------------------------------
science <- |
read |   .3015317   .0679912     4.43   0.000     .1682715     .434792
write |   .2065257   .0700532     2.95   0.003     .0692239    .3438274
math |   .3190094   .0759047     4.20   0.000      .170239    .4677798
------------------------------------------------------------------------------

Indirect effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |          0  (no path)
-----------+----------------------------------------------------------------
write <-   |
math |          0  (no path)
-----------+----------------------------------------------------------------
science <- |
read |          0  (no path)
write |          0  (no path)
math |   .3475706   .0583928     5.95   0.000     .2331229    .4620183
------------------------------------------------------------------------------

Total effects
------------------------------------------------------------------------------
|                 OIM
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |
read <-    |
math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
-----------+----------------------------------------------------------------
write <-   |
math |   .6247082   .0562757    11.10   0.000     .5144099    .7350065
-----------+----------------------------------------------------------------
science <- |
read |   .3015317   .0679912     4.43   0.000     .1682715     .434792
write |   .2065257   .0700532     2.95   0.003     .0692239    .3438274
math |     .66658   .0568622    11.72   0.000     .5551322    .7780279
------------------------------------------------------------------------------
The indirect effect for math, .3475706, is the sum of the indirect effect via read and the indirect effect via write. We can compute these indirect paths manually.
indirect via read = .724807*.3015317 = .21855229

indirect via write = .6247082*.2065257 = .1290183

total indirect = .724807*.3015317 + .6247082*.2065257 = .21855229 + .1290183 = .34757059
The last computation shows that the indirect effect given by estat teffects is the combined indirect effect.
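
Because estat teffects reports only the combined indirect effect, one way to obtain the mediator-specific effects together with standard errors is nlcom. This is a sketch rather than part of the original FAQ: the labels via_read, via_write and combined are hypothetical, and the /// line continuations assume a do-file.

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly sem (read <- math)(write <- math)(science <- read write math)

* specific indirect effects through each mediator, and their sum
nlcom (via_read:  _b[read:math]*_b[science:read])   ///
      (via_write: _b[write:math]*_b[science:write]) ///
      (combined:  _b[read:math]*_b[science:read] + _b[write:math]*_b[science:write])
```

The combined estimate should match the indirect effect for math reported by estat teffects, while the first two lines split it into its read and write components.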

We can use the values we just computed to get the ratios and proportions of interest.
proportion of total math effect mediated = .3475706/.66658 = .52142369
proportion of total math effect mediated via read = .21855229/.66658 = .32787106
proportion of total math effect mediated via write = .1290183/.66658 = .19355261

ratio of math indirect to direct effect = .3475706/.3190094 = 1.0895309
ratio of math indirect to direct effect via read = .21855229/.3190094 = .68509671
ratio of math indirect to direct effect via write = .1290183/.3190094 = .40443416

ratio of total math to direct effect = .66658/.3190094 = 2.0895309

#### Acknowledgements

Thanks to Rose Medeiros for assistance on the bootstrap confidence intervals.

20 Apr 2012, 17 Oct 2011
