### Stata FAQ: How can I perform mediation with binary variables?

Mediator variables (MVs) sit between the independent variable (IV) and the dependent variable (DV) and mediate the effect of the IV on the DV. A model with two mediators is shown in the figure below.

Now, what if MV1 and DV were binary variables while MV2 and IV were continuous? In that case, computing the indirect effects requires a combination of OLS regression and either logit or probit models. This web page presents a Stata program, binary_mediation, that can be used with multiple mediator variables, in any combination of binary and continuous, along with either a binary or a continuous response variable. You can download binary_mediation by typing findit binary_mediation in Stata's Command window and following the instructions.

Different researchers compute indirect effects using different approaches. We will compute indirect effects using the product of coefficients approach. This is fairly straightforward when all the variables are continuous. Having a combination of continuous and binary variables makes things a bit trickier.
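To make the contrast concrete, here is a minimal simulated sketch. The variables x, m, y and ybin, the effect sizes, and the seed are all made up for illustration, and nothing here is taken from binary_mediation. With OLS, the total effect c splits exactly into the direct effect c' plus the product a*b. When a binary outcome generated from the same latent structure is modeled with logit, the two outcome models are on different latent scales, so c and c' + a*b generally disagree.

```stata
* Simulated illustration only: x, m, y, ybin and all effect sizes are made up.
clear
set obs 1000
set seed 20110418
generate double x = rnormal()
generate double m = x + rnormal()               // continuous mediator
generate double y = 2*m + x + rnormal()         // continuous outcome

* All-continuous case: fit the three OLS models
quietly regress m x
scalar a_ols  = _b[x]                           // a path
quietly regress y m x
scalar b_ols  = _b[m]                           // b path
scalar cp_ols = _b[x]                           // c' path (direct effect)
quietly regress y x
scalar c_ols  = _b[x]                           // c path (total effect)
display "OLS:   c = " scalar(c_ols) "   c' + a*b = " (scalar(cp_ols) + scalar(a_ols)*scalar(b_ols))

* Binary outcome from a latent logistic model; ln(u/(1-u)) is a
* standard logistic draw when u is Uniform(0,1)
generate double u = runiform()
generate byte ybin = (2*m + x + ln(u/(1-u)) > 0)
quietly logit ybin m x
scalar b_lgt  = _b[m]
scalar cp_lgt = _b[x]
quietly logit ybin x
scalar c_lgt  = _b[x]

* The two logit models have different latent scales, so the raw
* coefficients no longer satisfy c = c' + a*b (a still comes from OLS)
display "logit: c = " scalar(c_lgt) "   c' + a*b = " (scalar(cp_lgt) + scalar(a_ols)*scalar(b_lgt))
```

In the OLS case the two displayed numbers agree up to rounding error, because c = c' + ab is an algebraic identity for least squares on the same sample. No comparable identity holds for the logit pair.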

In a paper available from his website (Mediation with Dichotomous Outcomes), David A. Kenny recommends rescaling (standardizing) coefficients before computing indirect effects. The reasoning is as follows. In OLS regression, the residual variance changes as variables are entered into or removed from the model. In logistic or probit regression, on the other hand, the residual variance of the underlying latent variable is fixed (at π²/3 for logit and 1 for probit). Because that variance cannot change, the scale of the coefficients changes from model to model instead. Computing indirect effects involves several models, each with different variables, so Kenny recommends standardizing the coefficients so that they can be compared from one model to another. Coefficients from OLS models are rescaled using the standard deviations of the observed variables. For logit or probit models, the rescaling uses the standard deviation of the latent variable underlying the binary variable. Once the coefficients are rescaled (standardized), the indirect effects can be computed as products of coefficients. Nathaniel Herr has a very nice diagram on his webpage that illustrates the different scaling that occurs when both the mediator and response variables are binary.
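The rescaling can be summarized as follows. This is a sketch of the usual latent-variable standardization, not code taken from binary_mediation. The notation is introduced here for illustration:

- b is a raw coefficient.
- s_X is the observed standard deviation of its predictor.
- \hat{\eta} is the fitted linear predictor of the logit or probit model.
- \sigma^2 is the fixed latent residual variance.

With a single predictor, \operatorname{Var}(\hat{\eta}) = b^2 s_X^2.

```latex
\begin{aligned}
\text{OLS:} \quad & b^{*} = b\,\frac{s_X}{s_Y} \\
\text{logit or probit:} \quad & b^{*} = b\,\frac{s_X}{s_{Y^{*}}}, \qquad
  s_{Y^{*}} = \sqrt{\operatorname{Var}(\hat{\eta}) + \sigma^{2}}, \qquad
  \sigma^{2} = \begin{cases} \pi^{2}/3 & \text{logit} \\ 1 & \text{probit} \end{cases} \\
\text{indirect effect through } M_j: \quad & a_j^{*}\, b_j^{*} \\
\text{total effect:} \quad & \sum_j a_j^{*}\, b_j^{*} + c'^{*}
\end{aligned}
```

You can check this against the logit example below. Plugging c = .6185825 and s_X = .7242914 into the logit line reproduces the reported c_path of about .2398. Each model has its own s_{Y*}, so this standardized c path need not equal the sum of the standardized indirect and direct effects (.2736).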

The user-written command binary_mediation computes indirect effects using the product of coefficients approach. It standardizes all the coefficients, whether they come from OLS, logit, or probit models. Once standardized, the results using logit or probit are very similar.

Please note: binary_mediation does not compute standard errors or confidence intervals directly. You will need to use binary_mediation with the bootstrap command to obtain standard errors and confidence intervals.

#### Example

For this series of examples we will use the hsbdemo dataset. We will create a binary mediator, hiread, by dichotomizing read. We do not recommend dichotomizing continuous variables; we only want to demonstrate the process with one binary mediator. Along with hiread, we will use science as a continuous mediator, ses as a continuous predictor, and honors as a binary response variable.

The binary_mediation program will detect which variables are continuous and which are binary.

    use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

    summarize ses hiread science honors   /* descriptive statistics */

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
             ses |       200       2.055    .7242914          1          3
          hiread |       200        .585    .4939585          0          1
         science |       200       51.85    9.900891         26         74
          honors |       200        .265    .4424407          0          1

    binary_mediation, dv(honors) iv(ses) mv(hiread science)

    Logit: hiread on iv (a1 path)

    Logistic regression                               Number of obs   =        200
                                                      LR chi2(1)      =      12.40
                                                      Prob > chi2     =     0.0004
    Log likelihood = -129.52516                       Pseudo R2       =     0.0457

    ------------------------------------------------------------------------------
          hiread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             ses |   .7204026   .2109932     3.41   0.001     .3068636    1.133942
           _cons |  -1.115341   .4465912    -2.50   0.013    -1.990643   -.2400381
    ------------------------------------------------------------------------------
    OLS regression: science on iv (a2 path)
    ------------------------------------------------------------------------------
         science |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
             ses |   3.866564   .9317955     4.15   0.000                 .2828553
           _cons |   43.90421   2.029732    21.63   0.000                        .
    ------------------------------------------------------------------------------
    Logit: dv on iv (c path)

    Logistic regression                               Number of obs   =        200
                                                      LR chi2(1)      =       7.34
                                                      Prob > chi2     =     0.0068
    Log likelihood = -111.97593                       Pseudo R2       =     0.0317

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             ses |   .6185825   .2344357     2.64   0.008      .159097    1.078068
           _cons |  -2.337778   .5417028    -4.32   0.000    -3.399496    -1.27606
    ------------------------------------------------------------------------------
    Logit: dv on mv & iv (b & c' paths)

    Logistic regression                               Number of obs   =        200
                                                      LR chi2(3)      =      51.61
                                                      Prob > chi2     =     0.0000
    Log likelihood =  -89.83923                       Pseudo R2       =     0.2231

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          hiread |   1.597298   .5332837     3.00   0.003      .552081    2.642515
         science |   .0901672   .0253211     3.56   0.000     .0405389    .1397956
             ses |   .2516925    .266301     0.95   0.345    -.2702479    .7736328
           _cons |  -7.658069   1.456197    -5.26   0.000    -10.51216   -4.803975
    ------------------------------------------------------------------------------
    Indirect effects with binary response variable honors
            indir_1 = .09141282         (hiread, binary)
            indir_2 = .10582395         (science, continuous)
    total indirect  = .19723677
     direct effect  = .07639769
      total effect  = .27363446
           c_path   = .23980637
    proportion of total effect mediated = .72080384
    ratio of indirect to direct effect  = 2.5817112
    Binary models use logit regression


By default, binary_mediation displays each of the models used in computing the indirect effects. The coefficients in this part of the output are not standardized. The raw output is followed by a summary of the direct and indirect effects, which are computed from the standardized coefficients. For this example, the total indirect effect appears fairly substantial: it is approximately two and a half times as large as the direct effect (ratio 2.58). The proportion of the total effect that is mediated is about 0.72, which is also substantial.
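As a check on the arithmetic, the sketch below recomputes a few of the summary numbers from values printed in the output above. It uses Kenny-style rescaling with a latent residual variance of π²/3 for the logit c path. It illustrates that assumption and is not binary_mediation's own code; the scalar names are made up. The standardized b paths cannot be rebuilt this way. They need the variance of the fitted linear predictor from the three-predictor model, which the output does not show.

```stata
* Arithmetic check using numbers printed above (not binary_mediation's code).
* Scalar names are made up for this illustration.
scalar sd_ses = .7242914                     // SD of ses (summarize)
scalar sd_sci = 9.900891                     // SD of science (summarize)

* a2 path (OLS): rescale by observed SDs; compare with Beta = .2828553
scalar a2_std = 3.866564 * scalar(sd_ses) / scalar(sd_sci)

* c path (logit of honors on ses): latent SD = sqrt(Var(xb) + pi^2/3),
* and with a single predictor Var(xb) = c^2 * Var(ses)
scalar c_raw  = .6185825
scalar sd_lat = sqrt(scalar(c_raw)^2 * scalar(sd_ses)^2 + c(pi)^2/3)
scalar c_std  = scalar(c_raw) * scalar(sd_ses) / scalar(sd_lat)

* Summary rows built from the reported effects; .09141282 is indir_1
* (the observed value for r(indir_1) in the bootstrap table)
scalar tot_ind = .09141282 + .10582395        // indir_1 + indir_2
scalar tot_eff = scalar(tot_ind) + .07639769  // total indirect + direct
scalar prop    = scalar(tot_ind) / scalar(tot_eff)
scalar ratio   = scalar(tot_ind) / .07639769

display "a2 (standardized) = " %9.7f scalar(a2_std) "   c_path = " %9.7f scalar(c_std)
display "total indirect = " %9.7f scalar(tot_ind) "   total effect = " %9.7f scalar(tot_eff)
display "proportion mediated = " %9.7f scalar(prop) "   ratio = " %9.7f scalar(ratio)
```

The recomputed values agree with the reported .2828553, .23980637, .27363446, .72080384 and 2.5817112 up to the rounding of the printed inputs.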

The binary_mediation program does not produce any standard errors or confidence intervals on its own. We will use the bootstrap command to obtain standard errors for the direct and indirect effects, along with 95% percentile and bias-corrected confidence intervals. We demonstrate the process using 500 bootstrap replications, but you can set the number to anything you prefer. We recommend the percentile or bias-corrected confidence intervals over normal-based confidence intervals. You can bootstrap any of the effects found in the return list.

    * Run from a do-file: the triple-slash line continuation below is not
    * recognized in the Command window
    quietly bootstrap r(indir_1) r(indir_2) r(tot_ind) r(dir_eff) r(tot_eff), ///
        reps(500): binary_mediation, dv(honors) iv(ses) mv(hiread science)

    estat bootstrap, percentile bc

    Bootstrap results                               Number of obs      =       200
                                                    Replications       =       499

          command:  binary_mediation, dv(honors) iv(ses) mv(hiread science)
            _bs_1:  r(indir_1)
            _bs_2:  r(indir_2)
            _bs_3:  r(tot_ind)
            _bs_4:  r(dir_eff)
            _bs_5:  r(tot_eff)

    ------------------------------------------------------------------------------
                 |    Observed               Bootstrap
                 |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _bs_1 |   .09141282  -.0000552   .03717104    .0299178   .1781988   (P)
                 |                                       .0333105   .1959342  (BC)
           _bs_2 |   .10582395    .001447   .03999136    .0421071   .1912641   (P)
                 |                                       .0443143   .1973525  (BC)
           _bs_3 |   .19723677   .0013918   .05159597     .098798   .3049328   (P)
                 |                                        .107806   .3141167  (BC)
           _bs_4 |   .07639769  -.0046966   .07954484   -.0831474   .2288187   (P)
                 |                                      -.0747334   .2309053  (BC)
           _bs_5 |   .27363446  -.0033048   .09258509    .0802001   .4406839   (P)
                 |                                       .0739526   .4394651  (BC)
    ------------------------------------------------------------------------------
    (P)    percentile confidence interval
    (BC)   bias-corrected confidence interval
    Note: one or more parameters could not be estimated in 1 bootstrap replicate;
          standard-error estimates include only complete replications.

The bootstrap program encountered one replicate in which it could not estimate the model, which is why the output reports 499 replications rather than 500. We don't know for sure exactly what happened. However, resampling can produce samples with perfect prediction or complete separation, and in these cases the coefficients cannot be computed. Since this occurred in only one of 500 replications, we are not worried.
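If you want reproducible intervals and a look at the failed replicate, one option is to rerun the bootstrap with three changes: named statistics, a fixed seed, and saving(), which keeps the replicate-level results. This is a sketch under stated assumptions:

- hsbdemo is in memory with hiread already created.
- binary_mediation is installed.
- The seed (4321) and the file name bsreps are arbitrary choices.

With a different seed, the number of failed replicates may differ and may be zero.

```stata
* Assumes hsbdemo is in memory, hiread exists, and binary_mediation is installed.
* Run from a do-file: the triple-slash line continuation only works there.
* The seed and the file name bsreps are arbitrary.
bootstrap indir_1=(r(indir_1)) indir_2=(r(indir_2)) tot_ind=(r(tot_ind)) ///
    dir_eff=(r(dir_eff)) tot_eff=(r(tot_eff)),                           ///
    reps(500) seed(4321) saving(bsreps, replace) nodots:                ///
    binary_mediation, dv(honors) iv(ses) mv(hiread science)
estat bootstrap, percentile bc

* Each replicate is one observation in bsreps.dta; a replicate in which
* the command failed has missing values for the statistics
preserve
use bsreps, clear
list if missing(indir_1, indir_2, tot_ind, dir_eff, tot_eff)
count if missing(indir_1, indir_2, tot_ind, dir_eff, tot_eff)
scalar n_failed = r(N)
restore
display "failed replicates: " scalar(n_failed)
```

Naming the statistics also makes the estat bootstrap table easier to read than the default _bs_1 through _bs_5 labels.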

Looking at the bootstrap results, we see that both of the indirect effects, as well as the total indirect effect, appear to be significant (their confidence intervals do not contain zero). The direct effect, however, is not statistically significant: both of its intervals include zero.
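The same reading of the table can be done programmatically. This sketch copies the percentile (P) intervals from the estat bootstrap output above into a matrix and flags each effect whose interval excludes zero. The matrix name ci_p and the scalar n_sig are made up for illustration.

```stata
* Percentile (P) intervals copied from the estat bootstrap output above.
* Run from a do-file (the matrix definition spans several lines).
matrix ci_p = ( .0299178, .1781988 \  ///
                .0421071, .1912641 \  ///
                .098798,  .3049328 \  ///
               -.0831474, .2288187 \  ///
                .0802001, .4406839 )
matrix rownames ci_p = indir_1 indir_2 tot_ind dir_eff tot_eff
local effects : rownames ci_p

scalar n_sig = 0
forvalues i = 1/5 {
    local name : word `i' of `effects'
    if ci_p[`i',1] > 0 | ci_p[`i',2] < 0 {
        display "`name': interval excludes zero"
        scalar n_sig = scalar(n_sig) + 1
    }
    else {
        display "`name': interval includes zero"
    }
}
display scalar(n_sig) " of 5 percentile intervals exclude zero"
```

Only dir_eff is reported as including zero. Its interval runs from -.0831474 to .2288187.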

For comparison purposes, we will rerun binary_mediation using the probit option along with the diagram option.

    binary_mediation, dv(honors) mv(hiread science) iv(ses) probit diagram

    Probit: hiread on iv (a1 path)

    Probit regression                                 Number of obs   =        200
                                                      LR chi2(1)      =      12.37
                                                      Prob > chi2     =     0.0004
    Log likelihood = -129.54145                       Pseudo R2       =     0.0456

    ------------------------------------------------------------------------------
          hiread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             ses |   .4437993   .1279512     3.47   0.001     .1930196    .6945791
           _cons |   -.687353   .2744143    -2.50   0.012    -1.225195   -.1495109
    ------------------------------------------------------------------------------
    OLS regression: science on iv (a2 path)
    ------------------------------------------------------------------------------
         science |      Coef.   Std. Err.      t    P>|t|                     Beta
    -------------+----------------------------------------------------------------
             ses |   3.866564   .9317955     4.15   0.000                 .2828553
           _cons |   43.90421   2.029732    21.63   0.000                        .
    ------------------------------------------------------------------------------
    Probit: dv on iv (c path)

    Probit regression                                 Number of obs   =        200
                                                      LR chi2(1)      =       7.05
                                                      Prob > chi2     =     0.0079
    Log likelihood = -112.12049                       Pseudo R2       =     0.0305

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             ses |   .3500684   .1332785     2.63   0.009     .0888474    .6112894
           _cons |   -1.36609   .3001514    -4.55   0.000    -1.954376   -.7778043
    ------------------------------------------------------------------------------
    Probit: dv on mv & iv (b * c' paths)

    Probit regression                                 Number of obs   =        200
                                                      LR chi2(3)      =      50.99
                                                      Prob > chi2     =     0.0000
    Log likelihood = -90.149337                       Pseudo R2       =     0.2205

    ------------------------------------------------------------------------------
          honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          hiread |   .8613714   .2822841     3.05   0.002     .3081048    1.414638
         science |   .0510501   .0142893     3.57   0.000     .0230435    .0790566
             ses |   .1156009   .1543426     0.75   0.454     -.186905    .4181068
           _cons |  -4.250488   .7704867    -5.52   0.000    -5.760614   -2.740362
    ------------------------------------------------------------------------------

    Indirect effects with binary response variable honors
            indir_2 = .10883112        (science, continuous)
    total indirect  = .20794797
     direct effect  = .06373715
      total effect  = .27168512
           c_path   = .24577434
    proportion of total effect mediated = .76540067
    ratio of indirect to direct effect  = 3.2625868
    Binary models use probit regression

    Reference Mediation Diagram

               IV ---  coef c  --- DV

                        MV1
                      /     \
               coef a1       coef b1
                  /             \
               IV --- coef c' --- DV
                  \             /
               coef a2       coef b2
                      \     /
                        MV2

The ratio of indirect to direct effect is larger for this probit example (3.26 versus 2.58), but most of the other values are very similar to the logit results from the first example. Please note that the reference diagram always shows the example of two mediators; it does not change with the number of mediators specified in the command itself.
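The probit c path can be checked the same way. This sketch assumes the latent residual variance is 1 for probit rather than π²/3 for logit. The coefficients are copied from the probit output above, and the scalar names are made up. The last two computations show why the ratio moves more than the other summaries. The direct effect is small, so its drop from .0764 to .0637 changes the ratio noticeably, while the proportion mediated changes much less.

```stata
* Probit c path: same rescaling, but the latent residual variance is 1.
* Coefficients are copied from the probit output above; names are made up.
scalar sd_ses   = .7242914                   // SD of ses (summarize)
scalar c_raw_p  = .3500684                   // probit of honors on ses
scalar sd_lat_p = sqrt(scalar(c_raw_p)^2 * scalar(sd_ses)^2 + 1)
scalar c_std_p  = scalar(c_raw_p) * scalar(sd_ses) / scalar(sd_lat_p)

* Summary ratios from the reported probit effects
scalar ratio_p  = .20794797 / .06373715      // total indirect / direct
scalar prop_p   = .20794797 / .27168512      // total indirect / total

display "probit c_path = " %9.7f scalar(c_std_p)
display "ratio = " %9.7f scalar(ratio_p) "   proportion mediated = " %9.7f scalar(prop_p)
```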

#### References

Kenny, D. A. (2008). Mediation with Dichotomous Outcomes. Retrieved April 23, 2010, from http://davidakenny.net/doc/dichmed.pdf

Kenny, D. A. (2009). Mediation. Retrieved April 23, 2010, from http://davidakenny.net/cm/mediate.htm

Herr, N. A. (n.d.). Mediation with Dichotomous Outcomes. Retrieved April 18, 2011, from http://www.nrhpsych.com/mediation/logmed.html

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.