Stata FAQ
How can I perform mediation with binary variables?

Mediator variables sit between the independent variable (IV) and the dependent variable (DV) and mediate the effect of the IV on the DV. A model with two mediators (MV1 and MV2) is shown in the figure below.

Now, what if MV1 and DV were binary variables while MV2 and IV were continuous? In that case, calculating the indirect effects would require a combination of OLS regression and either logit or probit models. This web page presents a Stata program, binary_mediation, that can be used with multiple mediator variables in any combination of binary and continuous, along with either a binary or a continuous response variable. You can download binary_mediation by typing findit binary_mediation in Stata's command window and following the instructions.

Different researchers compute indirect effects using different approaches. We will compute indirect effects using the product of coefficients approach. This is fairly straightforward when all the variables are continuous; having a combination of continuous and binary variables makes things a bit trickier.

David A. Kenny, in a paper available from his website (Mediation with Dichotomous Outcomes), recommends rescaling (standardizing) coefficients before computing indirect effects. The reasoning is that in OLS regression the residual variance changes as variables are entered into or removed from the regression equation, whereas in logistic or probit regression the residual variance is fixed (π²/3 for logit and 1 for probit). Because the residual variance is fixed, it is the scale of the coefficients that changes from one model to the next. Computing indirect effects involves several models, each with different variables, so in order to compare coefficients from one model to another Kenny recommends standardizing them. Coefficients from OLS models are rescaled using the standard deviations of the observed variables. For logit or probit models, the rescaling uses the standard deviation of the latent variable underlying the binary variable. Once the coefficients are rescaled (standardized), the indirect effects can be computed as products of coefficients. Nathaniel Herr has a very nice diagram on his webpage that illustrates the different scaling that occurs when both the mediator and response variables are binary.
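Written out in our own notation (b is a raw coefficient, s_X the observed standard deviation of its predictor, s_Y that of an observed continuous outcome, and ŷ the linear predictor xb of a logit or probit model), the rescaling described above looks roughly like this. It is a sketch of Kenny's approach, not a transcription of binary_mediation's source, which may differ in detail.

```latex
\begin{aligned}
\text{OLS outcome:}\quad & b^{*} = b\,\frac{s_X}{s_Y} \\
\text{logit/probit outcome:}\quad & b^{*} = b\,\frac{s_X}{s_{Y^{*}}},\qquad
  s_{Y^{*}}^{2} = \operatorname{Var}(\hat{y}) + \sigma^{2}_{\varepsilon},\qquad
  \sigma^{2}_{\varepsilon} =
  \begin{cases} \pi^{2}/3 & \text{logit} \\ 1 & \text{probit} \end{cases} \\
\text{indirect effect through } MV_j:\quad & a_j^{*}\, b_j^{*},
  \qquad \text{total indirect} = \textstyle\sum_j a_j^{*}\, b_j^{*}
\end{aligned}
```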

The user-written command binary_mediation computes indirect effects using the product of coefficients approach. The program standardizes all of the coefficients, whether they come from OLS, logit or probit models. Once standardized, the results using logit or probit are very similar.
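To make the standardization concrete, here is a minimal sketch that rescales a single coefficient by hand: the a1 path (hiread on ses), fitted with both logit and probit. It assumes the latent-variable rescaling described by Kenny (variance of the linear predictor plus π²/3 or 1); the variable xb_a1 and the scalars sd_ses, sd_latent and a1_std_* are hypothetical names we chose, and the numbers need not match binary_mediation's internal values exactly.

```stata
* Sketch: standardize the a1 path (hiread on ses) by hand.
* Hypothetical names: xb_a1, sd_ses, sd_latent, a1_std_logit, a1_std_probit.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
generate hiread = read>=50          // same binary mediator as in the example

quietly summarize ses
scalar sd_ses = r(sd)               // observed SD of the predictor

foreach model in logit probit {
    quietly `model' hiread ses
    capture drop xb_a1
    predict double xb_a1, xb        // linear predictor x*b
    quietly summarize xb_a1
    * latent SD = sqrt(Var(xb) + residual variance); pi^2/3 for logit, 1 for probit
    scalar sd_latent = sqrt(r(Var) + cond("`model'" == "logit", _pi^2/3, 1))
    scalar a1_std_`model' = _b[ses]*sd_ses/sd_latent
    display "`model': standardized a1 = " a1_std_`model'
}
```

The raw logit and probit a1 coefficients (.72 and .44) differ markedly; after dividing by their respective latent standard deviations, the two standardized values should land much closer together.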

Please note: binary_mediation does not compute standard errors or confidence intervals directly. You will need to use binary_mediation with the bootstrap command to obtain standard errors and confidence intervals.

Example

For this series of examples we will use the hsbdemo dataset. We will create a binary mediator, hiread, by dichotomizing read. We do not recommend dichotomizing continuous variables; we do so here only to demonstrate the process with one binary mediator. Along with hiread, we will use science as a continuous mediator, ses as a continuous predictor and honors as a binary response variable.

The binary_mediation program will detect which variables are continuous and which are binary.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

generate hiread=read>=50   /* create binary mediator */

summarize ses hiread science honors   /* descriptive statistics */

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         ses |       200       2.055    .7242914          1          3
      hiread |       200        .585    .4939585          0          1
     science |       200       51.85    9.900891         26         74
      honors |       200        .265    .4424407          0          1
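This page does not document the exact rule binary_mediation uses to decide which variables are binary. Here is a minimal sketch, assuming "binary" means exactly two distinct non-missing values, of how you might check your own variables before running it. Keep in mind that logit and probit treat 0 as failure and any other non-missing value as success, so 0/1 coding is the safest choice for binary variables.

```stata
* Sketch: classify variables by their number of distinct values.
* This is our own check, not necessarily binary_mediation's rule.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
generate hiread = read>=50

foreach v of varlist ses hiread science honors {
    quietly tabulate `v'            // r(r) = number of distinct values
    if r(r) == 2 {
        display "`v': binary (2 distinct values)"
    }
    else {
        display "`v': continuous (" r(r) " distinct values)"
    }
}
```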

binary_mediation, dv(honors) mv(hiread science) iv(ses)

Logit: hiread on iv (a1 path) 

Logistic regression                               Number of obs   =        200
                                                  LR chi2(1)      =      12.40
                                                  Prob > chi2     =     0.0004
Log likelihood = -129.52516                       Pseudo R2       =     0.0457

------------------------------------------------------------------------------
      hiread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ses |   .7204026   .2109932     3.41   0.001     .3068636    1.133942
       _cons |  -1.115341   .4465912    -2.50   0.013    -1.990643   -.2400381
------------------------------------------------------------------------------
OLS regression: science on iv (a2 path) 
------------------------------------------------------------------------------
     science |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         ses |   3.866564   .9317955     4.15   0.000                 .2828553
       _cons |   43.90421   2.029732    21.63   0.000                        .
------------------------------------------------------------------------------
Logit: dv on iv (c path)

Logistic regression                               Number of obs   =        200
                                                  LR chi2(1)      =       7.34
                                                  Prob > chi2     =     0.0068
Log likelihood = -111.97593                       Pseudo R2       =     0.0317

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ses |   .6185825   .2344357     2.64   0.008      .159097    1.078068
       _cons |  -2.337778   .5417028    -4.32   0.000    -3.399496    -1.27606
------------------------------------------------------------------------------
Logit: dv on mv & iv (b & c' paths)

Logistic regression                               Number of obs   =        200
                                                  LR chi2(3)      =      51.61
                                                  Prob > chi2     =     0.0000
Log likelihood =  -89.83923                       Pseudo R2       =     0.2231

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hiread |   1.597298   .5332837     3.00   0.003      .552081    2.642515
     science |   .0901672   .0253211     3.56   0.000     .0405389    .1397956
         ses |   .2516925    .266301     0.95   0.345    -.2702479    .7736328
       _cons |  -7.658069   1.456197    -5.26   0.000    -10.51216   -4.803975
------------------------------------------------------------------------------

Indirect effects with binary response variable honors
        indir_1 = .09141282         (hiread, binary)
        indir_2 = .10582395         (science, continuous)
total indirect  = .19723677
 direct effect  = .07639769
  total effect  = .27363446
       c_path   = .23980637
proportion of total effect mediated = .72080384
ratio of indirect to direct effect  = 2.5817112
Binary models use logit regression
By default, binary_mediation displays each of the models used in computing the indirect effects. The coefficients in this part of the output are not standardized. Following the raw output is a summary of the (standardized) direct and indirect effects. For this example, the total indirect effect appears fairly substantial: it is roughly two and a half times as large as the direct effect (ratio = 2.58). The proportion of the total effect that is mediated, about 0.72, is also substantial.
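The summary rows are simple functions of the individual effects: the total indirect effect is indir_1 + indir_2, the total effect is the total indirect effect plus the direct effect, and the two ratios follow from those. The sketch below rebuilds them from the printed logit numbers (copied from the output above; the scalar names are ours), so it should reproduce the displayed values up to rounding in the last digit. Note that c_path (.2398) is not equal to the total effect (.2736). We believe this is because, with a binary outcome, the latent scale differs between the models with and without the mediators, so c' plus the indirect effects need not reproduce c the way it does in OLS.

```stata
* Rebuild the summary rows from the printed logit effects (no data needed).
scalar indir_1 = .09141282          // indirect effect via hiread
scalar indir_2 = .10582395          // indirect effect via science
scalar dir_eff = .07639769          // standardized direct (c') effect
scalar tot_ind = scalar(indir_1) + scalar(indir_2)
scalar tot_eff = scalar(tot_ind) + scalar(dir_eff)
display "total indirect      = " %10.8f scalar(tot_ind)
display "total effect        = " %10.8f scalar(tot_eff)
display "proportion mediated = " %10.8f scalar(tot_ind)/scalar(tot_eff)
display "indirect / direct   = " %10.8f scalar(tot_ind)/scalar(dir_eff)
```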

The binary_mediation program does not produce any standard errors or confidence intervals on its own, so we will use the bootstrap command to obtain standard errors for the direct and indirect effects, along with 95% percentile and bias-corrected confidence intervals. We demonstrate the process using 500 bootstrap replications, but you can set the number to anything you prefer. We recommend percentile or bias-corrected confidence intervals over normal-based confidence intervals. You can bootstrap any of the effects that binary_mediation saves in r(); type return list after running binary_mediation to see them.
quietly bootstrap r(indir_1) r(indir_2) r(tot_ind) r(dir_eff) r(tot_eff), ///
  reps(500): binary_mediation, dv(honors) iv(ses) mv(hiread science)


estat bootstrap, percentile bc

Bootstrap results                               Number of obs      =       200
                                                Replications       =       499

      command:  binary_mediation, dv(honors) iv(ses) mv(hiread science)
        _bs_1:  r(indir_1)
        _bs_2:  r(indir_2)
        _bs_3:  r(tot_ind)
        _bs_4:  r(dir_eff)
        _bs_5:  r(tot_eff)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .09141282  -.0000552   .03717104    .0299178   .1781988   (P)
             |                                       .0333105   .1959342  (BC)
       _bs_2 |   .10582395    .001447   .03999136    .0421071   .1912641   (P)
             |                                       .0443143   .1973525  (BC)
       _bs_3 |   .19723677   .0013918   .05159597     .098798   .3049328   (P)
             |                                        .107806   .3141167  (BC)
       _bs_4 |   .07639769  -.0046966   .07954484   -.0831474   .2288187   (P)
             |                                      -.0747334   .2309053  (BC)
       _bs_5 |   .27363446  -.0033048   .09258509    .0802001   .4406839   (P)
             |                                       .0739526   .4394651  (BC)
------------------------------------------------------------------------------
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
Note: one or more parameters could not be estimated in 1 bootstrap replicate;
      standard-error estimates include only complete replications.
The bootstrap program encountered one replicate in which it could not estimate the model, which is why 499 rather than 500 replications are reported. We cannot say for sure what happened, but during resampling, samples with perfect prediction (complete separation) can occur, and in those samples the coefficients cannot be computed. Since this occurred in only one of the 500 replications, we are not worried.
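Bootstrap results change from run to run unless the random-number seed is fixed, and it can be useful to keep the individual replicates so that a failed one can be found. Here is a minimal sketch using bootstrap's seed() and saving() options. The seed value 12345, the file name bs_mediation and the result names ind1, ind2, totind, direff and toteff are arbitrary choices of ours, and the sketch assumes binary_mediation is already installed.

```stata
* Sketch: reproducible bootstrap that also keeps every replicate on disk.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
generate hiread = read>=50

* seed() fixes the resampling; saving() writes one row per replicate
quietly bootstrap ind1=(r(indir_1)) ind2=(r(indir_2)) totind=(r(tot_ind)) ///
    direff=(r(dir_eff)) toteff=(r(tot_eff)),                            ///
    reps(500) seed(12345) saving(bs_mediation, replace):                ///
    binary_mediation, dv(honors) iv(ses) mv(hiread science)
estat bootstrap, percentile bc

* Replicates that could not be estimated are recorded as missing values
preserve
use bs_mediation, clear
count if missing(ind1)
restore
```

Rerunning this block with the same seed should give the same confidence intervals; changing the seed (or omitting it) shows how much the intervals move from one set of 500 replications to another.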

Looking at the bootstrap results, we see that both indirect effects, as well as the total indirect effect, appear to be statistically significant (neither the percentile nor the bias-corrected confidence interval contains zero). The direct effect, however, is not statistically significant: both of its confidence intervals include zero.

For comparison purposes we will rerun binary_mediation using the probit option along with the diagram option.
binary_mediation, dv(honors) mv(hiread science) iv(ses) probit diagram

Probit: hiread on iv (a1 path) 

Probit regression                                 Number of obs   =        200
                                                  LR chi2(1)      =      12.37
                                                  Prob > chi2     =     0.0004
Log likelihood = -129.54145                       Pseudo R2       =     0.0456

------------------------------------------------------------------------------
      hiread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ses |   .4437993   .1279512     3.47   0.001     .1930196    .6945791
       _cons |   -.687353   .2744143    -2.50   0.012    -1.225195   -.1495109
------------------------------------------------------------------------------
OLS regression: science on iv (a2 path) 
------------------------------------------------------------------------------
     science |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         ses |   3.866564   .9317955     4.15   0.000                 .2828553
       _cons |   43.90421   2.029732    21.63   0.000                        .
------------------------------------------------------------------------------
Probit: dv on iv (c path)

Probit regression                                 Number of obs   =        200
                                                  LR chi2(1)      =       7.05
                                                  Prob > chi2     =     0.0079
Log likelihood = -112.12049                       Pseudo R2       =     0.0305

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ses |   .3500684   .1332785     2.63   0.009     .0888474    .6112894
       _cons |   -1.36609   .3001514    -4.55   0.000    -1.954376   -.7778043
------------------------------------------------------------------------------
Probit: dv on mv & iv (b * c' paths)

Probit regression                                 Number of obs   =        200
                                                  LR chi2(3)      =      50.99
                                                  Prob > chi2     =     0.0000
Log likelihood = -90.149337                       Pseudo R2       =     0.2205

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hiread |   .8613714   .2822841     3.05   0.002     .3081048    1.414638
     science |   .0510501   .0142893     3.57   0.000     .0230435    .0790566
         ses |   .1156009   .1543426     0.75   0.454     -.186905    .4181068
       _cons |  -4.250488   .7704867    -5.52   0.000    -5.760614   -2.740362
------------------------------------------------------------------------------

Indirect effects with binary response variable honors
        indir_1 = .09911685        (hiread, binary)
        indir_2 = .10883112        (science, continuous)
total indirect  = .20794797
 direct effect  = .06373715
  total effect  = .27168512
       c_path   = .24577434
proportion of total effect mediated = .76540067
ratio of indirect to direct effect  = 3.2625868
Binary models use probit regression

Reference Mediation Diagram

  IV ---  coef c  --- DV

           MV1
         /     \
   coef a1     coef b1
     /             \
  IV --- coef c' --- DV
    \               /
   coef a2     coef b2
        \       /
           MV2
The ratio of indirect to direct effect is larger for the probit results (3.26 versus 2.58), mainly because the direct effect is somewhat smaller (0.064 versus 0.076), but most of the other values are very similar to the logit results from the first example. Please note that the reference diagram always shows two mediators; it does not change with the number of mediators specified in the command.
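Why are the raw logit and probit coefficients so different while the standardized effects agree? The logistic error has standard deviation π/√3 ≈ 1.81, versus 1 for the standard normal error in probit, so raw logit coefficients are typically about 1.6 to 1.8 times the corresponding probit coefficients. The sketch below computes those ratios from the unstandardized coefficients printed above (the scalar names are ours). Because the standardization divides by the full latent standard deviation, which also includes the variance of the linear predictor, the agreement after rescaling is close but not exact.

```stata
* Ratios of raw logit to raw probit coefficients, copied from the output above
scalar ratio_a1 = .7204026 / .4437993   // hiread on ses (a1 path)
scalar ratio_b1 = 1.597298 / .8613714   // honors on hiread (b1 path)
scalar ratio_b2 = .0901672 / .0510501   // honors on science (b2 path)
* SD of the logistic error relative to the standard normal error
scalar sd_logistic = _pi/sqrt(3)
display "logit/probit ratios: " scalar(ratio_a1) ", " scalar(ratio_b1) ", " scalar(ratio_b2)
display "pi/sqrt(3)         = " scalar(sd_logistic)
```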

References

Kenny, D. A. (2008). Mediation with Dichotomous Outcomes. Retrieved April 23, 2010, from http://davidakenny.net/doc/dichmed.pdf

Kenny, D. A. (2009). Mediation. Retrieved April 23, 2010, from http://davidakenny.net/cm/mediate.htm

Herr, N. A. (undated). Mediation with Dichotomous Outcomes. Retrieved April 18, 2011, from http://www.nrhpsych.com/mediation/logmed.html

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.