Help the Stat Consulting Group by giving a gift

How can I get margins and marginsplot with multiply imputed data?

The** margins** and **marginsplot ** commands**,** introduced in Stata 11 and Stata 12, respectively,
are very popular post-estimation commands. However, they can be tricky to use in
conjunction with multiple imputation.

Let's begin by looking at the data.

use http://www.ats.ucla.edu/stat/data/hsbmar_s10, clearsum female ses read mathVariable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- female | 185 .5459459 .4992356 0 1 ses | 200 2.055 .7242914 1 3 read | 185 51.61622 10.19104 28 76 math | 190 52.17895 9.246168 33 75

As you can see from the table above, all of
the variables except for **ses** have missing values.

Running **margins** and **marginsplot** after multiple imputation involves
a multi-step process. We will demonstrate this process using an ordered logit model with
**ses** as the response variable. It can take on the values 1, 2 or 3. It's not
a great response variable from a theoretical standpoint, but at least it is
ordinal.
The predictor variables are **female**, **read** and **math**.

Why not just impute the data and run the analyses. Well, we can impute the data, but we
need a way to run both **ologit** and **margins** on each imputed dataset and
then combine the **margins** results into a single output. The problem
is that **margins (**an rclass command**)** does not work with **mi estimate
(**an eclass command). Additionally, since we are
looking for predicted probabilities, we need to compute them for each of the three
response values.

We can accomplish this by writing a wrapper program called **emargins.ado**.
It contains both the **ologit** and **margins** commands. By setting the option
**properties** to **mi**, **emargins** can be used with **mi estimate**.
We also need to declare **emargins** to be an **eclass** program.

Here is what the program looks like.

program emargins, eclass properties(mi) version 12 args outcome ologit ses female read math margins, at(female=(0 1) read=(30(10)70)) atmeans asbalanced /// post predict(outcome(`outcome')) end

The **emargins **program
will run the **ologit** and then estimate **margins** to give
the predicted probabilities for each
level of ses.

The important part to notice is that the program is marked
"eclass" and we have use the "post"
option on the **margins **statement. This is done so that the
predicted probabilities and variance-covariance matrix estimated for each
imputed dataset will be saved
correctly in the **mi** **ereturn** list where **mi estimate**
can access the estimates (not the **return **list where it would normally go).

Here is how you use **emargins** program with **mi estimate**:

mi estimate, cmdok: emargins 1

The **cmdok** is needed because Stata does not recognize **emargins** as an
**mi estimable**
program. The value one (1) after **emargins** is passed to **margins** indicating which
response value is being predicted.

Once **mi estimate** has combined the **margins** from each of the imputed datasets using Rubin's
rules into one table, how do we get the **marginsplot** to run? If you try running **marginsplot**
after **mi estimate** you get the error message, "previous command was not margins." This happens because **mi
estimate** does not leave the results in the right place for **marginsplot** to
find them.

Remember, **mi estimate** is an **eclass **command
saves results in the
**ereturn** list but **margins** is **rclass** and saves
its results in the **return** list. Somehow we need to move the
information from the **mi estimate ereturn** list to the **
margins return** list.

The solution to this problem is to save the combined margins predicted
probabilities **e(b_mi)** and variance-covariance matrix **e(V_mi)**
produced by **mi estimate **into matrices **b** and **V**, run
a standard **margins** on the
**_mi_m == 0** (non-imputed) data, and then repost the results from **b** and **V** back into
the** margins return** list
**r(b)** and **r(V) **where **marginsplot **can access them. We do the last part with a program called **myret.ado** which
looks like this.

program myret, rclass return add return matrix b = b return matrix V= V end

Now putting all of the pieces together into a do-file we get...

set seed 1234543 mi set mlong mi register imputed female math read science socst mi impute mvn female math read science socst = /// ses write awards, add(10) * this is to get the ologit coefficients and standard errors mi estimate: ologit ses female read math * loop once for each of the response values of ses forvalues i=1/3 { mi estimate, cmdok: emargins `i' // emargins is defined above mat b= e(b_mi) // save mi point estimates mat V = e(V_mi) // save mi vce * run ologit and margins on the _mi_m==0 data quietly ologit ses female read math if _mi_m == 0 quietly margins, at(female=(0 1) read=(30(10)70)) /// atmeans asbalanced predict(outcome(`i'))

* myret puts b and V matrix in a place marginsplot can find myret

*Technically we ran the program myret between margins and marginsplot. *E(cmd) is the eclass scalar that tells Stata what the previous command was. *So we have to set that to "margins" for marginsplot to work correctly *(becaues marginsplot expects margins as the last estimation command). mata: st_global("e(cmd)", "margins") // set previous cmd to margins

marginsplot, x(read) recast(line) noci name(ologit`i', replace) }

Here is what the output looks like when we run the do-file.

mi register imputed female math read science socst(51 m=0 obs. now marked as incomplete)mi impute mvn female math read science socst = /// ses write awards, add(10)Performing EM optimization: observed log likelihood = -1814.7997 at iteration 10 Performing MCMC data augmentation ... Multivariate imputation Imputations = 10 Multivariate normal regression added = 10 Imputed: m=1 through m=10 updated = 0 Prior: uniform Iterations = 1000 burn-in = 100 between = 100 ------------------------------------------------------------------ | Observations per m |---------------------------------------------- Variable | Complete Incomplete Imputed | Total -------------------+-----------------------------------+---------- female | 185 15 15 | 200 math | 190 10 10 | 200 read | 185 15 15 | 200 science | 193 7 7 | 200 socst | 188 12 12 | 200 ------------------------------------------------------------------ (complete + incomplete = total; imputed is the minimum across m of the number of filled-in observations.)mi estimate: ologit ses female read mathMultiple-imputation estimates Imputations = 10 Ordered logistic regression Number of obs = 200 Average RVI = 0.0652 Largest FMI = 0.1530 DF adjustment: Large sample DF: min = 406.39 avg = 14401.83 max = 34720.30 Model F test: Equal FMI F( 3, 2142.2) = 6.04 Within VCE type: OIM Prob > F = 0.0004 ------------------------------------------------------------------------------ ses | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- female | -.256023 .2937578 -0.87 0.384 -.8334975 .3214516 read | .0465224 .0184505 2.52 0.012 .0103068 .0827379 math | .0215837 .0195972 1.10 0.271 -.0168578 .0600251 -------------+---------------------------------------------------------------- /cut1 | 2.140064 .8673955 2.47 0.014 .439941 3.840188 /cut2 | 4.399381 .9168621 4.80 0.000 2.602301 6.19646 ------------------------------------------------------------------------------forvalues i=1/3 { mi estimate, cmdok: emargins `i' mat b= e(b_mi) mat V = e(V_mi) quietly ologit ses female read math if _mi_m == 0 quietly margins, at(female=(0 1) read=(30(10)70)) /// atmeans asbalanced predict(outcome(`i')) myret mata: st_global("e(cmd)", "margins") marginsplot, x(read) recast(line) noci name(ologit`i', replace) }Multiple-imputation estimates Imputations = 10 Adjusted predictions Number of obs = 200 Average RVI = 0.0459 Largest FMI = 0.1608 DF adjustment: Large sample DF: min = 368.44 avg = 48442.19 Within VCE type: Delta-method max = 387639.72 ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | .403343 .1067354 3.78 0.000 .1941287 .6125573 2 | .2979526 .0638882 4.66 0.000 .1727337 .4231715 3 | .2105536 .0398869 5.28 0.000 .1323426 .2887646 4 | .1438415 .0369006 3.90 0.000 .0713076 .2163754 5 | .0959678 .0378763 2.53 0.012 .021487 .1704485 6 | .4662736 .1129099 4.13 0.000 .244356 .6881912 7 | .354441 .069965 5.07 0.000 .2169542 .4919277 8 | .2562266 .0415667 6.16 0.000 .1747025 .3377507 9 | .177796 .0385076 4.62 0.000 .1023212 .2532708 10 | .1197154 .0420447 2.85 0.004 .0372978 .2021331 ------------------------------------------------------------------------------ Variables that uniquely identify margins: female read Multiple-imputation estimates Imputations = 10 Adjusted predictions Number of obs = 200 Average RVI = 0.0421 Largest FMI = 0.1330 DF adjustment: Large sample DF: min = 534.65 avg = 154724.03 Within VCE type: Delta-method max = 809285.40 ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | .4625686 .0627646 7.37 0.000 .3395491 .5855881 2 | .5045423 .0397542 12.69 0.000 .4266251 .5824594 3 | .5078671 .0380308 13.35 0.000 .4333279 .5824063 4 | .4717337 .0430318 10.96 0.000 .3873677 .5560997 5 | .4056562 .0677409 5.99 0.000 .2725853 .5387272 6 | .4257974 .0733559 5.80 0.000 .2817618 .569833 7 | .4849265 .043836 11.06 0.000 .3989895 .5708636 8 | .5108419 .0378766 13.49 0.000 .4366048 .5850789 9 | .4964919 .0396086 12.53 0.000 .4188596 .5741242 10 | .4455013 .0584992 7.62 0.000 .3308417 .5601608 ------------------------------------------------------------------------------ Variables that uniquely identify margins: female read Multiple-imputation estimates Imputations = 10 Adjusted predictions Number of obs = 200 Average RVI = 0.0449 Largest FMI = 0.1638 DF adjustment: Large sample DF: min = 355.39 avg = 21683.99 Within VCE type: Delta-method max = 125538.87 ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | .1340885 .0542208 2.47 0.013 .0278035 .2403734 2 | .1975051 .0502487 3.93 0.000 .0990184 .2959918 3 | .2815793 .0457423 6.16 0.000 .1919123 .3712463 4 | .3844248 .0624612 6.15 0.000 .2616825 .5071671 5 | .498376 .0984841 5.06 0.000 .3046912 .6920608 6 | .107929 .0471437 2.29 0.023 .0152658 .2005922 7 | .1606325 .0449048 3.58 0.000 .0723513 .2489137 8 | .2329315 .0404386 5.76 0.000 .1535627 .3123004 9 | .325712 .0535693 6.08 0.000 .2207159 .4307082 10 | .4347833 .0904882 4.80 0.000 .2574145 .6121521 ------------------------------------------------------------------------------ Variables that uniquely identify margins: female read

Please note: The values in the tables and graphs above are predicted probabilities. The column heading for the margins tables, Coef., is incorrect.

In case you lose track of which values in thematrix list r(at)r(at)[10,3] female read math 1._at 0 30 51.691358 2._at 0 40 51.691358 3._at 0 50 51.691358 4._at 0 60 51.691358 5._at 0 70 51.691358 6._at 1 30 51.691358 7._at 1 40 51.691358 8._at 1 50 51.691358 9._at 1 60 51.691358 10._at 1 70 51.691358

This may not be the most transparent process ever but, in the end, we got the plots
of the predicted probabilities. Of course, the technique shown here is not restricted
to **ologit** but generalizes to many other estimation procedures for use with
**margins** and **marginsplot**.

Thanks to Isabel Cannette of Stata Corp for the suggestion to use **myret** to repost
the **margins** results.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.