Help the Stat Consulting Group by giving a gift

How can I understand a categorical by categorical interaction in logistic regression? (Stata 10 and earlier)

Interactions in logistic regression models can be trickier than interactions in comparable
OLS regression models. This is particularly true when there are covariates in the model
in addition to the categorical predictors. This FAQ page will try to help you
to understand categorical by categorical interactions in logistic regression models
with continuous covariates.
We will use an example dataset, **logit2-2**, that has two binary predictors, **f** and **h**,
and a continuous covariate, **cv1**. In addition, the model will include
**fh** which is the **f** by **h** interaction. We will begin by loading the data,
creating the interaction variable and running the logit model.

use http://www.ats.ucla.edu/stat/data/logit2-2, clear generate fh = f*h logit y f h fh cv1, nologLogistic regression Number of obs = 200 LR chi2(4) = 106.10 Prob > chi2 = 0.0000 Log likelihood = -78.74193 Pseudo R2 = 0.4025 ------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- f | 2.996118 .7521524 3.98 0.000 1.521926 4.470309 h | 2.390911 .6608498 3.62 0.000 1.09567 3.686153 fh | -2.047755 .8807989 -2.32 0.020 -3.774089 -.3214213 cv1 | .196476 .0328518 5.98 0.000 .1320876 .2608644 _cons | -11.86075 1.895828 -6.26 0.000 -15.5765 -8.144991 ------------------------------------------------------------------------------

As you can see all of the variables in the above model including the interaction term are statistically significant. If this were an OLS regression model we could do a very good job of understanding the interaction using just the coefficients in the model. The situation in logistic regression is more complicated because the effect of the covariate is nonlinear, meaning that the interaction effect can be very different for different values of the covariate. To begin to understand what is going on consider the Table 1 below.

Table 1: Predicted probabilities when cv1=50 h=0 h=1 Dprob LB UB f=0 .1154 .5876 .4722 .2693 .6751

Table 1 contain predicted probabilities, differences in predicted probabilities and the confidence
interval of the difference in predicted probabilities while holding **cv1** at 50.
The first value, .1154, is the predicted probability when **f**=0 and **h**=0.
The second value, .5876, is the predicted probability when **f**=0 and **h**=1.
The third value, .4722, is the difference in probabilities for **f**=0 when **h**
changes from 0 to 1. The next two values are the 95% confidence interval on the difference in
probabilities. If the confidence interval contains zero the difference would not be considered
statistically significant. In our example, the confidence interval does not contain zero. Thus,
for our example, the difference in probabilities is statistically significant.
We obtained all the values for Table 1 using the **prvalue** command, which is part of
**spostado**. **spostado** is a collection of utilities for categorical and non-normal
models written by J. Scott Long and Jeremy Freese. You can obtain the **spostado** utilities
by typing **findit spostado** into the Stata command line and following the instructions
(see How can I use
the findit command to search for programs and get additional help? for more
information about using **findit**).
To get the values for Table 1 we will run **prvalue** twice; once with **f**=0, **h**=0
and once with **f**=0, **h**=1 while holding the covariate at the value 50. The first
time we run **prvalue** we use the **save**
option to retain the first probability. The second time we use the **diff** option so that
we get the difference between the two probabilities.

prvalue, x(f=0 h=0 fh=0 cv1=50) delta savelogit: Predictions for y Confidence intervals by delta method 95% Conf. Interval Pr(y=1|x): 0.1154 [ 0.0027, 0.2281] Pr(y=0|x): 0.8846 [ 0.7719, 0.9973] f h fh cv1 x= 0 0 0 50

prvalue, x(f=0 h=1 fh=0 cv1=50) delta difflogit: Change in Predictions for y Confidence intervals by delta method Current Saved Change 95% CI for Change Pr(y=1|x): 0.5876 0.1154 0.4722 [ 0.2693, 0.6751] Pr(y=0|x): 0.4124 0.8846 -0.4722 [-0.6751, -0.2693] f h fh cv1 Current= 0 1 0 50 Saved= 0 0 0 50 Diff= 0 1 0 0

Next we need to step through a number of combinations of categorical variables and
covariates. The code fragment below will fix the covariate at nine values between 30 to 70 while
looking at differences between **h**=0 and **h**=1 separately for **f**=0 and
for **f**=1.

mat P=J(2,2,.) mat colnames P = h=0 h=1 mat rownames P = f=0 f=1 forvalues i=30(5)70 { capture matrix drop D display display as txt "cv1=`i'" quietly prvalue, x(f=0 h=0 fh=0 cv1=`i') delta save mat P[1,1]=r(p1) quietly prvalue, x(f=0 h=1 fh=0 cv1=`i') delta diff /* h=0 vs h=1 @ f=0 */ mat P[1,2]=r(p1) mat temp=r(pred) mat D=temp[2,1..3] quietly prvalue, x(f=1 h=0 fh=0 cv1=`i') delta save mat P[2,1]=r(p1) /* please note: fh=1 only when both f=1 and h=1 */ quietly prvalue, x(f=1 h=1 fh=1 cv1=`i') delta diff /* h=0 vs h=1 @ f=1 */ mat P[2,2]=r(p1) mat temp=r(pred) mat D = D \ temp[2,1..3] mat R = P,D mat list R, title(cell probabilities, differences and confidence intervals) }

Now, we will run the above code fragment and add annotations to the output manually with
comments in bold.
**<- difference significant at f=0**

cv1=30<- hold covariate at 30R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .00255673 .02723725 .02468052 -.01224754 .06160857<- difference not significant at f=0f=1 .04878354 .06740873 .01862519 -.04638632 .0836367<- difference not significant at f=1cv1=35<- hold covariate at 35R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .00679948 .06957896 .06277948 -.01141145 .13697041<- difference not significant at f=0f=1 .12047193 .16181129 .04133936 -.09505168 .1777304<- difference not significant at f=1cv1=40<- hold covariate at 40R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .0179561 .16647829 .14852219 .01991065 .27713373<- difference significant at f=0f=1 .26784405 .34019339 .07234934 -.1564856 .30118428<- difference not significant at f=1cv1=45<- hold covariate at 45R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .04656038 .34787005 .30130967 .12440741 .47821194<- difference significant at f=0f=1 .49419808 .57931143 .08511335 -.18160469 .35183138<- difference not significant at f=1cv1=50<- hold covariate at 50R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .11537804 .58757877 .47220073 .26931943 .67508203<- difference significant at f=0f=1 .72295588 .78622645 .06327057 -.1399183 .26645944<- difference not significant at f=1cv1=55<- hold covariate at 55R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .25834921 .7918883 .53353909 .29661155 .77046662<- difference significant at f=0f=1 .87452245 .90760261 .03308016 -.07776317 .1439235<- difference not significant at f=1cv1=60<- hold covariate at 60R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .48196125 .91041601 .42845476 .1588636 .69804592<- difference significant at f=0f=1 .94901687 .96328229 .01426542 -.03588886 .0644197<- difference not significant at f=1cv1=65<- hold covariate at 65R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .71303982 .96446669 .25142688 .01497498 .48787877<- difference significant at f=0f=1 .98028207 .98592901 .00564694 -.01520646 .02650035<- difference not significant at f=1cv1=70<- hold covariate at 70R[2,5]: cell probabilities, differences and confidence intervals h=0 h=1 Dprob LB UB f=0 .86904871 .98639321 .1173445 -.03299262 .26768162<- difference not significant at f=0f=1 .99252504 .99468476 .00215971 -.00622068 .01054011<- difference not significant at f=1

Here is what we can say based upon the output above. There are no significant
differences between the two levels of **h** when the covariate is held constant at either 30 or 35.
When the covariate is held constant between 40 and 65 there is a significant **h** 0-1
difference at **f**=0 but not at **f**=1. Finally, when the covariates are held constant at 70
the **h** differences are not significant. It may be easier to understand these results if we
graph the confidence intervals for the difference in probability separately for both **f**=0 and **f**=1.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.