
How can I understand a categorical by categorical interaction in logistic regression? (Stata 10 and earlier)

We will use an example dataset, **logit2-2**, that has two binary predictors, **f** and **h**,
and a continuous covariate, **cv1**. In addition, the model will include
**fh**, which is the **f**-by-**h** interaction. We will begin by loading the data,
creating the interaction variable and running the logit model.

```stata
use http://www.ats.ucla.edu/stat/data/logit2-2, clear
generate fh = f*h    /* f-by-h interaction term */
logit y f h fh cv1, nolog
```

```
Logistic regression                               Number of obs   =        200
                                                  LR chi2(4)      =     106.10
                                                  Prob > chi2     =     0.0000
Log likelihood =  -78.74193                       Pseudo R2       =     0.4025

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           f |   2.996118   .7521524     3.98   0.000     1.521926    4.470309
           h |   2.390911   .6608498     3.62   0.000      1.09567    3.686153
          fh |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
         cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
       _cons |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
------------------------------------------------------------------------------
```

As you can see, all of the variables in the model above, including the interaction term, are statistically significant. If this were an OLS regression model, we could understand the interaction quite well from the coefficients alone. The situation in logistic regression is more complicated: the model is linear in the log odds but nonlinear in the probabilities, so the interaction effect on the probability scale can be very different, even in its statistical significance, at different values of the covariate. To begin to understand what is going on, consider Table 1 below.
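Before turning to Table 1, it helps to write the fitted model out. In the sketch below, $\Lambda(z) = 1/(1+e^{-z})$ denotes the inverse logit and $\eta_0$ the linear predictor with **h**=0; both symbols are introduced here for illustration and do not appear in the Stata output. On the log-odds scale the effect of **h** depends only on **f**, but on the probability scale it also depends on **cv1** through $\eta_0$.

$$
\begin{aligned}
\operatorname{logit}\Pr(y=1) &= \beta_0 + \beta_f f + \beta_h h + \beta_{fh}\,(f \cdot h) + \beta_{cv1}\,cv1\\
\eta_0 &= \beta_0 + \beta_f f + \beta_{cv1}\,cv1\\
\Pr(y=1 \mid h=1) - \Pr(y=1 \mid h=0) &= \Lambda(\eta_0 + \beta_h + \beta_{fh} f) - \Lambda(\eta_0)\\
\beta_h + \beta_{fh} f &=
\begin{cases}
2.390911 & f=0\\
2.390911 - 2.047755 = 0.343156 & f=1
\end{cases}
\end{aligned}
$$

Because $\Lambda$ is steepest at $z=0$ and flattens toward 0 and 1, the same log-odds effect produces a larger probability difference when $\eta_0$ is near zero than when it is far from zero, and $\eta_0$ moves with **cv1**.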

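Table 1 contains predicted probabilities, the difference in predicted probabilities, and the confidence interval of that difference, while holding **cv1** at 50.

Table 1: Predicted probabilities when cv1=50

|     | h=0   | h=1   | Dprob | LB    | UB    |
|-----|-------|-------|-------|-------|-------|
| f=0 | .1154 | .5876 | .4722 | .2693 | .6751 |

Dprob is the difference Pr(y=1 | h=1) minus Pr(y=1 | h=0); LB and UB are the lower and upper bounds of its 95% confidence interval.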
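The point estimates in Table 1 follow directly from the coefficients in the logit output. The worked arithmetic below (probabilities rounded to four decimals) reproduces them, using $\Lambda(z) = 1/(1+e^{-z})$ for the inverse logit and $\eta_{00}$, $\eta_{01}$ for the linear predictors of the **f**=0, **h**=0 and **f**=0, **h**=1 cells; these symbols are introduced here for illustration. The confidence bounds require the delta method and are not derived here.

$$
\begin{aligned}
\eta_{00} &= -11.86075 + 0.196476 \times 50 = -2.03695, & \Pr(y=1 \mid f=0, h=0) &= \Lambda(-2.03695) \approx 0.1154\\
\eta_{01} &= \eta_{00} + 2.390911 = 0.353961, & \Pr(y=1 \mid f=0, h=1) &= \Lambda(0.353961) \approx 0.5876\\
\text{Dprob} &= 0.5876 - 0.1154 = 0.4722
\end{aligned}
$$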

We obtained all the values for Table 1 using the **prvalue** command, which is part of
**spostado**. **spostado** is a collection of utilities for categorical and non-normal
models written by J. Scott Long and Jeremy Freese. You can obtain the **spostado** utilities
by typing **findit spostado** into the Stata command line and following the instructions
(see the FAQ "How can I use
the findit command to search for programs and get additional help?" for more
information about using **findit**).
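If you cannot install **spostado**, a similar delta-method comparison can be cross-checked with Stata's built-in **nlcom** command applied to the **invlogit()** function. This is a minimal sketch of an alternative, not the approach used on this page; it assumes the example dataset is still available at the URL used above, and the labels `d_f0` and `d_f1` are hypothetical names chosen for illustration. Its estimates and 95% intervals should closely match the Dprob, LB and UB values for **cv1**=50, although small rounding differences from **prvalue** are possible. Run it from a do-file, because the `///` line continuation only works there.

```stata
* Load the example data and refit the model from this page
use http://www.ats.ucla.edu/stat/data/logit2-2, clear
generate fh = f*h
logit y f h fh cv1, nolog

* Difference in Pr(y=1) between h=1 and h=0 at cv1=50, separately for f=0 and f=1.
* fh equals f when h=1 and 0 when h=0, so _b[fh] enters only the f=1, h=1 cell.
* nlcom reports delta-method standard errors and 95% confidence intervals.
nlcom (d_f0: invlogit(_b[_cons] + _b[h] + 50*_b[cv1]) - invlogit(_b[_cons] + 50*_b[cv1])) ///
      (d_f1: invlogit(_b[_cons] + _b[f] + _b[h] + _b[fh] + 50*_b[cv1]) - invlogit(_b[_cons] + _b[f] + 50*_b[cv1]))
```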

To get the values for Table 1, we will run **prvalue** twice: once with **f**=0, **h**=0
and once with **f**=0, **h**=1 (so **fh**=0 both times), while holding the covariate at 50.
Both runs use the **delta** option to obtain delta-method confidence intervals. The first
time we run **prvalue**, we use the **save**
option to retain the first probability. The second time, we use the **diff** option so that
we get the difference between the two probabilities.

```stata
prvalue, x(f=0 h=0 fh=0 cv1=50) delta save
```

```
logit: Predictions for y

Confidence intervals by delta method

                         95% Conf. Interval
  Pr(y=1|x):   0.1154   [ 0.0027,    0.2281]
  Pr(y=0|x):   0.8846   [ 0.7719,    0.9973]

         f      h     fh    cv1
x=       0      0      0     50
```

The second run, with the **diff** option, gives the difference and its confidence interval:

```stata
prvalue, x(f=0 h=1 fh=0 cv1=50) delta diff
```

```
logit: Change in Predictions for y

Confidence intervals by delta method

                                              95% CI for Change
              Current     Saved    Change
  Pr(y=1|x):   0.5876    0.1154    0.4722   [ 0.2693,   0.6751]
  Pr(y=0|x):   0.4124    0.8846   -0.4722   [-0.6751,  -0.2693]

              f      h     fh    cv1
Current=      0      1      0     50
  Saved=      0      0      0     50
   Diff=      0      1      0      0
```

Next, we need to step through a number of combinations of the categorical variables and the covariate. The code fragment below fixes the covariate at nine values from 30 to 70, in steps of 5, while looking at differences between **h**=0 and **h**=1 at each level of **f**.

```stata
/* 2 x 2 matrix to hold the cell probabilities Pr(y=1) */
mat P=J(2,2,.)
mat colnames P = h=0 h=1
mat rownames P = f=0 f=1
forvalues i=30(5)70 {
    capture matrix drop D
    display
    display as txt "cv1=`i'"
    /* f=0: save the h=0 prediction, then compare h=1 with it */
    quietly prvalue, x(f=0 h=0 fh=0 cv1=`i') delta save
    mat P[1,1]=r(p1)
    quietly prvalue, x(f=0 h=1 fh=0 cv1=`i') delta diff   /* h=0 vs h=1 @ f=0 */
    mat P[1,2]=r(p1)
    mat temp=r(pred)
    mat D=temp[2,1..3]   /* change in Pr(y=1) and its 95% CI */
    /* f=1: repeat the comparison */
    quietly prvalue, x(f=1 h=0 fh=0 cv1=`i') delta save
    mat P[2,1]=r(p1)
    /* please note: fh=1 only when both f=1 and h=1 */
    quietly prvalue, x(f=1 h=1 fh=1 cv1=`i') delta diff   /* h=0 vs h=1 @ f=1 */
    mat P[2,2]=r(p1)
    mat temp=r(pred)
    mat D = D \ temp[2,1..3]
    /* combine probabilities and differences into one matrix and list it */
    mat R = P,D
    mat list R, title(cell probabilities, differences and confidence intervals)
}
```

Now, we will run the above code fragment. We have annotated the output by hand; each annotation follows a `<-` marker.

```
cv1=30                                    <- hold covariate at 30

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .00255673   .02723725   .02468052  -.01224754   .06160857   <- difference not significant at f=0
f=1   .04878354   .06740873   .01862519  -.04638632    .0836367   <- difference not significant at f=1

cv1=35                                    <- hold covariate at 35

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .00679948   .06957896   .06277948  -.01141145   .13697041   <- difference not significant at f=0
f=1   .12047193   .16181129   .04133936  -.09505168    .1777304   <- difference not significant at f=1

cv1=40                                    <- hold covariate at 40

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0    .0179561   .16647829   .14852219   .01991065   .27713373   <- difference significant at f=0
f=1   .26784405   .34019339   .07234934   -.1564856   .30118428   <- difference not significant at f=1

cv1=45                                    <- hold covariate at 45

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .04656038   .34787005   .30130967   .12440741   .47821194   <- difference significant at f=0
f=1   .49419808   .57931143   .08511335  -.18160469   .35183138   <- difference not significant at f=1

cv1=50                                    <- hold covariate at 50

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .11537804   .58757877   .47220073   .26931943   .67508203   <- difference significant at f=0
f=1   .72295588   .78622645   .06327057   -.1399183   .26645944   <- difference not significant at f=1

cv1=55                                    <- hold covariate at 55

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .25834921    .7918883   .53353909   .29661155   .77046662   <- difference significant at f=0
f=1   .87452245   .90760261   .03308016  -.07776317    .1439235   <- difference not significant at f=1

cv1=60                                    <- hold covariate at 60

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .48196125   .91041601   .42845476    .1588636   .69804592   <- difference significant at f=0
f=1   .94901687   .96328229   .01426542  -.03588886    .0644197   <- difference not significant at f=1

cv1=65                                    <- hold covariate at 65

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .71303982   .96446669   .25142688   .01497498   .48787877   <- difference significant at f=0
f=1   .98028207   .98592901   .00564694  -.01520646   .02650035   <- difference not significant at f=1

cv1=70                                    <- hold covariate at 70

R[2,5]:  cell probabilities, differences and confidence intervals
            h=0         h=1       Dprob          LB          UB
f=0   .86904871   .98639321    .1173445  -.03299262   .26768162   <- difference not significant at f=0
f=1   .99252504   .99468476   .00215971  -.00622068   .01054011   <- difference not significant at f=1
```

Here is what we can say based upon the output above. There are no significant differences between the two levels of **h** when **f**=1, at any of the nine values of **cv1**. When **f**=0, the difference between **h**=1 and **h**=0 is significant for values of **cv1** from 40 through 65, but not at 30, 35 or 70.
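The same arithmetic as for Table 1 shows why the differences at **f**=1 are so much smaller. Below, $\Lambda(z) = 1/(1+e^{-z})$ is the inverse logit and $\eta_{10}$, $\eta_{11}$ are the linear predictors for the **f**=1, **h**=0 and **f**=1, **h**=1 cells at **cv1**=50; these symbols are introduced here for illustration, and the probabilities are rounded to five decimals.

$$
\begin{aligned}
\eta_{10} &= -11.86075 + 2.996118 + 0.196476 \times 50 = 0.959168, & \Lambda(\eta_{10}) &\approx 0.72296\\
\eta_{11} &= \eta_{10} + 2.390911 - 2.047755 = 1.302324, & \Lambda(\eta_{11}) &\approx 0.78623\\
\Delta_{f=1} &= 0.78623 - 0.72296 = 0.06327
\end{aligned}
$$

The difference at **f**=1 is small mainly because the interaction cuts the log-odds effect of **h** from 2.390911 to 0.343156. As **cv1** increases, the **f**=1 probabilities approach 1 and the difference shrinks further (0.0143 at **cv1**=60 and 0.0022 at **cv1**=70 in the output above).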

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.