### Statistical Computing Seminars: Deciphering Interactions in Logistic Regression

• This presentation gives a broad overview of methods for interpreting interactions in logistic regression.
• The presentation is not about Stata. It uses Stata, but you gotta use something.
• The methods shown are somewhat stat package independent. However, they can be easier or more difficult to implement depending on the stat package.
• The presentation is not a step-by-step how-to manual that shows all of the code that was used to produce the results shown.
• Each of the models used in the examples will have two research variables that are interacted and one continuous covariate (cv1) that is not part of the interaction.

#### Some Definitions

Odds

Showing that odds are ratios.

     odds = p/(1 - p)
Log Odds

Natural log of the odds, also known as a logit.

     log odds = logit = log(p/(1 - p))
Odds Ratio

Showing that odds ratios are actually ratios of ratios.

                  odds1     p1/(1 - p1)
     odds_ratio = ----- = -------------
                  odds2     p2/(1 - p2)
Computing Odds Ratio from Logistic Regression Coefficient
     odds_ratio = exp(b)
Computing Probability from Logistic Regression Coefficients
     probability = exp(Xb)/(1 + exp(Xb))
Where Xb is the linear predictor.
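
The definitions above can be checked with a few lines of code. The following is a minimal sketch in Python, used here only as a package-independent calculator; the function names and the two probabilities are illustrative assumptions, not part of the original Stata session.

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)

def logit(p):
    """Log odds: the natural log of the odds."""
    return math.log(odds(p))

def inv_logit(xb):
    """Probability implied by a linear predictor xb on the log odds scale."""
    return math.exp(xb) / (1.0 + math.exp(xb))

def odds_ratio(p1, p2):
    """Ratio of two odds, that is, a ratio of ratios."""
    return odds(p1) / odds(p2)

# Hypothetical probabilities, chosen only for illustration.
p1, p2 = 0.75, 0.50
print("odds:", odds(p1), odds(p2))
print("odds ratio:", odds_ratio(p1, p2))
# Exponentiating a difference in log odds gives the same odds ratio,
# which is why odds_ratio = exp(b) for a logistic coefficient b.
print("exp(logit difference):", math.exp(logit(p1) - logit(p2)))
# inv_logit undoes logit, up to floating-point error.
print("round trip:", inv_logit(logit(p1)))
```
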

Logistic regression fits a maximum likelihood logit model. The model estimates conditional means in terms of logits (log odds). The logit model is a linear model in the log odds metric. Logistic regression results can be displayed as odds ratios or as probabilities. Probabilities are a nonlinear transformation of the log odds results.

In general, linear models have a number of advantages over nonlinear models and are easier to work with. For example, in linear models the slopes and/or differences in means do not change for differing values of a covariate. This is not necessarily the case for nonlinear models. The problem in logistic regression is that, even though the model is linear in log odds, many researchers feel that log odds are not a natural metric and are not easily interpreted.

Probability is a much more natural metric. However, the logit model is not linear when working in the probability metric. Thus, the predicted probabilities change as the values of a covariate change. In fact, the estimated probabilities depend on all variables in the model, not just the variables in the interaction.
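
To make the nonlinearity concrete, here is a small Python sketch. It applies the same shift in log odds at three different starting points and shows that the change in probability differs each time. The shift and the baseline values are hypothetical numbers chosen for illustration; they do not come from any model in this presentation.

```python
import math

def inv_logit(xb):
    """Probability implied by a linear predictor on the log odds scale."""
    return 1.0 / (1.0 + math.exp(-xb))

def prob_change(baseline, shift):
    """Change in probability produced by adding `shift` log odds to `baseline`."""
    return inv_logit(baseline + shift) - inv_logit(baseline)

shift = 1.0  # hypothetical effect of a predictor, in log odds
for baseline in (-4.0, 0.0, 4.0):  # hypothetical contribution of the other variables
    print(f"baseline logit {baseline:+.1f}: "
          f"change in probability = {prob_change(baseline, shift):.4f}")
```

The effect is constant in log odds, but in the probability metric it is largest near a baseline logit of zero and shrinks toward both tails.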

So what is a linear model? A linear model is linear in the betas (coefficients). By extension, a nonlinear model must be nonlinear in the betas. Below are three examples of linear and nonlinear models.

First is an example of a linear model and its graph.

Next we have an example of a nonlinear model and its graph. In this case it's an exponential growth model.

Lastly we have another nonlinear model. This one shows the nonlinear transformation of log odds to probabilities.

Logistic Regression Transformations

This is an attempt to show the different types of transformations that can occur with logistic regression models.


                             probability
                                 / \
                                /   \
                               /     \
                              /       \
                             /         \
                            /           \
     odds ratios ----- log odds ------- odds

#### Logistic interactions are a complex concept

Common wisdom suggests that interactions involve exploring differences in differences. If the differences are not different then there is no interaction. But in logistic regression interaction is a more complex concept. Researchers need to decide on how to conceptualize the interaction. Is the interaction to be conceptualized in terms of log odds (logits) or odds ratios or probability? This decision can make a big difference. An interaction that is significant in log odds may not be significant in terms of difference in differences for probability. Or vice versa.

#### Log odds metric -- categorical by categorical interaction

Variables f and h are binary predictors, while cv1 is a continuous covariate.
logit y01 f##h cv1, nolog

Logistic regression                               Number of obs   =        200
LR chi2(4)      =     106.10
Prob > chi2     =     0.0000
Log likelihood =  -78.74193                       Pseudo R2       =     0.4025

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f |   2.996118   .7521524     3.98   0.000     1.521926    4.470309
1.h |   2.390911   .6608498     3.62   0.000      1.09567    3.686153
|
f#h |
1 1  |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
|
cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
_cons |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
------------------------------------------------------------------------------
The interaction term is clearly significant. We could manually compute the expected logits for each of the four cells in the model.
     f h
cell 0 0 b[_cons]                                 = -11.86075
cell 0 1 b[_cons] + b[1.h] = -11.86075 + 2.390911 =  -9.469835
cell 1 0 b[_cons] + b[1.f] = -11.86075 + 2.996118 =  -8.864629
cell 1 1 b[_cons] + b[1.f] + b[1.h] + b[1.f#1.h]
         = -11.86075 + 2.996118 + 2.390911 - 2.047755 =  -8.521473
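
The same cell logits can be reproduced with a short calculation. This is a minimal Python sketch, not part of the Stata session; the coefficient values are copied from the output above, and the dictionary keys and the `cell_logit` helper are names made up for illustration.

```python
# Coefficients copied from the logit output above (log odds metric).
b = {"cons": -11.86075, "f": 2.996118, "h": 2.390911,
     "fh": -2.047755, "cv1": 0.196476}

def cell_logit(f, h, cv1=0.0):
    """Expected log odds for one cell of the f-by-h design at a given cv1."""
    return (b["cons"] + b["f"] * f + b["h"] * h
            + b["fh"] * f * h + b["cv1"] * cv1)

# The interaction coefficient enters only in the f = 1, h = 1 cell.
for f in (0, 1):
    for h in (0, 1):
        print(f"f={f} h={h}: {cell_logit(f, h):.6f}")
```

Because the coefficients are rounded to the digits printed in the output, the results agree with the table to about five decimal places.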

We can also use a cell-means model to obtain the expected logits for each cell when cv1=0.

logit y01 bn.f#bn.h cv1, nocons

Logistic regression                               Number of obs   =        200
Wald chi2(5)    =      50.48
Log likelihood =  -78.74193                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f#h |
0 0  |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
0 1  |  -9.469835   1.714828    -5.52   0.000    -12.83084   -6.108835
1 0  |  -8.864629   1.530269    -5.79   0.000     -11.8639   -5.865356
1 1  |  -8.521473   1.640705    -5.19   0.000    -11.73719    -5.30575
|
cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
------------------------------------------------------------------------------
And here is what the expected logits look like in a 2x2 table.
            h=0          h=1
     f=0   -11.86075    -9.469835
     f=1   -8.8646295   -8.521473
We will look at the differences between h0 and h1 at each level of f (simple main effects) and also at the difference in differences.
/* difference 1 at f = 0 */

lincom 0.f#0.h - 0.f#1.h

( 1)  [y01]0bn.f#0bn.h - [y01]0bn.f#1.h = 0

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |  -2.390911   .6608498    -3.62   0.000    -3.686153    -1.09567
------------------------------------------------------------------------------

/* difference 2 at f = 1 */

lincom 1.f#0.h - 1.f#1.h

( 1)  [y01]1.f#0bn.h - [y01]1.f#1.h = 0

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |  -.3431562   .5507722    -0.62   0.533     -1.42265    .7363375
------------------------------------------------------------------------------
Difference 1 suggests that h0 is significantly different from h1 at f = 0, while difference 2 does not show a significant difference at f = 1. These are tests of simple main effects just like we would do in OLS regression. We will finish up this section by looking at the difference in differences.
/* difference in differences */

lincom (0.f#0.h - 0.f#1.h)-(1.f#0.h - 1.f#1.h)

( 1)  [y01]0bn.f#0bn.h - [y01]0bn.f#1.h - [y01]1.f#0bn.h + [y01]1.f#1.h = 0

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
------------------------------------------------------------------------------
The difference in differences is, of course, just another name for the interaction. For the log odds model the differences and the difference in differences are the same regardless of the value of the covariate. This constancy across different values of the covariate is one of the properties of linear models.
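
The constancy claim is easy to check numerically. The Python sketch below recomputes the point estimates of the simple main effects and the difference in differences at several values of cv1, using the rounded coefficients from the output. It gives point estimates only, with no standard errors or tests, and the helper names are illustrative assumptions.

```python
# Coefficients copied from the logit output (log odds metric).
b = {"cons": -11.86075, "f": 2.996118, "h": 2.390911,
     "fh": -2.047755, "cv1": 0.196476}

def cell_logit(f, h, cv1):
    """Expected log odds for one cell at a given value of cv1."""
    return (b["cons"] + b["f"] * f + b["h"] * h
            + b["fh"] * f * h + b["cv1"] * cv1)

def simple_effect(f, cv1):
    """h0 minus h1 at one level of f, matching the sign used by lincom above."""
    return cell_logit(f, 0, cv1) - cell_logit(f, 1, cv1)

def diff_in_diff(cv1):
    """Simple effect at f = 0 minus simple effect at f = 1."""
    return simple_effect(0, cv1) - simple_effect(1, cv1)

for cv1 in (0, 25, 50, 75):  # arbitrary covariate values for illustration
    print(f"cv1={cv1:2d}: effect at f=0 {simple_effect(0, cv1):.6f}, "
          f"effect at f=1 {simple_effect(1, cv1):.6f}, "
          f"diff in diff {diff_in_diff(cv1):.6f}")
```

The cv1 term cancels in every difference, so each row prints the same three numbers.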

#### Odds ratio metric -- categorical by categorical interaction

Let's look at a table of logistic regression coefficients along with the exponentiated coefficients, which some people call odds ratios.

----------------------------------------------------------
source  |  coefficient  exp(coef)    type of exp(coef)
--------+-------------------------------------------------
f       |   2.996118    20.007716    odds ratio
h       |   2.390911    10.92345     odds ratio
f#h     |  -2.047755     0.1290242   ratio of odds ratios
cv1     |   0.196476     1.217106    odds ratio
_cons   | -11.86075      7.062e-06   baseline odds
---------------------------------------------------------
Many people call all exponentiated logistic coefficients odds ratios. But as you can see from the table above, the exponentiated interaction is a ratio of odds ratios and the exponentiated constant is the baseline odds.

We can compute the odds ratios manually for each of the two levels of f from the values in the table above.

odds ratio h1/h0 for f=0:  exp(b[1.h])                                 = 10.92345
odds ratio h1/h0 for f=1:  exp(b[1.h])*exp(b[f#h]) = 10.92345*.1290242 = 1.4093894

Please note that the computation of the odds ratio for f = 1 involves multiplying the exponentiated coefficients from the table above, which implies that odds ratio models are multiplicative rather than additive.
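
As a package-independent check, the following Python sketch exponentiates the coefficients and confirms that multiplying on the odds ratio scale is the same as adding on the log odds scale. The coefficient values are copied from the logit output; the variable names are illustrative.

```python
import math

# Coefficients in the log odds metric, copied from the logit output.
coefs = {"f": 2.996118, "h": 2.390911, "f#h": -2.047755,
         "cv1": 0.196476, "_cons": -11.86075}

# Exponentiate each coefficient.
exp_coefs = {name: math.exp(value) for name, value in coefs.items()}
for name, value in coefs.items():
    print(f"{name:6s} {value:11.6f} {exp_coefs[name]:14.7g}")

# Odds ratio for h at f = 1: multiply the exponentiated terms ...
or_h_at_f1 = exp_coefs["h"] * exp_coefs["f#h"]
# ... which is the same as adding the coefficients before exponentiating.
or_h_at_f1_additive = math.exp(coefs["h"] + coefs["f#h"])
print("odds ratio for h at f=1:", or_h_at_f1, or_h_at_f1_additive)
```
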

The baseline odds when cv1 = zero is very small (7.06e-06) so for the remainder of the computations we will estimate the odds while holding cv1 at 50.

margins, over(f h) at(cv1=50) expression(exp(xb())) noatlegend

Predictive margins                                Number of obs   =        200
Model VCE    : OIM

Expression   : exp(xb())
over         : f h

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f h  |
0 0  |   .1304264   .0734908     1.77   0.076    -.0136129    .2744657
0 1  |   1.424706    .515989     2.76   0.006     .4133857    2.436025
1 0  |   2.609533   1.136545     2.30   0.022     .3819457    4.837121
1 1  |   3.677847   1.311463     2.80   0.005     1.107427    6.248267
------------------------------------------------------------------------------
The option expression(exp(xb())) ensures that margins reports the predicted odds for each cell, which is what we need for working in the odds ratio metric. The baseline odds are now .1304264 which is reasonable. We will compute the odds ratio for each level of f.
odds ratio 1 at f=0: 1.424706/.1304264 = 10.923446

odds ratio 2 at f=1: 3.677847/2.609533 =  1.4093889
So when f = 0 the odds of the outcome being one are 10.92 times greater for h1 than for h0. For f = 1 the ratio of the two odds is only 1.41. These odds ratios are the same as we computed manually earlier.

We can also compute the ratio of odds ratios and show that it reproduces the estimate for the interaction.

ratio of odds ratios: (3.677847/2.609533)/(1.424706/.1304264) = .1290242
The one nice thing that we can say about working in odds ratio metric is the odds ratios remain the same regardless of where we hold the covariate constant.
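
Here is a minimal Python sketch of that property. It computes the cell odds at several values of cv1 and shows that the odds change while the odds ratios and the ratio of odds ratios do not. Coefficients are copied from the logit output; the helper names and the choice of cv1 values other than 50 are assumptions for illustration.

```python
import math

b = {"cons": -11.86075, "f": 2.996118, "h": 2.390911,
     "fh": -2.047755, "cv1": 0.196476}

def cell_odds(f, h, cv1):
    """Predicted odds for one cell: exp of the linear predictor."""
    xb = (b["cons"] + b["f"] * f + b["h"] * h
          + b["fh"] * f * h + b["cv1"] * cv1)
    return math.exp(xb)

def odds_ratio_h(f, cv1):
    """Odds ratio h1/h0 at one level of f."""
    return cell_odds(f, 1, cv1) / cell_odds(f, 0, cv1)

for cv1 in (40, 50, 60):
    or0 = odds_ratio_h(0, cv1)
    or1 = odds_ratio_h(1, cv1)
    print(f"cv1={cv1}: baseline odds {cell_odds(0, 0, cv1):.6f}, "
          f"OR at f=0 {or0:.5f}, OR at f=1 {or1:.5f}, ratio {or1 / or0:.7f}")
```
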

#### Probability metric -- categorical by categorical interaction

We will begin by rerunning our logistic regression model to refresh our memories on the coefficients.
logit y01 f##h cv1, nolog

Logistic regression                               Number of obs   =        200
LR chi2(4)      =     106.10
Prob > chi2     =     0.0000
Log likelihood =  -78.74193                       Pseudo R2       =     0.4025

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f |   2.996118   .7521524     3.98   0.000     1.521926    4.470309
1.h |   2.390911   .6608498     3.62   0.000      1.09567    3.686153
|
f#h |
1 1  |  -2.047755   .8807989    -2.32   0.020    -3.774089   -.3214213
|
cv1 |    .196476   .0328518     5.98   0.000     .1320876    .2608644
_cons |  -11.86075   1.895828    -6.26   0.000     -15.5765   -8.144991
------------------------------------------------------------------------------

Let's manually compute the probability of the outcome being one for the f = 0, h = 0 cell when cv1 is held at 50.
Xb = b[_cons]  + 0*b[1.f]   + 0*b[1.h]   + 0*b[f#h]    + 50*b[cv1]

= -11.86075 + 0*2.996118 + 0*2.390911 + 0*-2.047755 + 50*.196476 = -2.03695

probability = exp(Xb)/(1+exp(Xb)) = exp(-2.03695)/(1+exp(-2.03695)) = .11537767
We could repeat this for each of the other three cells but instead we will obtain the expected probabilities for each cell while holding the covariate at 50 using the margins command.
margins f#h, at(cv1=50)

Adjusted predictions                              Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y01), predict()
at           : cv1             =          50

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f#h |
0 0  |    .115378   .0575106     2.01   0.045     .0026592    .2280968
0 1  |   .5875788   .0877652     6.69   0.000     .4155621    .7595955
1 0  |   .7229559   .0872338     8.29   0.000     .5519808    .8939309
1 1  |   .7862264   .0599327    13.12   0.000     .6687605    .9036924
------------------------------------------------------------------------------
Here are the same results displayed as a table.

            h=0        h=1
     f=0   .115378    .5875788
     f=1   .7229559   .7862264
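
The four cell probabilities can also be reproduced by hand from the coefficients, repeating the manual computation shown earlier for the f = 0, h = 0 cell. The Python sketch below does this for all four cells with cv1 held at 50. It yields point estimates only, without the delta-method standard errors that margins provides, and the helper name `cell_prob` is illustrative.

```python
import math

b = {"cons": -11.86075, "f": 2.996118, "h": 2.390911,
     "fh": -2.047755, "cv1": 0.196476}

def cell_prob(f, h, cv1):
    """Predicted probability for one cell: exp(Xb)/(1 + exp(Xb))."""
    xb = (b["cons"] + b["f"] * f + b["h"] * h
          + b["fh"] * f * h + b["cv1"] * cv1)
    return math.exp(xb) / (1.0 + math.exp(xb))

for f in (0, 1):
    for h in (0, 1):
        print(f"f={f} h={h}: {cell_prob(f, h, 50):.7f}")
```
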

We would like to look at the differences in h for each level of f.

h1 - h0 at f = 0: .5875788 - .115378  = .4722008
h1 - h0 at f = 1: .7862264 - .7229559 = .0632706
We can also do this with a slight variation of the margins command and get estimates of the differences in probability along with standard errors and confidence intervals.
margins f, dydx(h) at(cv1=50) post

Conditional marginal effects                      Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y01), predict()
dy/dx w.r.t. : 0.h 1.h
at           : cv1             =          50

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.h          |
f |
0  |   .4722008   .1035128     4.56   0.000     .2693195     .675082
1  |   .0632706   .1036697     0.61   0.542    -.1399183    .2664595
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
These two differences are the probability analogs to the simple main effects from the log odds model. So, when the covariate is held at 50 there is a significant difference in h at f = 0 but not at f = 1.

Next, we will use lincom to compute the difference in differences when cv1 is held at 50.

lincom [1.h]0.f-[1.h]1.f

( 1)  [1.h]0bn.f - [1.h]1.f = 0

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |   .4089302   .1482533     2.76   0.006      .118359    .6995014
------------------------------------------------------------------------------
The p-value here is different from the p-value from the original logit model because in the probability metric the values of the covariate matter.

If we repeat the above process for values of cv1 from 20 to 70, we can produce a table of simple main effects and a graph of the difference in differences.

Table of Simple Main Effects for h at Two Levels of f for Various Values of cv1

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cv1 f  |
20 0  |   .0035507   .0038256     0.93   0.353    -.0039472    .0110487
20 1  |    .002893   .0057719     0.50   0.616    -.0084197    .0142058
30 0  |   .0246805   .0188412     1.31   0.190    -.0122475    .0616086
30 1  |   .0186252   .0331697     0.56   0.574    -.0463863    .0836367
40 0  |   .1485222   .0656193     2.26   0.024     .0199107    .2771337
40 1  |   .0723494   .1167547     0.62   0.535    -.1564856    .3011843
50 0  |   .4722008   .1035128     4.56   0.000     .2693195     .675082
50 1  |   .0632706   .1036697     0.61   0.542    -.1399183    .2664595
60 0  |   .4284548    .137549     3.11   0.002     .1588636    .6980459
60 1  |   .0142654   .0255894     0.56   0.577    -.0358888    .0644197
70 0  |   .1173445    .076704     1.53   0.126    -.0329926    .2676816
70 1  |   .0021597   .0042758     0.51   0.613    -.0062207    .0105402

Table of Difference in Differences for Various Values of cv1

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cv1   |
20   |   .0006577   .0047463     0.14   0.890    -.0086449    .0099603
30   |   .0060553   .0306291     0.20   0.843    -.0539766    .0660872
40   |   .0761728   .1233778     0.62   0.537    -.1656432    .3179889
50   |   .4089302   .1482533     2.76   0.006      .118359    .6995014
60   |   .4141893   .1388141     2.98   0.003     .1421186      .68626
70   |   .1151848   .0753487     1.53   0.126    -.0324959    .2628654


Clearly, the value of the covariate makes a huge difference in whether or not the simple main effects or the interactions are statistically significant when working in the probability metric.
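
The point estimates in the two tables can be regenerated with a short loop. The Python sketch below computes the simple main effects of h and their difference for cv1 from 20 to 70. It reproduces the dy/dx columns only. The standard errors and p-values require the delta method, which margins handles and this sketch does not attempt. Function names are illustrative assumptions.

```python
import math

b = {"cons": -11.86075, "f": 2.996118, "h": 2.390911,
     "fh": -2.047755, "cv1": 0.196476}

def cell_prob(f, h, cv1):
    """Predicted probability for one cell at a given cv1."""
    xb = (b["cons"] + b["f"] * f + b["h"] * h
          + b["fh"] * f * h + b["cv1"] * cv1)
    return 1.0 / (1.0 + math.exp(-xb))

def h_effect(f, cv1):
    """Difference in probability, h1 minus h0, at one level of f."""
    return cell_prob(f, 1, cv1) - cell_prob(f, 0, cv1)

def diff_in_diff(cv1):
    """Effect of h at f = 0 minus effect of h at f = 1."""
    return h_effect(0, cv1) - h_effect(1, cv1)

for cv1 in range(20, 80, 10):
    print(f"cv1={cv1}: f=0 {h_effect(0, cv1):.7f}  "
          f"f=1 {h_effect(1, cv1):.7f}  diff {diff_in_diff(cv1):.7f}")
```
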

#### Model 1a: Categorical by categorical interaction?

But wait, what if the model does not contain an interaction term? Consider the following model.
logit y01 i.f i.h cv1

Logistic regression                               Number of obs   =        200
LR chi2(3)      =     100.26
Prob > chi2     =     0.0000
Log likelihood =   -81.6618                       Pseudo R2       =     0.3804

------------------------------------------------------------------------------
y01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f |    1.65172   .4229992     3.90   0.000     .8226566    2.480783
1.h |   1.256555   .4009757     3.13   0.002     .4706575    2.042453
cv1 |   .1806214   .0304036     5.94   0.000     .1210314    .2402113
_cons |  -10.26943   1.622842    -6.33   0.000    -13.45015   -7.088723
------------------------------------------------------------------------------
We will manually compute the expected log odds for each of the four cells of the model.
     f h
cell 0 0 b[_cons]                                 = -10.26943
cell 0 1 b[_cons] + b[1.h] = -10.26943 + 1.256555 = -9.012875
cell 1 0 b[_cons] + b[1.f] = -10.26943 + 1.65172  = -8.61771
cell 1 1 b[_cons] + b[1.f] + b[1.h]
         = -10.26943 + 1.65172 + 1.256555 = -7.361155
Next we will compute the differences between h0 and h1 for f=0 and f=1.
difference 1 at f = 0: -10.26943 - -9.012875 = -1.256555

difference 2 at f = 1: -8.61771 - -7.361155  = -1.256555
They are identical to within rounding error, showing that there is no interaction effect in the log odds model.

Next we will compute the expected probabilities for cv1 held at 50 along with the difference in differences.

margins, over(f h) at(cv1=50) post

Predictive margins                                Number of obs   =        200

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f#h |
0 0  |   .2247204   .0670438     3.35   0.001     .0933171    .3561238
0 1  |   .5045471   .0798579     6.32   0.000     .3480285    .6610657
1 0  |   .6018917   .0866773     6.94   0.000     .4320073    .7717761
1 1  |   .8415636   .0455686    18.47   0.000     .7522509    .9308764
------------------------------------------------------------------------------

lincom (_b[0.f#1.h]-_b[0.f#0.h])-(_b[1.f#1.h]-_b[1.f#0.h])

( 1)  - 0bn.f#0bn.h + 0bn.f#1.h + 1.f#0bn.h - 1.f#1.h = 0

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |   .0401547   .0364121     1.10   0.270    -.0312117     .111521
------------------------------------------------------------------------------
The difference in differences is not very large. Let's try it again, this time holding cv1 at 60.
margins, over(f h) at(cv1=60) post

Predictive margins                                Number of obs   =        200

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f#h |
0 0  |   .6382663   .1046912     6.10   0.000     .4330753    .8434572
0 1  |   .8610935   .0455552    18.90   0.000     .7718069    .9503802
1 0  |   .9019929   .0470231    19.18   0.000     .8098294    .9941565
1 1  |   .9700007   .0146765    66.09   0.000     .9412353     .998766
------------------------------------------------------------------------------

lincom (_b[0.f#1.h]-_b[0.f#0.h])-(_b[1.f#1.h]-_b[1.f#0.h])

( 1)  - 0bn.f#0bn.h + 0bn.f#1.h + 1.f#0bn.h - 1.f#1.h = 0

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |   .1548195   .0634635     2.44   0.015     .0304334    .2792057
------------------------------------------------------------------------------
This time the difference in differences is much larger. Let's make a graph similar to the one we did for the model with the interaction included.

We see that, even without an interaction term in the model, the differences in differences (interactions?) can vary widely from negative to positive depending on the value of the covariate.
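
A short calculation shows the same thing. The Python sketch below uses the coefficients from the model without an interaction term and computes the difference in differences in the probability metric across a range of cv1 values. It reproduces the lincom point estimates at cv1 = 50 and 60, not their standard errors. The helper names and the range of cv1 values are assumptions for illustration.

```python
import math

# Coefficients copied from the model without an interaction term.
b = {"cons": -10.26943, "f": 1.65172, "h": 1.256555, "cv1": 0.1806214}

def cell_prob(f, h, cv1):
    """Predicted probability from the additive (no interaction) logit model."""
    xb = b["cons"] + b["f"] * f + b["h"] * h + b["cv1"] * cv1
    return 1.0 / (1.0 + math.exp(-xb))

def diff_in_diff(cv1):
    """(h1 - h0 at f = 0) minus (h1 - h0 at f = 1), in probabilities."""
    at_f0 = cell_prob(0, 1, cv1) - cell_prob(0, 0, cv1)
    at_f1 = cell_prob(1, 1, cv1) - cell_prob(1, 0, cv1)
    return at_f0 - at_f1

for cv1 in range(20, 80, 10):
    print(f"cv1={cv1}: difference in differences = {diff_in_diff(cv1):+.7f}")
```

In log odds the difference in differences for this model is exactly zero at every value of cv1; in probabilities it is negative at low values of cv1 and positive at high values.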

This leads us to the "Quote of the Day."

#### Quote of the day

Departures from additivity imply the presence of interaction types, but additivity does not imply the absence of interaction types.

Greenland & Rothman, 1998

#### Log odds metric -- categorical by continuous interaction

The dataset for the categorical by continuous interaction has one binary predictor (f), one continuous predictor (s) and a continuous covariate (cv1). Let's take a look at the logistic regression model.
logit y f##c.s cv1

Logistic regression                               Number of obs   =        200
LR chi2(4)      =     114.41
Prob > chi2     =     0.0000
Log likelihood = -74.587842                       Pseudo R2       =     0.4340

------------------------------------------------------------------------------
y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f |   9.983662    3.05269     3.27   0.001       4.0005    15.96682
s |   .1750686   .0470033     3.72   0.000     .0829438    .2671933
|
f#c.s |
1  |  -.1595233   .0570352    -2.80   0.005    -.2713103   -.0477363
|
cv1 |   .1877164   .0347888     5.40   0.000     .1195316    .2559013
_cons |  -19.00557   3.371064    -5.64   0.000    -25.61273   -12.39841
------------------------------------------------------------------------------
The interaction term is significant, indicating that the slopes for y on s are significantly different for each level of f. We can compute the slopes and intercepts manually as shown below.

slope for f=0:  b[s] = .1750686
slope for f=1:  b[s] + b[f#c.s] = .1750686 -.1595233 = .0155453

intercept for f=0:  _cons = -19.00557
intercept for f=1:  _cons + b[1.f]= -19.00557 + 9.983662 = -9.021909
Here are our two logistic regression equations in the log odds metric.
-19.00557 + .1750686*s + 0*cv1
-9.021909 + .0155453*s + 0*cv1
Now we can graph these two regression lines to get an idea of what is going on.

Because the logistic regression model is linear in log odds, the predicted slopes do not change with differing values of the covariate.
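
The slopes and intercepts can be checked with a few lines of Python. In this sketch the coefficients are copied from the logit output and the helper names are illustrative. Changing cv1 moves both intercepts by the same amount and leaves the slopes alone.

```python
# Coefficients copied from the logit y f##c.s cv1 output (log odds metric).
B_CONS, B_F, B_S, B_FS, B_CV1 = -19.00557, 9.983662, 0.1750686, -0.1595233, 0.1877164

def slope(f):
    """Slope of the log odds on s at one level of f."""
    return B_S + B_FS * f

def intercept(f, cv1=0.0):
    """Intercept of the log odds line at one level of f for a given cv1."""
    return B_CONS + B_F * f + B_CV1 * cv1

for cv1 in (0, 40):  # 40 is an arbitrary second value for comparison
    for f in (0, 1):
        print(f"cv1={cv1:2d} f={f}: intercept {intercept(f, cv1):.6f}, "
              f"slope {slope(f):.7f}")
```
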

#### Probability metric -- categorical by continuous interaction

We'll begin by rerunning the logistic regression model.
logit y f##c.s cv1

Logistic regression                               Number of obs   =        200
LR chi2(4)      =     114.41
Prob > chi2     =     0.0000
Log likelihood = -74.587842                       Pseudo R2       =     0.4340

------------------------------------------------------------------------------
y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f |   9.983662    3.05269     3.27   0.001       4.0005    15.96682
s |   .1750686   .0470033     3.72   0.000     .0829438    .2671933
|
f#c.s |
1  |  -.1595233   .0570352    -2.80   0.005    -.2713103   -.0477363
|
cv1 |   .1877164   .0347888     5.40   0.000     .1195316    .2559013
_cons |  -19.00557   3.371064    -5.64   0.000    -25.61273   -12.39841
------------------------------------------------------------------------------
If we were so inclined we could compute all of the probabilities of interest using the basic probability formula.
Prob = exp(Xb)/(1+exp(Xb))
Here's an example of computing the probability when f = 0, s = 60, f#s = 0, and cv1 = 40.
Xb0 = -19.00557 + 0*9.983662 + 60*.1750686 + 0*-.1595233 + 40*.1877164 = -.992798

exp(Xb0)/(1+exp(Xb0)) = exp(-.992798)/(1+exp(-.992798)) = .27035977
Now we will use f = 1, s = 60, f#s = 60, and cv1 = 40.
Xb1 = -19.00557 + 1*9.983662 + 60*.1750686 + 60*-.1595233 + 40*.1877164 = -.580534

exp(Xb1)/(1+exp(Xb1)) = exp(-.580534)/(1+exp(-.580534)) = .35880973
We can also compute the difference in probabilities.
exp(Xb1)/(1+exp(Xb1))           - exp(Xb0)/(1+exp(Xb0)) =

exp(-.580534)/(1+exp(-.580534)) - exp(-.992798)/(1+exp(-.992798)) = .08844995
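
The manual computation above translates directly into code. This Python sketch plugs the same coefficient values into the same formula; the function names are illustrative.

```python
import math

# Coefficients copied from the logit y f##c.s cv1 output.
B_CONS, B_F, B_S, B_FS, B_CV1 = -19.00557, 9.983662, 0.1750686, -0.1595233, 0.1877164

def linear_predictor(f, s, cv1):
    """Xb: the f#s term is f*s, so it is zero whenever f = 0."""
    return B_CONS + B_F * f + B_S * s + B_FS * f * s + B_CV1 * cv1

def prob(f, s, cv1):
    """Predicted probability, exp(Xb)/(1 + exp(Xb))."""
    xb = linear_predictor(f, s, cv1)
    return math.exp(xb) / (1.0 + math.exp(xb))

p0 = prob(0, 60, 40)
p1 = prob(1, 60, 40)
print("Xb0, Xb1:", linear_predictor(0, 60, 40), linear_predictor(1, 60, 40))
print("p0, p1, difference:", p0, p1, p1 - p0)
```
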
If we use something like Stata's margins command, we can get predicted probabilities along with standard errors and confidence intervals. Here is an example predicting the probability when s = 20 and cv1 = 40.
margins f, at(s=20 cv1=40)

Adjusted predictions                              Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y), predict()

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f |
0  |   .0003368   .0005779     0.58   0.560    -.0007958    .0014695
1  |   .2310582   .1500289     1.54   0.124    -.0629931    .5251095
------------------------------------------------------------------------------
Now we can repeat this for various values of s running from 20 to 70, producing the table below.
Table of Predicted Probabilities of f for Various Values of s Holding cv1 at 40

|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
s f  |
20 0  |   .0003368   .0005779     0.58   0.560    -.0007958    .0014695
20 1  |   .2310582   .1500289     1.54   0.124    -.0629931    .5251095
25 0  |    .000808   .0012067     0.67   0.503    -.0015571     .003173
25 1  |   .2451555   .1320954     1.86   0.063    -.0137469    .5040578
30 0  |   .0019367   .0024706     0.78   0.433    -.0029056    .0067789
30 1  |   .2598222   .1136085     2.29   0.022     .0371536    .4824908
35 0  |   .0046348   .0049337     0.94   0.348     -.005035    .0143047
35 1  |   .2750467   .0959104     2.87   0.004     .0870657    .4630276
40 0  |   .0110505   .0095531     1.16   0.247    -.0076733    .0297743
40 1  |   .2908127    .081642     3.56   0.000     .1307973    .4508282
45 0  |   .0261139   .0178944     1.46   0.144    -.0089585    .0611863
45 1  |   .3070997   .0752299     4.08   0.000     .1596518    .4545475
50 0  |   .0604557   .0329478     1.83   0.067    -.0041208    .1250322
50 1  |   .3238822   .0808248     4.01   0.000     .1654685    .4822959
55 0  |   .1337569   .0622149     2.15   0.032     .0118178    .2556959
55 1  |   .3411303   .0980782     3.48   0.001     .1489005    .5333601
60 0  |   .2703596   .1168105     2.31   0.021     .0414151     .499304
60 1  |   .3588096   .1233704     2.91   0.004      .117008    .6006111
65 0  |   .4706697    .180248     2.61   0.009       .11739    .8239493
65 1  |   .3768809   .1535731     2.45   0.014     .0758831    .6778787
70 0  |   .6808947   .1951477     3.49   0.000     .2984123    1.063377
70 1  |   .3953013   .1867987     2.12   0.034     .0291827    .7614199
------------------------------------------------------------------------------
We will repeat this holding cv1 at 50 and then 60. We will then plot the probabilities for each of the three values of cv1.

Instead of looking at separate values for f0 and f1, we could compute the difference in probabilities. Here is an example using margins with the dydx option.

margins, dydx(f) at(s=20 cv1=40)

Conditional marginal effects                      Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : 1.f
at           : s               =          20
cv1             =          40

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.f |   .2307214    .150045     1.54   0.124    -.0633615    .5248042
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Okay, let's repeat this for different values of s, producing the table below.
Table of Differences in Probability for Various Values of s Holding cv1 at 40

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
s  |
20  |   .2307214    .150045     1.54   0.124    -.0633615    .5248042
25  |   .2443475   .1321009     1.85   0.064    -.0145655    .5032605
30  |   .2578855   .1135271     2.27   0.023     .0353765    .4803946
35  |   .2704118   .0954463     2.83   0.005     .0833405    .4574832
40  |   .2797622   .0798258     3.50   0.000     .1233066    .4362179
45  |   .2809858   .0696338     4.04   0.000     .1445061    .4174655
50  |   .2634265   .0682395     3.86   0.000     .1296795    .3971735
55  |   .2073734   .0822883     2.52   0.012     .0460913    .3686556
60  |     .08845   .1291224     0.69   0.493    -.1646253    .3415252
65  |  -.0937888   .2006804    -0.47   0.640    -.4871151    .2995376
70  |  -.2855934   .2436296    -1.17   0.241    -.7630986    .1919118
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
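
As the note above says, dy/dx for a factor variable is a discrete change: the predicted probability at f = 1 minus the predicted probability at f = 0, with the other variables held fixed. The sketch below illustrates that computation in Python. The coefficients are hypothetical placeholders for a model of the form logit y i.f##c.s cv1 and are not the estimates behind the table above, so the printed numbers will not match it. Only the mechanics carry over.

```python
import math

def inv_logit(xb):
    """Convert a linear predictor (log odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# HYPOTHETICAL coefficients, chosen only for illustration.
b = {"f": 5.0, "s": 0.15, "f#s": -0.08, "cv1": 0.02, "_cons": -9.0}

def prob(f, s, cv1):
    """Predicted probability for one combination of f, s and cv1."""
    xb = (b["_cons"] + b["f"] * f + b["s"] * s
          + b["f#s"] * f * s + b["cv1"] * cv1)
    return inv_logit(xb)

def discrete_change(s, cv1):
    """Pr(y | f=1) - Pr(y | f=0) with s and cv1 held fixed."""
    return prob(1, s, cv1) - prob(0, s, cv1)

# Difference in probability for several values of s, holding cv1 at 40.
for s in range(20, 71, 10):
    print(s, round(discrete_change(s, cv1=40), 4))
```

Calling discrete_change with a different cv1 gives a different answer for the same s, which is why the process has to be repeated for each value of the covariate.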
Next, we need to repeat the process while holding cv1 at 50 and then 60. Then we can plot the differences in probabilities for the three values of cv1 on a single graph.

The Stata FAQ page "How can I understand a categorical by continuous interaction in logistic regression?" shows an alternative method for graphing these difference-in-probability lines that includes confidence intervals. Here are the graphs from that FAQ page.





#### Log odds metric -- continuous by continuous interaction

This time we have a dataset that has two continuous predictors (r & m) and a continuous covariate (cv1).
logit y c.r##c.m cv1, nolog

Logistic regression                               Number of obs   =        200
LR chi2(4)      =      66.80
Prob > chi2     =     0.0000
Log likelihood = -77.953857                       Pseudo R2       =     0.3000

------------------------------------------------------------------------------
y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
r |   .4342063   .1961642     2.21   0.027     .0497316    .8186809
m |   .5104617   .2011856     2.54   0.011     .1161452    .9047782
|
c.r#c.m |  -.0068144   .0033337    -2.04   0.041    -.0133483   -.0002805
|
cv1 |   .0309685   .0271748     1.14   0.254    -.0222931      .08423
_cons |  -34.09122   11.73402    -2.91   0.004    -57.08947   -11.09297
------------------------------------------------------------------------------
The trick to interpreting continuous by continuous interactions is to fix one predictor at a given value and to vary the other predictor. Once again, since the log odds model is a linear model it really doesn't matter what value the covariate is held at; the slopes do not change. For convenience we will just hold cv1 at zero.

Here is an example manual computation of the slope of r holding m at 30.

slope = b[r] + 30*b[r#m] = .43420626 + 30*(-.00681441) = .22977396
Here is the same computation using Stata.
margins, dydx(r) at(m=30) predict(xb)

Average marginal effects                          Number of obs   =        200
Model VCE    : OIM

Expression   : Linear prediction, predict(xb)
dy/dx w.r.t. : r
at           : m               =          30
               cv1             =           0

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
r |   .2297741   .0982943     2.34   0.019     .0371207    .4224274
------------------------------------------------------------------------------
The table below shows the slope for r for various values of m running from 30 to 70. Since this is a linear model we do not have to hold cv1 at any particular value.
Table of Slopes for r for Various Values of m

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
m  |
30  |   .2297741   .0982943     2.34   0.019     .0371207    .4224274
40  |     .16163   .0670895     2.41   0.016     .0301369    .2931231
50  |   .0934859   .0395342     2.36   0.018     .0160004    .1709715
60  |   .0253419   .0291137     0.87   0.384    -.0317199    .0824037
70  |  -.0428022   .0485281    -0.88   0.378    -.1379156    .0523112
------------------------------------------------------------------------------
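
The slopes in the table above can be checked by hand, since in the log odds metric the slope of r is just b[r] + m*b[r#m]. Here is a minimal Python sketch using the coefficients as printed in the logit output. Because those are rounded to seven decimals, the results should agree with the margins output only to about six decimal places. The sketch does not reproduce the standard errors, which require the coefficient covariance matrix.

```python
# Coefficients copied from the logit output above (rounded as displayed).
b_r = 0.4342063    # coefficient for r
b_rm = -0.0068144  # coefficient for the c.r#c.m interaction

def slope_of_r(m):
    """Slope of r in the log odds metric when m is held at a given value."""
    return b_r + m * b_rm

for m in range(30, 71, 10):
    print(f"m = {m}: slope of r = {slope_of_r(m):.7f}")

# Value of m at which the slope of r changes sign.
m_zero = -b_r / b_rm
print(f"slope of r is zero at m = {m_zero:.1f}")
```

The sign change falls between m = 60 and m = 70, at roughly 63.7, which is consistent with the positive slope at 60 and the negative slope at 70 in the table.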
We arbitrarily chose to vary m and look at the slope of r but we could have easily reversed the variables. Hopefully, your knowledge of the theory behind the model along with substantive knowledge will suggest which variable to manipulate.

Below is a graph of the slopes from the table above.

This time we are going to move directly to the probability interpretation, bypassing the odds ratio metric.

#### Probability metric -- continuous by continuous interaction

We will rerun our model.
logit y c.r##c.m cv1, nolog

Logistic regression                               Number of obs   =        200
LR chi2(4)      =      66.80
Prob > chi2     =     0.0000
Log likelihood = -77.953857                       Pseudo R2       =     0.3000

------------------------------------------------------------------------------
y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
r |   .4342063   .1961642     2.21   0.027     .0497316    .8186809
m |   .5104617   .2011856     2.54   0.011     .1161452    .9047782
|
c.r#c.m |  -.0068144   .0033337    -2.04   0.041    -.0133483   -.0002805
|
cv1 |   .0309685   .0271748     1.14   0.254    -.0222931      .08423
_cons |  -34.09122   11.73402    -2.91   0.004    -57.08947   -11.09297
------------------------------------------------------------------------------
Next we will calculate the values of the covariate for the mean minus one standard deviation, the mean, and the mean plus one standard deviation.
summarize cv1

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
cv1 |       200      52.405    10.73579         26         71

mean cv1 - 1sd = 41.669207
mean cv1       = 52.405
mean cv1 + 1sd = 63.140793
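
For readers following along outside Stata, the three covariate values are simple arithmetic on the summarize output. A small Python sketch, using the mean and standard deviation as displayed above, so the last digits may differ slightly from the values shown:

```python
# Mean and standard deviation of cv1, copied from the summarize output.
mean_cv1 = 52.405
sd_cv1 = 10.73579

cv1_low = mean_cv1 - sd_cv1   # mean minus one standard deviation
cv1_high = mean_cv1 + sd_cv1  # mean plus one standard deviation

print(cv1_low, mean_cv1, cv1_high)
```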
Here is an example of computing the slope of r in the probability metric for m = 30, holding cv1 at its mean minus 1 sd (41.669207).
margins, dydx(r) at(m=30 cv1=41.669207)

Average marginal effects                          Number of obs   =        200
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : r
at           : m               =          30
               cv1             =    41.66921

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
r |   .0061133   .0065712     0.93   0.352     -.006766    .0189926
------------------------------------------------------------------------------
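
To see why the covariate matters in the probability metric, it helps to write the slope out. With p = exp(Xb)/(1 + exp(Xb)), the slope of the probability with respect to r is p(1 - p)(b[r] + m*b[r#m]), and p depends on every variable in the model, including cv1. The Python sketch below evaluates that expression using the coefficients printed above. It is an illustration under one assumption: r is fixed at a single hypothetical value (50), whereas the margins command above averages the slope over the values of r observed in the sample. The numbers it prints are therefore not expected to match the margins output.

```python
import math

# Coefficients copied from the logit output above (rounded as displayed).
b = {"r": 0.4342063, "m": 0.5104617, "r#m": -0.0068144,
     "cv1": 0.0309685, "_cons": -34.09122}

def prob(r, m, cv1):
    """Predicted probability at one combination of r, m and cv1."""
    xb = (b["_cons"] + b["r"] * r + b["m"] * m
          + b["r#m"] * r * m + b["cv1"] * cv1)
    return 1.0 / (1.0 + math.exp(-xb))

def dpdr(r, m, cv1):
    """Analytic slope of Pr(y) with respect to r at one covariate pattern."""
    p = prob(r, m, cv1)
    return p * (1.0 - p) * (b["r"] + b["r#m"] * m)

R_EXAMPLE = 50.0  # HYPOTHETICAL value of r, chosen only for illustration

# Same m, three values of cv1: the probability-metric slope changes each time.
for cv1 in (41.669207, 52.405, 63.140793):
    print(cv1, dpdr(R_EXAMPLE, 30, cv1))
```

The log odds slope b[r] + m*b[r#m] is the same in all three rows. Only the p(1 - p) factor changes, and that factor is what makes the probability-metric slope depend on cv1.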
We will now compute the slopes for r for differing values of m for each of the three values of cv1.
Table for Slope of r for Various Values of m holding cv1 at mean minus 1 sd

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
m  |
30  |   .0061133   .0065712     0.93   0.352     -.006766    .0189926
35  |    .006587   .0061377     1.07   0.283    -.0054427    .0186167
40  |   .0071815   .0056839     1.26   0.206    -.0039586    .0183217
45  |   .0078851   .0052656     1.50   0.134    -.0024354    .0182055
50  |   .0085235    .004981     1.71   0.087    -.0012391    .0182861
55  |   .0083341   .0049614     1.68   0.093    -.0013901    .0180583
60  |   .0052692   .0059747     0.88   0.378    -.0064411    .0169795
65  |   -.002175   .0090427    -0.24   0.810    -.0198984    .0155484
70  |  -.0091967   .0089699    -1.03   0.305    -.0267774    .0083839
------------------------------------------------------------------------------

Table for Slope of r for Various Values of m holding cv1 at the mean

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
m  |
30 |   .0074917   .0069416     1.08   0.280    -.0061135    .0210969
35 |   .0081075   .0063953     1.27   0.205     -.004427    .0206421
40 |   .0088605   .0057648     1.54   0.124    -.0024384    .0201593
45 |    .009721   .0051157     1.90   0.057    -.0003056    .0197476
50 |   .0104242   .0046175     2.26   0.024     .0013739    .0194744
55 |     .00992   .0046688     2.12   0.034     .0007692    .0190708
60 |   .0058498    .006339     0.92   0.356    -.0065745    .0182741
65 |  -.0021432   .0088189    -0.24   0.808     -.019428    .0151416
70 |  -.0081533   .0075364    -1.08   0.279    -.0229243    .0066177
------------------------------------------------------------------------------

Table for Slope of r for Various Values of m holding cv1 at mean plus 1 sd

|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
m  |
30 |   .0090189   .0073769     1.22   0.221    -.0054396    .0234774
35 |   .0097902   .0067546     1.45   0.147    -.0034485    .0230289
40 |   .0107094   .0060155     1.78   0.075    -.0010807    .0224994
45 |   .0117184   .0052384     2.24   0.025     .0014513    .0219854
50 |   .0124196   .0046088     2.69   0.007     .0033864    .0214527
55 |   .0114027    .004686     2.43   0.015     .0022182    .0205871
60 |    .006181   .0067253     0.92   0.358    -.0070003    .0193622
65 |  -.0020011   .0080879    -0.25   0.805    -.0178531    .0138509
70 |  -.0069432   .0060361    -1.15   0.250    -.0187739    .0048874
------------------------------------------------------------------------------
We will graph each of the three tables above.

#### The bottom line

• Just because the interaction term is significant in the log odds model, it doesn't mean that the probability difference in differences will be significant for values of the covariate of interest.
• Paradoxically, even if the interaction term is not significant in the log odds model, the probability difference in differences may be significant for some values of the covariate.
• In the probability metric the values of all the variables in the model matter.

#### References

Ai, C.R. and Norton, E.C. 2003. Interaction terms in logit and probit models. Economics Letters 80(1): 123-129.

Greenland, S. and Rothman, K.J. 1998. Modern Epidemiology, 2nd Ed. Philadelphia: Lippincott Williams and Wilkins.

Mitchell, M.N. and Chen, X. 2005. Visualizing main effects and interactions for binary logit models. Stata Journal 5(1): 64-82.

Norton, E.C., Wang, H., and Ai, C. 2004. Computing interaction effects and standard errors in logit and probit models. Stata Journal 4(2): 154-167.

#### Comma separated data files

Categorical by categorical: concon2.csv

Categorical by continuous: logitcatcon.csv

Continuous by continuous: logitconcon.csv

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.