### Stata FAQ: How can I understand a categorical by continuous interaction? (Stata 11)

Let's start with what a significant categorical by continuous interaction means: the slope of the continuous variable is different for one or more levels of the categorical variable.
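For a 0/1 categorical variable $D$ and a continuous variable $x$, the model with an interaction can be written as below. The symbols $D$, $x$, and $b_0,\dots,b_3$ are notation introduced here for illustration, not names from the Stata output. Splitting the model by the value of $D$ shows that the interaction coefficient $b_3$ is exactly the difference between the two slopes:

$$
\begin{aligned}
\widehat{y} &= b_0 + b_1 D + b_2 x + b_3 (D \times x) \\
D = 0:\quad \widehat{y} &= b_0 + b_2 x \\
D = 1:\quad \widehat{y} &= (b_0 + b_1) + (b_2 + b_3)\, x
\end{aligned}
$$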

We will use an example from the hsbdemo dataset, which has a statistically significant categorical by continuous interaction, to illustrate one possible approach to explaining such an interaction.

The categorical variable is female, a zero/one variable with females coded as one. The continuous predictor, socst, is a standardized test score for social studies, and the response variable, write, is a writing test score. We will begin by running the regression model and graphing the interaction. Note that the c. prefix in c.socst tells Stata to treat socst as a continuous variable; without it, factor-variable notation would treat socst as categorical.

use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

regress write female##c.socst

Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
|
female#|
c.socst |
1  |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
|
_cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------
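Written out with the coefficients from the table above, the fitted model and the two group-specific regression lines are as follows. The female line simply combines the main effect with the interaction term ($17.7619 + 15.00001 \approx 32.7619$ and $0.6247968 - 0.2047288 = 0.4200680$):

$$
\begin{aligned}
\widehat{\text{write}} &= 17.7619 + 15.00001\,\text{female} + 0.6247968\,\text{socst} - 0.2047288\,(\text{female}\times\text{socst}) \\
\text{males (female = 0)}:\quad \widehat{\text{write}} &= 17.7619 + 0.6247968\,\text{socst} \\
\text{females (female = 1)}:\quad \widehat{\text{write}} &= 32.7619 + 0.4200680\,\text{socst}
\end{aligned}
$$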

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off)
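The lfit plots fit a separate regression within each group. Because the model includes the full female by socst interaction, those separate lines should coincide with the model's own fitted lines. A minimal sketch that plots the predictions from the interaction model directly, assuming the same dataset; the variable name yhat is hypothetical and introduced only for this example:

```stata
* Sketch: plot the fitted lines implied by the interaction model itself.
* yhat is a hypothetical name for the linear prediction.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst
predict yhat, xb
* One line per group; sort orders the points along socst
twoway (line yhat socst if female == 0, sort) ///
       (line yhat socst if female == 1, sort), ///
       legend(order(1 "male" 2 "female")) ytitle(predicted write)
```

The `///` continuations assume the commands are run from a do-file.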


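Looking at the graph, we can see that the two regression lines are not parallel and that the line for females lies above the line for males. How could we tell this from the regression output? The coefficient for female is positive (15.00), so at socst = 0 the predicted write score for females is higher than for males. Because the slope for females is smaller, this gap narrows as socst increases, but the lines do not cross within the observed range of socst (26 to 71).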
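A quick check of where the two fitted lines would meet: setting the female-minus-male difference ($15.00001 - 0.2047288\,\text{socst}$, from the coefficients above) to zero gives a crossing point beyond the largest observed socst value of 71:

$$
15.00001 - 0.2047288\,\text{socst} = 0 \;\Longrightarrow\; \text{socst} = \frac{15.00001}{0.2047288} \approx 73.27
$$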

Let's interpret the coefficients for this model, starting with the constant (17.76). This is the intercept of the regression of write on socst for males, i.e., the expected value of write when both socst and female equal zero.

The coefficient for socst, .6248, is the slope of the regression line for males. The coefficient for the female by socst interaction, -.2047, is the difference in slope between females and males, so the slope for females is about .6248 - .2047 = .4201.
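The female intercept and slope can also be obtained, with standard errors, as linear combinations of the coefficients. A minimal sketch using lincom on the same model (the expected values in the comments are the sums computed from the regression table):

```stata
* Sketch: recover the female group's intercept and slope from the
* main-effect and interaction coefficients of the same model.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst
* Intercept for females: _cons + 1.female (about 32.76)
lincom _cons + 1.female
* Slope of socst for females: socst + 1.female#c.socst (about .4201)
lincom socst + 1.female#c.socst
```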

Lastly, the coefficient for female, 15.00, is the difference in intercepts between females and males, i.e., the female-male difference when socst equals zero. This difference is statistically significant, but that is not a very interesting fact because socst never actually equals zero (its minimum in these data is 26). Because the slopes differ, the difference between males and females may or may not be statistically significant at other values of socst. What we will do is look at the male-female difference at three values of socst: one standard deviation below the mean, at the mean, and one standard deviation above the mean.
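The female-minus-male difference implied by the model is a linear function of socst. Writing it out, with $\widehat{d}$ as notation introduced here and the rounded coefficients from the regression table, shows why the 15.00 coefficient describes only the point socst = 0. Even at the smallest observed value, 26, the implied difference is noticeably smaller:

$$
\begin{aligned}
\widehat{d}(\text{socst}) &= 15.00001 - 0.2047288\,\text{socst} \\
\widehat{d}(0) &= 15.00001 \\
\widehat{d}(26) &= 15.00001 - 5.32295 \approx 9.68
\end{aligned}
$$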

summarize socst

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
socst |       200      52.405    10.73579         26         71

global mean = r(mean)
global meanm1 = r(mean)-r(sd)
global meanp1 = r(mean)+r(sd)

display $meanm1 "   " $mean "   " $meanp1

41.669207   52.405   63.140793

So, the three values we will use to hold socst constant are 41.669207, 52.405, and 63.140793. Next, we will use the margins command to hold socst constant at these three values. The post option stores the margins results as the active estimation results, which allows us to use lincom to test the differences between males and females at each of the three values.

margins female, at(socst=(41.669207 52.405 63.140793)) post vsquish

Adjusted predictions                              Number of obs   =        200
Model VCE    : OLS

Expression   : Linear prediction, predict()
1._at        : socst           =    41.66921
2._at        : socst           =      52.405
3._at        : socst           =    63.14079

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at#female |
1 0  |   43.79668   1.016072    43.10   0.000     41.80522    45.78815
1 1  |   50.26581   1.028985    48.85   0.000     48.24903    52.28258
2 0  |   50.50437   .7571024    66.71   0.000     49.02048    51.98827
2 1  |   54.77557   .6916204    79.20   0.000     53.42002    56.13112
3 0  |   57.21206   1.072835    53.33   0.000     55.10934    59.31478
3 1  |   59.28533   .9785919    60.58   0.000     57.36733    61.20334
------------------------------------------------------------------------------

Now that we have the conditional means for males and females at each of the three values of socst, we can begin by using the lincom command to test the difference when socst is held at one standard deviation below the mean.

lincom _b[1._at#1.female] - _b[1._at#0.female]

( 1)  - 1bn._at#0bn.female + 1bn._at#1.female = 0

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |   6.469122   1.446103     4.47   0.000     3.634812    9.303432
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
xline($meanm1) xtitle(socst with vertical line at mean minus 1sd)


The lincom output shows that, at one standard deviation below the mean of socst, the female-minus-male difference in conditional means is about 6.5, and this difference is statistically significant (z = 4.47, p < .001). The graph above shows the two regression lines with a vertical line added at one standard deviation below the mean of socst.
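Because the female-minus-male difference at socst = x equals _b[1.female] + x*_b[1.female#c.socst], the same point estimates can also be obtained with lincom directly on the regression coefficients, without margins. A minimal sketch; the local macro names lo, mid, and hi are hypothetical:

```stata
* Sketch: simple effects of female at mean - 1sd, mean, and mean + 1sd,
* computed directly from the regression coefficients.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly summarize socst
local lo  = r(mean) - r(sd)
local mid = r(mean)
local hi  = r(mean) + r(sd)
quietly regress write female##c.socst
foreach x in `lo' `mid' `hi' {
    display as text "socst = " `x'
    * difference = female main effect + x times the interaction coefficient
    lincom 1.female + `x'*1.female#c.socst
}
```

The standard errors should match the margins-based results, but lincom after regress uses the t distribution with 196 degrees of freedom, so its p-values and confidence intervals may differ slightly from the z-based output above.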

Next we will repeat this process for the other two values of socst.

lincom _b[2._at#1.female] - _b[2._at#0.female]

( 1)  - 2._at#0bn.female + 2._at#1.female = 0

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |   4.271196   1.025448     4.17   0.000     2.261356    6.281037
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
xline($mean) xtitle(socst with vertical line at mean)

lincom _b[3._at#1.female] - _b[3._at#0.female]

( 1)  - 3._at#0bn.female + 3._at#1.female = 0

------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) |    2.07327   1.452108     1.43   0.153    -.7728094    4.919349
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
xline($meanp1) xtitle(socst with vertical line at mean plus 1sd)


In summary, we can explain this categorical by continuous interaction as a significant difference in the slope of socst for males and females. Further, there is a significant difference in levels between males and females both at the mean of socst and at one standard deviation below the mean. The gender difference in levels at one standard deviation above the mean of socst is not statistically significant (p = .153).
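As a check on these results, plugging the three socst values into the difference implied by the regression coefficients, $\widehat{d}(\text{socst}) = 15.00001 - 0.2047288\,\text{socst}$ (notation introduced here), reproduces the three lincom estimates (6.469122, 4.271196, and 2.07327):

$$
\begin{aligned}
\widehat{d}(41.669207) &= 15.00001 - 0.2047288 \times 41.669207 \approx 6.4691 \\
\widehat{d}(52.405) &= 15.00001 - 0.2047288 \times 52.405 \approx 4.2712 \\
\widehat{d}(63.140793) &= 15.00001 - 0.2047288 \times 63.140793 \approx 2.0733
\end{aligned}
$$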

If looking at male/female differences at three values of socst is good, wouldn't including more values of socst be even better? It turns out that there is an easy way to compute the male/female differences at many more values of socst, from 30 to 70 in increments of two. Because the post option on the earlier margins command replaced the regression results with the margins results, we first quietly rerun the regress command and follow it with a variation on the margins command.
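Rerunning regress is one option. Another is to save the regression results before running margins with post, then bring them back afterward. A minimal sketch using estimates store and estimates restore; m_int is a hypothetical name for the stored results:

```stata
* Sketch: keep a copy of the regression so that -margins, post- does not
* require refitting the model. m_int is a hypothetical name.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst
estimates store m_int
margins female, at(socst=(41.669207 52.405 63.140793)) post vsquish
lincom _b[1._at#1.female] - _b[1._at#0.female]
* Make the regression results the active estimates again
estimates restore m_int
margins, dydx(female) at(socst=(30(2)70)) vsquish noatlegend post
```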

quietly regress write female##c.socst

margins, dydx(female) at(socst=(30(2)70)) vsquish noatlegend post

Conditional marginal effects                      Number of obs   =        200
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.female

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female     |
_at |
1  |   8.858145   2.366305     3.74   0.000     4.220273    13.49602
2  |   8.448687   2.195956     3.85   0.000     4.144692    12.75268
3  |   8.039229   2.029241     3.96   0.000      4.06199    12.01647
4  |   7.629772   1.867132     4.09   0.000     3.970261    11.28928
5  |   7.220314   1.710938     4.22   0.000     3.866936    10.57369
6  |   6.810857   1.562436     4.36   0.000     3.748538    9.873176
7  |   6.401399   1.424034     4.50   0.000     3.610344    9.192454
8  |   5.991941   1.298963     4.61   0.000     3.446021    8.537861
9  |   5.582484   1.191429     4.69   0.000     3.247326    7.917642
10  |   5.173026   1.106558     4.67   0.000     3.004213     7.34184
11  |   4.763569   1.049859     4.54   0.000     2.705882    6.821255
12  |   4.354111   1.026015     4.24   0.000     2.343159    6.365063
13  |   3.944653   1.037293     3.80   0.000     1.911597     5.97771
14  |   3.535196   1.082595     3.27   0.001     1.413348    5.657044
15  |   3.125738   1.157937     2.70   0.007      .856224    5.395252
16  |   2.716281   1.257931     2.16   0.031      .250782    5.181779
17  |   2.306823   1.377218     1.67   0.094    -.3924741     5.00612
18  |   1.897365   1.511236     1.26   0.209    -1.064604    4.859334
19  |   1.487908   1.656415     0.90   0.369    -1.758606    4.734421
20  |    1.07845    1.81007     0.60   0.551    -2.469221    4.626121
21  |   .6689926   1.970219     0.34   0.734    -3.192565     4.53055
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

matrix at=e(at)

matrix at=at[1...,"socst"]

matrix list at

at[21,1]
socst
r1     30
r2     32
r3     34
r4     36
r5     38
r6     40
r7     42
r8     44
r9     46
r10     48
r11     50
r12     52
r13     54
r14     56
r15     58
r16     60
r17     62
r18     64
r19     66
r20     68
r21     70
By including dydx(female), we get the female-minus-male difference at each of the 21 values of socst. For graphing purposes, we save the values of socst in a matrix called at.
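Each row of the dydx(female) table is the same linear function of socst evaluated on the grid 30, 32, ..., 70. Using the regression coefficients, the first and last rows and the constant step between successive rows work out as follows, matching the 8.858145, .6689926, and 8.858145 - 8.448687 = .409458 shown in the table:

$$
\begin{aligned}
\widehat{d}(30) &= 15.00001 - 0.2047288 \times 30 \approx 8.8581 \\
\widehat{d}(70) &= 15.00001 - 0.2047288 \times 70 \approx 0.6690 \\
\widehat{d}(x+2) - \widehat{d}(x) &= -2 \times 0.2047288 \approx -0.4095
\end{aligned}
$$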

Next, we will use parmest, a command written by Roger Newson (findit parmest), which replaces the hsbdemo dataset in memory with a dataset containing the estimated differences and their confidence intervals. We can then use these values, together with the at values, to create a graph of the male/female differences.

parmest, fast

drop if z==.

svmat at

twoway (line estimate at1)(line min95 at1)(line max95 at1), ///
legend(off) yline(0) ytitle(male/female difference) xtitle(socst) scheme(lean1)


The graph shows that the male/female difference decreases as the value of socst increases. Whenever the 95% confidence interval for the difference does not include zero, the difference is statistically significant at the .05 level. This is the case for all values of socst from 30 up to 60 (p = .031 at socst = 60). For socst values from 62 to 70, the male/female difference is not statistically significant.
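To read the significance region from the data rather than from the graph, one option is to list the rows whose confidence interval excludes zero. A minimal self-contained sketch that repeats the steps above; it assumes parmest is installed (findit parmest) and relies on the variable names estimate, min95, and max95 that parmest creates and at1 that svmat creates:

```stata
* Sketch: list the socst values where the 95% CI for the
* female-minus-male difference excludes zero.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst
quietly margins, dydx(female) at(socst=(30(2)70)) post
matrix at = e(at)
matrix at = at[1...,"socst"]
parmest, fast
drop if z == .
svmat at
list at1 estimate min95 max95 if min95 > 0 | max95 < 0, noobs clean
count if min95 > 0 | max95 < 0
```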

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.