Stata FAQ
How can I understand a categorical by continuous interaction? (Stata 11)

Let's start with what a significant categorical by continuous interaction means: the slope of the continuous variable is different for one or more levels of the categorical variable.

We will use an example from the hsbdemo dataset that has a statistically significant categorical by continuous interaction to illustrate one possible explanatory approach.

The outcome variable, write, is a writing test score. The categorical variable is female, a zero/one variable with females coded as one. The continuous predictor, socst, is a standardized test score for social studies. We will begin by running the regression model and graphing the interaction. Note that the c. prefix in c.socst tells Stata to treat socst as a continuous variable; in an interaction, factor-variable notation otherwise treats a variable as categorical.

use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

regress write female##c.socst


      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
       socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
             |
      female#|
     c.socst |
          1  |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
             |
       _cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off)

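Looking at the graph, we can see that the two regression lines are not parallel and that the line for females falls above the line for males. Note that the positive coefficient for female (15.00) does not by itself tell us this. Because the model includes an interaction, 15.00 is the female-male difference only at socst = 0. Females are higher because the difference, 15.00 - .2047*socst, stays positive over the whole observed range of socst (26 to 71); it would reach zero only near socst = 73.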
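The graph above uses legend(off), so it does not show which line belongs to which group. A minimal variation, sketched below, labels the two lines; the legend text is our own choice. Because the model contains female, socst, and their full interaction, these separate per-group lfit lines are the same fitted lines the regression implies.

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

* one fitted line per group, with a legend identifying each line
twoway (lfit write socst if female == 0) (lfit write socst if female == 1), ///
    legend(order(1 "males" 2 "females")) ytitle(write) xtitle(socst)
```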

Let's interpret the coefficients for this model, starting with the constant (17.76). This is the intercept of the regression of write on socst for males, i.e., the expected value of write when both socst and female equal zero.

The coefficient for socst (.6248) is the slope of the regression line for males. The coefficient for the female by socst interaction (-.2047) is the difference in slope between females and males, so the slope for females is about .6248 - .2047 = .4201.
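To get the female slope with its own standard error and test, rather than computing it by hand, one option is lincom on the sum of the two coefficients; margins with dydx(socst) reports both group slopes at once. A minimal sketch, reloading the data and refitting the model so it runs on its own:

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst

* slope of write on socst within each level of female
margins female, dydx(socst)

* female slope = male slope + interaction; lincom gives the sum,
* its standard error, and a test that it equals zero
lincom _b[socst] + _b[1.female#c.socst]
```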

Lastly, the coefficient for female (15.00) is the difference in intercepts between females and males, that is, the female-male difference when socst equals zero. It is statistically significant (p = .004), but this is not a very interesting fact because socst never actually equals zero (its minimum in these data is 26). Because the slopes differ, the male-female difference changes with socst and may be statistically significant at some values of socst but not at others. We will therefore look at the male-female difference at three values of socst: one standard deviation below the mean, at the mean, and one standard deviation above the mean.
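Writing out the fitted equation shows how the coefficients combine. A worked sketch using the regression output above; coefficients are rounded in the group equations, while the difference line keeps full precision so that it matches the later lincom results:

```latex
\begin{aligned}
\widehat{\text{write}} &= 17.7619 + 15.0000\,\text{female} + 0.6248\,\text{socst} - 0.2047\,\text{female}\times\text{socst} \\
\text{males: } \widehat{\text{write}} &= 17.7619 + 0.6248\,\text{socst} \\
\text{females: } \widehat{\text{write}} &= (17.7619 + 15.0000) + (0.6248 - 0.2047)\,\text{socst} = 32.7619 + 0.4201\,\text{socst} \\
\Delta(\text{socst}) = \widehat{\text{write}}_{\text{female}} - \widehat{\text{write}}_{\text{male}} &= 15.00001 - 0.2047288\,\text{socst} \\
\Delta(41.669207) \approx 6.469, \quad \Delta(52.405) &\approx 4.271, \quad \Delta(63.140793) \approx 2.073
\end{aligned}
```

These three differences agree with the lincom estimates computed below.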

summarize socst

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71

global mean = r(mean)
global meanm1 = r(mean)-r(sd)
global meanp1 = r(mean)+r(sd)

display $meanm1 "   " $mean "   " $meanp1

41.669207   52.405   63.140793
So, the three values we will use to hold socst constant are 41.669207, 52.405, and 63.140793.
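The margins command that follows types the three values by hand. As an alternative sketch, the globals created above can be passed to at() directly, which avoids transcription errors. This version reloads the data, refits the model, and recreates the globals so it runs on its own.

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst

* store the mean and mean +/- 1 SD of socst in globals, as above
quietly summarize socst
global mean   = r(mean)
global meanm1 = r(mean) - r(sd)
global meanp1 = r(mean) + r(sd)

* pass the globals to at() instead of typing the numbers
margins female, at(socst=($meanm1 $mean $meanp1)) post vsquish
```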

Next, we will use the margins command to compute the predicted value of write for males and females with socst held at each of the three values defined above. The post option saves these predictions as estimation results, so that lincom can test the difference between males and females at each value.

margins female, at(socst=(41.669207 52.405 63.140793)) post vsquish

Adjusted predictions                              Number of obs   =        200
Model VCE    : OLS

Expression   : Linear prediction, predict()
1._at        : socst           =    41.66921
2._at        : socst           =      52.405
3._at        : socst           =    63.14079

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _at#female |
        1 0  |   43.79668   1.016072    43.10   0.000     41.80522    45.78815
        1 1  |   50.26581   1.028985    48.85   0.000     48.24903    52.28258
        2 0  |   50.50437   .7571024    66.71   0.000     49.02048    51.98827
        2 1  |   54.77557   .6916204    79.20   0.000     53.42002    56.13112
        3 0  |   57.21206   1.072835    53.33   0.000     55.10934    59.31478
        3 1  |   59.28533   .9785919    60.58   0.000     57.36733    61.20334
------------------------------------------------------------------------------
Now that we have the conditional means for males and females at each of the three values of socst, we can use the lincom command to test the difference, starting with socst held at one standard deviation below the mean.
lincom _b[1._at#1.female] - _b[1._at#0.female]

 ( 1)  - 1bn._at#0bn.female + 1bn._at#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   6.469122   1.446103     4.47   0.000     3.634812    9.303432
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
          xline($meanm1) xtitle("socst with vertical line at mean minus 1sd")

The lincom results show that, with socst held at one standard deviation below its mean, the female-male difference in conditional means is about 6.5, and this difference is statistically significant (p < .001). The graph shows the two regression lines with a vertical line added at one standard deviation below the mean of socst.
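The post and lincom steps are one way to get these differences. Another, sketched below, uses the dydx(female) option that this FAQ applies later to a wider range of socst: for a factor variable, dydx() reports the discrete change from the base level, here the female minus male difference, at each at() value. The local macro names lo, mid, and hi are our own. Because this call does not use post, the regression results stay active afterward.

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst

* mean and mean +/- 1 SD of socst
quietly summarize socst
local lo  = r(mean) - r(sd)
local mid = r(mean)
local hi  = r(mean) + r(sd)

* female - male difference at each of the three values, with z tests
margins, dydx(female) at(socst=(`lo' `mid' `hi')) vsquish
```

The estimates should match those from the three lincom commands (about 6.47, 4.27, and 2.07).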

Next we will repeat this process for the other two values of socst.

lincom _b[2._at#1.female] - _b[2._at#0.female]

 ( 1)  - 2._at#0bn.female + 2._at#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.271196   1.025448     4.17   0.000     2.261356    6.281037
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
          xline($mean) xtitle("socst with vertical line at mean")
   


lincom _b[3._at#1.female] - _b[3._at#0.female]

 ( 1)  - 3._at#0bn.female + 3._at#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    2.07327   1.452108     1.43   0.153    -.7728094    4.919349
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
          xline($meanp1) xtitle("socst with vertical line at mean plus 1sd")
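The three lincom commands differ only in the _at index, so a forvalues loop can run them in one step. A sketch, assuming the same model; it recomputes the at() values from summarize so it runs on its own, and the local names lo, mid, and hi are our own:

```stata
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst

quietly summarize socst
local lo  = r(mean) - r(sd)
local mid = r(mean)
local hi  = r(mean) + r(sd)
quietly margins female, at(socst=(`lo' `mid' `hi')) post

* female - male difference at each of the three _at values
forvalues i = 1/3 {
    lincom _b[`i'._at#1.female] - _b[`i'._at#0.female]
}
```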

In summary, we can explain this categorical by continuous interaction as a significant difference in the slope of socst for males and females (p = .033). Further, there is a significant difference in levels between males and females both at the mean of socst and at one standard deviation below the mean. The gender difference in levels at one standard deviation above the mean of socst is not statistically significant.

If looking at male/female differences at three values of socst is good, wouldn't looking at more values be even better? There is an easy way to compute the male/female differences at many more values of socst, from 30 to 70 in increments of two. Because the post option above replaced the regression results with the margins results, we first quietly rerun the regress command and then follow it with a variation on the margins command.

quietly regress write female##c.socst

margins, dydx(female) at(socst=(30(2)70)) vsquish noatlegend post

Conditional marginal effects                      Number of obs   =        200
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.female

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female     |
         _at |
          1  |   8.858145   2.366305     3.74   0.000     4.220273    13.49602
          2  |   8.448687   2.195956     3.85   0.000     4.144692    12.75268
          3  |   8.039229   2.029241     3.96   0.000      4.06199    12.01647
          4  |   7.629772   1.867132     4.09   0.000     3.970261    11.28928
          5  |   7.220314   1.710938     4.22   0.000     3.866936    10.57369
          6  |   6.810857   1.562436     4.36   0.000     3.748538    9.873176
          7  |   6.401399   1.424034     4.50   0.000     3.610344    9.192454
          8  |   5.991941   1.298963     4.61   0.000     3.446021    8.537861
          9  |   5.582484   1.191429     4.69   0.000     3.247326    7.917642
         10  |   5.173026   1.106558     4.67   0.000     3.004213     7.34184
         11  |   4.763569   1.049859     4.54   0.000     2.705882    6.821255
         12  |   4.354111   1.026015     4.24   0.000     2.343159    6.365063
         13  |   3.944653   1.037293     3.80   0.000     1.911597     5.97771
         14  |   3.535196   1.082595     3.27   0.001     1.413348    5.657044
         15  |   3.125738   1.157937     2.70   0.007      .856224    5.395252
         16  |   2.716281   1.257931     2.16   0.031      .250782    5.181779
         17  |   2.306823   1.377218     1.67   0.094    -.3924741     5.00612
         18  |   1.897365   1.511236     1.26   0.209    -1.064604    4.859334
         19  |   1.487908   1.656415     0.90   0.369    -1.758606    4.734421
         20  |    1.07845    1.81007     0.60   0.551    -2.469221    4.626121
         21  |   .6689926   1.970219     0.34   0.734    -3.192565     4.53055
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

matrix at=e(at)

matrix at=at[1...,"socst"]

matrix list at

at[21,1]
     socst
 r1     30
 r2     32
 r3     34
 r4     36
 r5     38
 r6     40
 r7     42
 r8     44
 r9     46
r10     48
r11     50
r12     52
r13     54
r14     56
r15     58
r16     60
r17     62
r18     64
r19     66
r20     68
r21     70
By including dydx(female) we will get the differences between males and females at each of the 21 levels of socst. For graphing purposes we will save the levels of socst in a matrix called at.

Next, we will use parmest, a user-written command by Roger Newson (findit parmest), to place the differences along with their confidence intervals in memory, replacing the hsbdemo dataset. We can then use these values, along with the at values, to create a graph of the male/female differences.
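Because parmest is user-written, it has to be installed before the commands below will run. One way, assuming an internet connection, is to install it once from SSC:

```stata
* parmest (Roger Newson) is distributed through SSC
ssc install parmest
```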

parmest, fast

* drop the base-level (0.female) rows, which have a missing z statistic
drop if z==.

svmat at

twoway (line estimate at1)(line min95 at1)(line max95 at1), ///
  legend(off) yline(0) ytitle(male/female difference) xtitle(socst) scheme(lean1)
  
The graph shows that the female-male difference decreases as socst increases. Wherever the 95% confidence interval for the difference excludes zero, the difference is statistically significant at the .05 level. This is the case for all values of socst up to about 60 (p = .031 at socst = 60). For socst values above 60, the male/female difference is not statistically significant (p = .094 at socst = 62).
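In Stata 12 and later (not Stata 11, which this FAQ targets), the marginsplot command can draw this graph directly from the margins results, without the matrix, parmest, and svmat steps. A sketch:

```stata
* requires Stata 12 or later; marginsplot does not exist in Stata 11
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear
quietly regress write female##c.socst

* female - male difference at socst = 30, 32, ..., 70
quietly margins, dydx(female) at(socst=(30(2)70))

* plot the differences and their confidence intervals against socst
marginsplot, yline(0)
```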


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.