Stata FAQ
How can I understand a categorical by continuous interaction? (Stata 10 and earlier)

First off, let's start with what a significant categorical by continuous interaction means. It means that the slope of the continuous variable is different for one or more levels of the categorical variable.

We will use an example from the hsb2 dataset that has a statistically significant categorical by continuous interaction to illustrate one possible explanatory approach.

The categorical variable is female, a zero/one variable with females coded as one. The continuous predictor variable, socst, is a standardized test score for social studies.

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear

generate femXsoc=female*socst

regress write female socst femXsoc

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
       socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
     femXsoc |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off)

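Looking at the graph, we can see that the two regression lines are not parallel and that the line for females falls above the line for males. How could we tell that the upper line belongs to females? The coefficient for female is positive (15.00), which tells us that the predicted value of write is higher for females than for males when socst is zero. The interaction coefficient is negative, so the gap narrows as socst increases, but the graph shows that the line for females stays above the line for males across the plotted range of socst.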
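
Because the two lines are not parallel, they must cross somewhere. As a rough sketch, assuming the regression above is still the active estimation result, the socst value at which the fitted lines intersect can be estimated with nlcom (a command not otherwise used on this page). Setting the male and female predictions equal gives female + socst*femXsoc = 0, so socst = -female/femXsoc.

```stata
/* socst value at which the male and female fitted lines cross */
/* (solves female + socst*femXsoc = 0 for socst)               */
nlcom -_b[female]/_b[femXsoc]
```

With the coefficients shown above, the point estimate works out to about 15.00/.2047 = 73.3, which is beyond the largest socst score in the data (71), so the lines do not cross within the observed range.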

Let's interpret the coefficients for this model, starting with the constant (17.76). This is the intercept for write regressed on socst for males, i.e., the expected value of write when both socst and female equal zero.

The coefficient for socst is .6248, which is the slope of the regression line for the male group. The coefficient for the female by socst interaction is -.2047, which is the difference in slope between the female and male groups, i.e., the slope for the female group is about .6248 - .2047 = .4201.
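
The slope for the female group can also be obtained directly, along with a standard error and confidence interval, rather than by hand. This is a minimal sketch that assumes the regression of write on female, socst and femXsoc shown above is the most recent estimation command.

```stata
/* slope of socst for females = male slope + difference in slopes */
lincom socst + femXsoc
```

The point estimate is simply the sum of the two coefficients, .6247968 - .2047288, or about .4201.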

Lastly, the coefficient for female is 15.00, which is the difference in predicted write between females and males when socst has a value of zero, i.e., the difference in the intercepts. This difference is statistically significant at socst equal to zero. However, the difference between males and females may not be significant for all values of socst. The problem is that knowing there is a difference of 15 when socst is zero is not very useful. One reason is that socst can't actually take on a value of zero. What we need is to find a more useful value at which to look at the male/female difference. We will start off by shifting the zero point to the mean of socst. We do this by subtracting the mean value of socst from each observation and then rerunning the regression.
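
To see why centering changes only the constant and the coefficient for female, it may help to write the model out. The symbols below are notation introduced here for illustration: b_0 to b_3 stand for the constant and the coefficients for female, socst and femXsoc, and c is the value at which socst is centered.

```latex
\begin{aligned}
\widehat{write} &= b_0 + b_1\,female + b_2\,socst + b_3\,(female \times socst) \\
                &= (b_0 + b_2 c) + (b_1 + b_3 c)\,female
                   + b_2\,(socst - c) + b_3\,female \times (socst - c)
\end{aligned}
```

The two slope terms keep the coefficients b_2 and b_3, while the constant becomes b_0 + b_2 c and the coefficient for female becomes b_1 + b_3 c, the female/male difference at socst = c. With c = 52.405, for example, the female coefficient would be 15.00001 - .2047288*52.405, or about 4.27.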

summarize socst

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71

/* save mean and sd of socst as global macro variables */
global mean = r(mean)
global sd = r(sd)

/* create new variable centered at the mean of socst */
generate mean = socst-$mean

generate femXmean=female*mean

regress write female mean femXmean

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43527     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073456           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   4.271196   1.025448     4.17   0.000     2.248868    6.293523
        mean |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
    femXmean |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |   50.50437   .7571024    66.71   0.000     49.01126    51.99749
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
       xline($mean) xtitle(socst with vertical line at mean)

As you can see, the coefficients for the slope and the interaction remain the same, but the constant and the coefficient for female are different. The constant is now 50.50, which is the predicted value of write for males when socst is at its mean. The difference between females and males at this point is 4.27, which is statistically significant.

The graph shows the two regression lines with an added vertical line at the mean of socst.

Next we will center the value of socst at one standard deviation above the mean and again rerun the regression.

/* create new variable centered at 1 sd above the mean of socst */
generate plus1sd = socst-($mean + $sd)
global plus1sd = $mean + $sd

generate femXplus1=female*plus1sd

regress write female plus1sd femXplus1

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43531     3  2561.81177           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073454           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |    2.07327   1.452108     1.43   0.155    -.7904922    4.937032
     plus1sd |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
   femXplus1 |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |   57.21206   1.072835    53.33   0.000     55.09628    59.32785
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
       xline($plus1sd) xtitle(socst with vertical line at mean plus 1 sd)
   
Once again, the coefficients for the slope and the interaction remain the same, but the constant and the coefficient for female are different. The constant is now 57.21, which is the predicted value of write for males when socst is one standard deviation above its mean. The difference between females and males at this point is 2.07, which is not statistically significant.

The graph shows the two regression lines with an added vertical line at one standard deviation above the mean of socst.

Lastly, we will repeat this process one final time by centering the value of socst at one standard deviation below the mean and rerunning the regression.

/* create new variable centered at 1 sd below the mean of socst */
generate minus1sd = socst-($mean -$sd)
global minus1sd = $mean -$sd

generate femXminus1=female*minus1sd

regress write female minus1sd femXminus1

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   6.469122   1.446103     4.47   0.000     3.617203    9.321041
    minus1sd |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
  femXminus1 |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |   43.79668   1.016072    43.10   0.000     41.79285    45.80052
------------------------------------------------------------------------------

twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) ///
       xline($minus1sd) xtitle(socst with vertical line at mean minus 1 sd)

Following the same pattern as the two previous regressions, only the constant and the coefficient for female are different. The constant is now 43.80, which is the predicted value of write for males when socst is one standard deviation below its mean. The difference between females and males at this point is 6.47, which is statistically significant.

The graph shows the two regression lines with an added vertical line at one standard deviation below the mean of socst.

In summary, we can explain this categorical by continuous interaction as having a significant difference in the slope of socst for males and females. Further, there is a significant difference in levels between males and females at both the mean and one standard deviation below the mean of socst. The gender difference in levels at the point one standard deviation above the mean of socst is not statistically significant.

In the examples above, we tested the differences in levels for males and females by running separate regressions after centering socst at different values. It is possible to obtain the same results with a single regression using the lincom command. Here is how it would work using the global macro variables for mean and sd that we defined above.

regress write female socst femXsoc

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
       socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
     femXsoc |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------

/* socst set at the mean + 1 sd */

lincom female + ($mean + $sd)*femXsoc

 ( 1)  female + 63.14079 femXsoc = 0

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    2.07327   1.452108     1.43   0.155    -.7904922    4.937032
------------------------------------------------------------------------------

/* socst set at the mean */

lincom female + ($mean)*femXsoc

 ( 1)  female + 52.405 femXsoc = 0

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.271196   1.025448     4.17   0.000     2.248868    6.293523
------------------------------------------------------------------------------

/* socst set at the mean - 1 sd */

lincom female + ($mean - $sd)*femXsoc

 ( 1)  female + 41.66921 femXsoc = 0

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   6.469122   1.446103     4.47   0.000     3.617203    9.321041
------------------------------------------------------------------------------

The estimates, standard errors and tests obtained using lincom are the same as those for the coefficient for female in the three regressions using centering.
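
The three lincom commands can also be written as a loop. The following is only a sketch: it uses local macros (m, s and at are names chosen here for illustration) instead of the global macros defined earlier, so it needs to be run as a single block from a do-file or the do-file editor.

```stata
/* refit the uncentered model so that lincom refers to it */
quietly regress write female socst femXsoc

/* store the mean and sd of socst in local macros */
quietly summarize socst
local m = r(mean)
local s = r(sd)

/* female/male difference at mean - 1 sd, mean, and mean + 1 sd */
foreach k in -1 0 1 {
    local at = `m' + `k'*`s'
    display _newline "socst set at " `at'
    lincom female + `at'*femXsoc
}
```

Other values of socst can be examined by changing the list after foreach k in, for example by adding -2 and 2 to look two standard deviations from the mean.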

Also See

You might want to look at other ATS Stat webpages that cover categorical by continuous interaction in greater depth.

Regression with Stata Chapter 7: More on interactions of categorical and continuous variables
