We will use an example from the hsb2 dataset that has a statistically significant categorical by continuous interaction to illustrate one possible explanatory approach.
The categorical variable is female, a zero/one variable with females coded as one. The continuous predictor variable, socst, is a standardized test score for social studies.
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
generate femXsoc=female*socst
regress write female socst femXsoc
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
       socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
     femXsoc |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------
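If you are running Stata 11 or later, the product term does not have to be generated by hand. The following is a minimal sketch, not part of the original example, that assumes a Stata version supporting factor-variable notation. The coefficient labels in its output differ (for example, 1.female#c.socst instead of femXsoc), but the estimates should match the regression above.

```stata
* same model as: regress write female socst femXsoc
* i.female treats female as categorical, c.socst as continuous,
* and ## requests both main effects plus their interaction
regress write i.female##c.socst
```

The rest of this page continues to use the hand-made product term femXsoc.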
twoway (lfit write socst if ~female)(lfit write socst if female), legend(off)

Looking at the graph, we can see that the two regression lines are not parallel and that the line for females falls above the line for males. How could we tell that females are higher than males without the graph? The coefficient for female is positive (15.00), which tells us that the level for females is higher than for males when socst is zero.
Let's interpret the coefficients for this model, starting with the constant (17.76). This is the intercept of the regression of write on socst for males, i.e., the expected value of write when both socst and female equal zero.
The coefficient for socst is .6248, which is the slope of the regression line for the male group. The coefficient for the female by socst interaction is -.2047, which is the difference in slope between the female and male groups, i.e., the slope for the female group is about .6248 - .2047 = .4201.
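The slope for the female group can also be obtained directly, together with a standard error and confidence interval, rather than by hand arithmetic. This is a minimal sketch, assuming the regression of write on female, socst and femXsoc shown above is the most recent estimation command.

```stata
* slope of socst for females = slope for males + interaction coefficient
lincom socst + femXsoc
```

The point estimate should agree with the hand calculation, .6247968 - .2047288 = .420068.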
Lastly, the coefficient for female is 15.00, which is the difference in intercepts between females and males, that is, the difference in write when socst has a value of zero. This difference is statistically significant at socst equal to zero. However, the difference between males and females may not be significant at all values of socst. The problem is that knowing there is a difference of 15 when socst is zero is not very useful, because socst can't actually take on a value of zero. What we need is a more useful value of socst at which to look at the male/female difference. We will start by shifting the zero point to the mean of socst. We do this by subtracting the mean of socst from each observation and then rerunning the regression.
summarize socst
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71
/* save mean and sd of socst as global macro variables */
global mean = r(mean)
global sd = r(sd)
/* create new variable centered at the mean of socst */
generate mean = socst-$mean
generate femXmean=female*mean
regress write female mean femXmean
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43527     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073456           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   4.271196   1.025448     4.17   0.000     2.248868    6.293523
        mean |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
    femXmean |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |   50.50437   .7571024    66.71   0.000     49.01126    51.99749
------------------------------------------------------------------------------
twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) xline($mean) xtitle(socst with vertical line at mean)

As you can see, the coefficients for the slope and the interaction remain the same, but the constant and the coefficient for female are different. The constant is 50.50, which is the predicted value of write for males when socst is at its mean, and the difference between females and males at this point is 4.27, which is statistically significant.
The graph shows the two regression lines with an added vertical line at the mean of socst.
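The reason only the constant and the female coefficient change can be seen by rewriting the model around a centering value c. The symbols b_0 through b_3 are notation introduced here for the constant and the coefficients of female, socst and the interaction in the original regression; they do not appear in the Stata output.

```latex
\begin{align*}
\widehat{\text{write}}
  &= b_0 + b_1\,\text{female} + b_2\,\text{socst}
     + b_3\,(\text{female}\times\text{socst}) \\
  &= (b_0 + b_2 c) + (b_1 + b_3 c)\,\text{female}
     + b_2\,(\text{socst}-c) + b_3\,\text{female}\times(\text{socst}-c)
\end{align*}

% With c = 52.405 (the mean of socst):
\begin{align*}
b_0 + b_2 c &= 17.7619 + .6247968\,(52.405) \approx 50.504 \\
b_1 + b_3 c &= 15.00001 - .2047288\,(52.405) \approx 4.271
\end{align*}
```

The slope b_2 and the interaction b_3 are untouched by the shift, while the new constant and the new female coefficient reproduce the values in the centered regression above, up to rounding.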
Next we will center the value of socst at one standard deviation above the mean and again rerun the regression.
/* create new variable centered at 1 sd above the mean of socst */
generate plus1sd = socst-($mean + $sd)
global plus1sd = $mean + $sd
generate femXplus1=female*plus1sd
regress write female plus1sd femXplus1
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43531     3  2561.81177           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073454           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |    2.07327   1.452108     1.43   0.155    -.7904922    4.937032
     plus1sd |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
   femXplus1 |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |   57.21206   1.072835    53.33   0.000     55.09628    59.32785
------------------------------------------------------------------------------
twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) xline($plus1sd) xtitle(socst with vertical line at mean plus 1 sd)

Once again the coefficients for the slope and the interaction remain the same, but the constant and the female coefficient are different. The constant is now 57.21, which is the predicted value of write for males when socst is one standard deviation above its mean, and the difference between females and males at this point is 2.07, which is not statistically significant.
The graph shows the two regression lines with an added vertical line at one standard deviation above the mean of socst.
Lastly, we will repeat this process one final time by centering the value of socst at one standard deviation below the mean and rerunning the regression.
/* create new variable centered at 1 sd below the mean of socst */
generate minus1sd = socst-($mean -$sd)
global minus1sd = $mean -$sd
generate femXminus1=female*minus1sd
regress write female minus1sd femXminus1
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   6.469122   1.446103     4.47   0.000     3.617203    9.321041
    minus1sd |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
  femXminus1 |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |   43.79668   1.016072    43.10   0.000     41.79285    45.80052
------------------------------------------------------------------------------
twoway (lfit write socst if ~female)(lfit write socst if female), legend(off) scheme(s2mono) xline($minus1sd) xtitle(socst with vertical line at mean minus 1 sd)

Following the same pattern as the two previous regressions, only the constant and the coefficient for female are different. The constant is now 43.80, which is the predicted value of write for males when socst is one standard deviation below its mean, and the difference between females and males at this point is 6.47, which is statistically significant.
The graph shows the two regression lines with an added vertical line at one standard deviation below the mean of socst.
In summary, we can explain this categorical by continuous interaction as having a significant difference in the slope of socst for males and females. Further, there is a significant difference in levels between males and females at both the mean and one standard deviation below the mean of socst. The gender difference in levels at the point one standard deviation above the mean of socst is not statistically significant.
In the examples above, we tested the differences in levels for males and females by running separate regressions after centering socst at different values. It is possible to obtain the same results with a single regression using the lincom command. Here is how it would work using the global macro variables for mean and sd that we defined above.
regress write female socst femXsoc
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   49.26
       Model |  7685.43528     3  2561.81176           Prob > F      =  0.0000
    Residual |  10193.4397   196  52.0073455           R-squared     =  0.4299
-------------+------------------------------           Adj R-squared =  0.4211
       Total |   17878.875   199   89.843593           Root MSE      =  7.2116
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   15.00001    5.09795     2.94   0.004     4.946132    25.05389
       socst |   .6247968   .0670709     9.32   0.000     .4925236    .7570701
     femXsoc |  -.2047288   .0953726    -2.15   0.033    -.3928171   -.0166405
       _cons |    17.7619   3.554993     5.00   0.000     10.75095    24.77284
------------------------------------------------------------------------------
/* socst set at the mean + 1 sd */
lincom female + ($mean + $sd)*femXsoc
 ( 1)  female + 63.14079 femXsoc = 0
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    2.07327   1.452108     1.43   0.155    -.7904922    4.937032
------------------------------------------------------------------------------
/* socst set at the mean */
lincom female + ($mean)*femXsoc
 ( 1)  female + 52.405 femXsoc = 0
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   4.271196   1.025448     4.17   0.000     2.248868    6.293523
------------------------------------------------------------------------------
/* socst set at the mean - 1 sd */
lincom female + ($mean - $sd)*femXsoc
 ( 1)  female + 41.66921 femXsoc = 0
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   6.469122   1.446103     4.47   0.000     3.617203    9.321041
------------------------------------------------------------------------------
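The three lincom calls differ only in the value of socst, so they can also be wrapped in a loop. This is a sketch only, assuming the globals mean and sd defined earlier are still in memory and that regress write female socst femXsoc is the most recent estimation command. The local macro names lo, hi and c are introduced here for illustration.

```stata
* values of socst at which to test the female - male difference
local lo = $mean - $sd
local hi = $mean + $sd

foreach c in `lo' $mean `hi' {
    display as text "female - male difference in write at socst = " %8.3f `c'
    lincom female + `c'*femXsoc
}
```

Because local macros are used, the lines should be run together, for example from a do-file.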
The coefficients obtained using lincom are the same as those from the three regressions using centering.
Also See
You might want to look at other ATS Stat webpages that cover categorical by continuous interaction in greater depth.
Regression with Stata Chapter 7: More on interactions of categorical and continuous variables
Stata Library: How do I handle interactions of continuous and categorical variables?
These pages are Copyrighted (c) by UCLA Academic Technology Services