We will use an example from the hsb2 dataset that has a statistically significant continuous by continuous interaction to illustrate one explanatory approach. The continuous variables math and socst are standardized test scores for math and social studies, respectively. We will make use of the ATS-written command xi3, which allows us to easily include continuous by continuous interactions in the regress command. To obtain xi3, just type the command findit xi3 (see How can I use the findit command to search for programs?).
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
xi3: regress read math*socst
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
    Residual |  9494.65783   196  48.4421318           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |  -.1105123   .2916338    -0.38   0.705    -.6856552    .4646307
       socst |  -.2200442   .2717539    -0.81   0.419    -.7559812    .3158928
     _ImaXso |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   37.84271   14.54521     2.60   0.010     9.157506    66.52792
------------------------------------------------------------------------------
Looking at the model above, we see a significant math by socst interaction. At this point all we can say is that for every one-unit increase in socst, the slope of math increases by .0113 units. We could, of course, discuss the change in the slope of socst for each unit change in math, but for this example we will stick to the first interpretation.
We know that the slope of math will increase as socst increases, but it can be difficult to visualize this. It may be helpful to think of socst as an ordered categorical variable (low, medium, high) and then ask: how does the slope of math change as socst moves from low to medium to high? For our purposes, low will be one standard deviation below the mean, medium will be at the mean, and high will be one standard deviation above the mean.
Additionally, the slopes for math and socst do not appear to be significant, but this is an artifact of the high collinearity of math and socst with the interaction term. Once we center the variables at their mean values, the picture will become clearer.
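One way to see what the interaction coefficient means is to write the fitted model out and collect the terms that multiply math. The symbols b_0 through b_3 are our own labels for the coefficients on _cons, math, socst, and _ImaXso; they do not appear in the Stata output.

```latex
\begin{aligned}
\widehat{read} &= b_0 + b_1\,math + b_2\,socst + b_3\,(math \times socst) \\
               &= (b_0 + b_2\,socst) + (b_1 + b_3\,socst)\,math \\[4pt]
\text{slope of } math \text{ at a given } socst &= b_1 + b_3\,socst
               \approx -.1105 + .0113\,socst
\end{aligned}
```

Read this way, the coefficient reported for math (-.1105) is the slope of math when socst equals zero, so it need not resemble the slope at typical values of socst.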
In this next section we will begin by centering math and socst at their means, i.e., setting socst to its "medium" value.
summarize socst
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71
/* save the mean and sd of socst as global macro variables */
global mean = r(mean)
global sd = r(sd)
/* create centered socst variable, soc1 */
generate soc1 = socst-$mean
/* create centered math variable, cmath */
summarize math
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        math |       200      52.645    9.368448         33         75
global mean2 = r(mean)
generate cmath = math-$mean2
summarize cmath
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       cmath |       200   -1.97e-07    9.368448    -19.645     22.355
The new variables cmath and soc1 are math and socst centered at their respective means. The descriptive statistics for cmath show that it ranges from roughly -20 to +23, which will be useful to know when we get around to graphing.
Next, we will run a regression using the centered predictor variables.
xi3: regress read cmath*soc1
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25407           Prob > F      =  0.0000
    Residual |   9494.6578   196  48.4421316           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cmath |    .480654   .0637009     7.55   0.000     .3550268    .6062812
        soc1 |   .3738294   .0555457     6.73   0.000     .2642855    .4833733
     _IcmXso |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   51.61533   .5686851    90.76   0.000      50.4938    52.73685
------------------------------------------------------------------------------
/* save slope and intercept as global macro variables */
global b1 = _b[cmath]
global c1 = _b[_cons]
Let's interpret the coefficients for this model. The constant (51.62) is the predicted value of read when cmath and soc1 are zero, i.e., when math and socst are at their mean values. The slope of read on math (cmath) is .48 (and is significant) when soc1 is zero, which also makes the interaction term _IcmXso zero. Further, for every one-unit increase in soc1 the slope of cmath increases by .0113 units.
We can graph the regression of read on cmath because we are holding socst constant at its mean value.
twoway (function $c1 + $b1*x, range(-20 23)) ///
(scatter read cmath, msym(oh) jitter(2)), ///
legend(off)

This has worked out well, so we will go ahead and center socst at two additional values: high, the mean + 1 sd, and low, the mean - 1 sd.
/* create variable soc2, centered at mean + 1 sd of socst */
generate soc2 = socst-($mean + $sd)
xi3: regress read cmath*soc2
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
    Residual |  9494.65781   196  48.4421317           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cmath |   .6017615   .0774773     7.77   0.000     .4489654    .7545576
        soc2 |   .3738294   .0555457     6.73   0.000     .2642855    .4833733
     _IcmXso |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   55.62868   .7894084    70.47   0.000     54.07186    57.18551
------------------------------------------------------------------------------
/* save slope and intercept as global macro variables */
global b2 = _b[cmath]
global c2 = _b[_cons]
/* create variable soc3, centered at mean - 1 sd of socst */
generate soc3 = socst-($mean - $sd)
xi3: regress read cmath*soc3
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25407           Prob > F      =  0.0000
    Residual |   9494.6578   196  48.4421316           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cmath |   .3595465   .0917421     3.92   0.000      .178618    .5404749
        soc3 |   .3738294   .0555457     6.73   0.000     .2642855    .4833733
     _IcmXso |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   47.60197   .8572345    55.53   0.000     45.91138    49.29256
------------------------------------------------------------------------------
/* save slope and intercept as global macro variables */
global b3 = _b[cmath]
global c3 = _b[_cons]
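The three centered regressions above repeat the same steps with a different centering point each time. A minimal sketch of the same idea written as a loop follows. It assumes a Stata version that supports factor-variable notation (c.cmath##c.soc_k) in place of xi3, uses local rather than global macros, and assumes cmath has already been created as above. The variable name soc_k is our own placeholder.

```stata
* Sketch: refit the model with socst centered at mean - 1 sd, mean, mean + 1 sd.
* Assumes hsb2 is in memory and cmath exists; soc_k is a hypothetical name.
quietly summarize socst
local m = r(mean)
local s = r(sd)

foreach k in -1 0 1 {
    capture drop soc_k
    generate soc_k = socst - (`m' + (`k')*`s')
    * c.cmath##c.soc_k expands to cmath, soc_k, and their product
    quietly regress read c.cmath##c.soc_k
    display as txt "socst at mean + (`k') sd:  intercept = " ///
        as res %8.4f _b[_cons] as txt "  slope of cmath = " as res %8.4f _b[cmath]
}
drop soc_k
```

Each pass through the loop prints the intercept and the cmath slope for one centering point, which are the two quantities saved in the global macros above.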
You will note that the slopes and intercepts differ in these last two models. Below, we present a summary of the slopes and intercepts for cmath at the three centered values of socst. In the summary, we do not include the coefficients for socst or the interaction because their values are the same in all three centered models.
 socst centered at | slope of cmath   intercept
-------------------+----------------------------
 mean - 1 sd       |      .3595         47.60
 mean              |      .4807         51.62
 mean + 1 sd       |      .6018         55.63
What the summary table shows us is that as socst increases from one standard deviation below the mean, through the mean, to one standard deviation above the mean, the slope of read on cmath gets larger, as does the intercept. Below is a graph that includes all three regression lines along with a scatterplot of the data.
display as txt "regression equations for math on read at 3 values of socst"
display as txt "at +1sd: $c2 + $b2*math"
display as txt "at mean: $c1 + $b1*math"
display as txt "at -1sd: $c3 + $b3*math"
regression equations for math on read at 3 values of socst
at +1sd: 55.62868289662283 + .6017614777967648*math
at mean: 51.61532731557158 + .4806539733117885*math
at -1sd: 47.60197182980541 + .3595464625651616*math
twoway (function $c2 + $b2*x, range(-20 20)) ///
(function $c1 + $b1*x, range(-20 20)) ///
(function $c3 + $b3*x, range(-20 20)) ///
(scatter read cmath, msym(oh) jitter(2)), ///
legend(order(1 "+1sd" 2 "mean" 3 "-1sd"))

In the graph above, the top line represents the regression of read on math when socst is held constant at one standard deviation above its mean. The middle line represents the regression when socst is held constant at its mean value. The bottom line is for the case when socst is held constant at one standard deviation below its mean.
There is also a way to compute the slope coefficients and constants without centering either of the predictor variables, using the lincom command. We will use the global macro variables that have already been created, but we will need to rerun the original regression model.
xi3: regress read math*socst
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   78.61
       Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
    Residual |  9494.65783   196  48.4421318           R-squared     =  0.5461
-------------+------------------------------           Adj R-squared =  0.5392
       Total |    20919.42   199  105.122714           Root MSE      =    6.96
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |  -.1105123   .2916338    -0.38   0.705    -.6856552    .4646307
       socst |  -.2200442   .2717539    -0.81   0.419    -.7559812    .3158928
     _ImaXso |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
       _cons |   37.84271   14.54521     2.60   0.010     9.157506    66.52792
------------------------------------------------------------------------------
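The linear combinations passed to lincom in this section come from rewriting the uncentered model in terms of centered variables. In the sketch below, s stands for the chosen value of socst (the mean, or the mean plus or minus one sd) and \bar{m} for the mean of math (the macro $mean2). These symbols, and b_0 to b_3 for the coefficients on _cons, math, socst, and _ImaXso, are our own notation.

```latex
\begin{aligned}
\widehat{read} &= b_0 + b_1\,math + b_2\,socst + b_3\,(math \times socst),
  \qquad math = cmath + \bar{m}, \quad socst = soc + s \\
&= \underbrace{\left(b_0 + \bar{m}\,b_1 + s\,b_2 + s\,\bar{m}\,b_3\right)}_{\text{intercept}}
 + \underbrace{\left(b_1 + s\,b_3\right)}_{\text{slope of } cmath}\,cmath
 + \left(b_2 + \bar{m}\,b_3\right) soc
 + b_3\,(cmath \times soc)
\end{aligned}
```

The slope term is the first lincom expression in each pair (math + s*_ImaXso), and the intercept term is the second (_cons + s*socst + s*$mean2*_ImaXso + $mean2*math).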
/* high: at mean plus 1 sd */
lincom math + ($mean+$sd)*_ImaXso
 ( 1)  math + 63.14079 _ImaXso = 0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .6017615   .0774773     7.77   0.000     .4489654    .7545576
------------------------------------------------------------------------------
lincom _cons + ($mean+$sd)*socst + ($mean+$sd)*($mean2)*_ImaXso + ($mean2)*math
 ( 1)  52.645 math + 63.14079 socst + 3324.047 _ImaXso + _cons = 0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   55.62868   .7894084    70.47   0.000     54.07186    57.18551
------------------------------------------------------------------------------
/* medium: at the mean */
lincom math + ($mean)*_ImaXso
 ( 1)  math + 52.405 _ImaXso = 0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    .480654   .0637009     7.55   0.000     .3550268    .6062811
------------------------------------------------------------------------------
lincom _cons + ($mean)*socst + ($mean)*($mean2)*_ImaXso + ($mean2)*math
 ( 1)  52.645 math + 52.405 socst + 2758.861 _ImaXso + _cons = 0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   51.61533   .5686851    90.76   0.000      50.4938    52.73685
------------------------------------------------------------------------------
/* low: at mean minus 1 sd */
lincom math + ($mean-$sd)*_ImaXso
 ( 1)  math + 41.66921 _ImaXso = 0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   .3595465   .0917421     3.92   0.000      .178618    .5404749
------------------------------------------------------------------------------
lincom _cons + ($mean-$sd)*socst + ($mean-$sd)*($mean2)*_ImaXso + ($mean2)*math
 ( 1)  52.645 math + 41.66921 socst + 2193.675 _ImaXso + _cons = 0
------------------------------------------------------------------------------
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   47.60197   .8572345    55.53   0.000     45.91138    49.29256
------------------------------------------------------------------------------
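In more recent versions of Stata, the same simple slopes can be obtained without xi3 by using factor-variable notation and the margins command. The following is a sketch under that assumption (Stata 11 or later). It is an alternative to the lincom steps above, not a reproduction of them, and the local macro names lo, mid, and hi are our own.

```stata
* Sketch: simple slopes of math at three values of socst via margins.
* Assumes Stata 11+ and that hsb2 is already in memory.
quietly summarize socst
local lo  = r(mean) - r(sd)
local mid = r(mean)
local hi  = r(mean) + r(sd)

* c.math##c.socst fits math, socst, and the math-by-socst product
regress read c.math##c.socst

* dydx(math) reports the slope of math at each listed value of socst
margins, dydx(math) at(socst=(`lo' `mid' `hi'))

* the same slope at the mean of socst, written as a linear combination
lincom math + `mid'*c.math#c.socst
```

Because the model is the same as the one fit with xi3, the slopes reported by margins should agree with the lincom results above up to rounding.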
These pages are Copyrighted (c) by UCLA Academic Technology Services