UCLA Academic Technology Services

Stata Library
How do I handle interactions of continuous and categorical variables?

Analysis of covariance (ANCOVA) is a statistical procedure that allows you to include both categorical and continuous variables in a single model. ANCOVA assumes that the regression coefficients are homogeneous (the same) across the categorical variable. Violation of this assumption can lead to incorrect conclusions. This page will explore what happens when you have heterogeneous (different) regressions across groups and show some strategies for dealing with them. This involves some complex topics in the use of xi3, which is an ado program that you can download from within Stata by typing findit xi3 (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

Here is an example data file we will use. It contains 30 subjects who used one of three diets, diet 1 (diet=1), diet 2 (diet=2) and a control group (diet=3). Before the start of the study, the height of the subject was measured, and after the study the weight of the subject was measured.
input id diet height weight 
1 1 56 140 
2 1 60 155 
3 1 64 143 
4 1 68 161 
5 1 72 139 
6 1 54 159 
7 1 62 138 
8 1 65 121 
9 1 65 161 
10 1 70 145 
11 2 56 117 
12 2 60 125 
13 2 64 133 
14 2 68 141 
15 2 72 149 
16 2 54 109 
17 2 62 128 
18 2 65 131 
19 2 65 131 
20 2 70 145 
21 3 54 211 
22 3 58 223 
23 3 62 235 
24 3 66 247 
25 3 70 259 
26 3 52 201 
27 3 59 228 
28 3 64 245 
29 3 65 241 
30 3 72 269 
end

1. A standard ANOVA

You could analyze these data with a standard ANOVA, as shown below. This analysis compares the weights of the three groups. We want to compare the two diets (1 and 2) to the control group (diet 3), and we also want to compare diet 1 with diet 2. Because this set of comparisons does not correspond to any built-in coding system (such as Helmert coding or backward difference coding), we need to use the user-defined coding option in xi3. This is specified with u. immediately preceding the variable that we want coded, in this case, diet. To set up the coding that we want, we use the char command (short for "characteristic") to define a characteristic named user for that variable. After char, we give the variable name followed by the characteristic name in square brackets (diet[user]) and then list the contrasts within parentheses. If more than one comparison is being specified, the comparisons must be separated by a backslash (\). Stata will remember this user-defined coding system until the end of the session or until the user-defined coding system is redefined. For more information on the use of the xi3 and the char commands, type help xi3.

One further note regarding the use of the user-defined coding option in xi3: although the significance test(s) associated with the contrasts will be the same whether, say, .5 .5 -1 is used or 1 1 -2, the coefficients will have different meanings.  If .5 .5 -1 is used, then the mean of the first two groups will be compared to the third group.  If 1 1 -2 is used, then the means of the first two groups will be summed and compared to twice the mean of the third group.
char diet[user] (.5 .5 -1 \ 1 -1 0 )
xi3: regress weight u.diet

u.diet            _Idiet_1-3          (naturally coded; _Idiet_3 omitted)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  2,    27) =  128.48
       Model |    64350.60     2    32175.30           Prob > F      =  0.0000
    Residual |     6761.40    27  250.422222           R-squared     =  0.9049
-------------+------------------------------           Adj R-squared =  0.8979
       Total |    71112.00    29  2452.13793           Root MSE      =  15.825

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Idiet_1 |     -97.35   6.128893   -15.88   0.000    -109.9254   -84.77455
    _Idiet_2 |       15.3   7.077036     2.16   0.040     .7791209    29.82088
       _cons |        171   2.889188    59.19   0.000     165.0719    176.9281
------------------------------------------------------------------------------
The ANOVA results show an overall difference among the diets (F(2, 27) = 128.48, p < .0001). The contrasts show a difference between the two diets combined and the control group (t = -15.88, p < .001), and a difference between diet 1 and diet 2 (t = 2.16, p = .040). The ANOVA disregards the information that we have about the subjects' height. As height is probably correlated with weight, it could be useful as a covariate in an ANCOVA.
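To see what the two user-defined contrasts estimate, the same comparisons can be formed directly from the group means. The sketch below is one way to do this with built-in commands instead of xi3. It assumes Stata 11 or later (for factor-variable notation) and that the diet data entered above are in memory.

```stata
* Assumption: the 30-observation diet data entered above are in memory.
* Cell-means model: one coefficient per diet group, no constant.
regress weight ibn.diet, noconstant

* Contrast .5 .5 -1: mean of diets 1 and 2 versus the control group
lincom 0.5*_b[1.diet] + 0.5*_b[2.diet] - _b[3.diet]

* Same comparison scaled as 1 1 -2: the estimate doubles, the t statistic is unchanged
lincom _b[1.diet] + _b[2.diet] - 2*_b[3.diet]

* Contrast 1 -1 0: diet 1 versus diet 2
lincom _b[1.diet] - _b[2.diet]
```

If the setup matches, the first and third estimates should agree with the _Idiet_1 and _Idiet_2 coefficients above (-97.35 and 15.3).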

2. A standard ANCOVA

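Below we perform a standard ANCOVA and then use the adjust command to obtain the adjusted mean weight for each diet group, with height held at its overall mean.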
xi3: regress weight u.diet height
adjust height, by(diet)
The results are consistent with those of the ANOVA. There is an overall effect of diet. Also, the control group is significantly different from the two diets, and diet 1 is different from diet 2. The p-value for the comparison of diet 1 versus diet 2 (p = .008) is smaller than in the standard ANOVA (p = .040).
u.diet            _Idiet_1-3          (naturally coded; _Idiet_3 omitted)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  3,    26) =  157.80
       Model |  67409.8111     3   22469.937           Prob > F      =  0.0000
    Residual |  3702.18893    26  142.391882           R-squared     =  0.9479
-------------+------------------------------           Adj R-squared =  0.9419
       Total |    71112.00    29  2452.13793           Root MSE      =  11.933

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Idiet_1 |  -99.82052    4.65219   -21.46   0.000    -109.3832   -90.25781
    _Idiet_2 |       15.3   5.336514     2.87   0.008      4.33064    26.26936
      height |   1.764658   .3807136     4.64   0.000     .9820899    2.547226
       _cons |   59.59126   24.13426     2.47   0.020     9.982587    109.1999
------------------------------------------------------------------------------
------------------------------------------------------------------------------
     Dependent variable: weight     Command: regress
   Variables left as is: _Idiet_1, _Idiet_2
  Covariate set to mean: height = 63.133335
------------------------------------------------------------------------------

----------------------
     diet |         xb
----------+-----------
        1 |    145.376
        2 |    130.076
        3 |    237.547
----------------------
     Key:  xb  =  Linear Prediction
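In newer versions of Stata the adjusted means can also be obtained with margins. This is a sketch under the assumption that Stata 11 or later is available and the diet data are in memory; it replaces xi3 and adjust with factor-variable notation rather than reproducing the original commands.

```stata
* Assumption: diet data in memory; Stata 11+ (factor variables and margins).
* ANCOVA with a common slope for height.
regress weight i.diet c.height

* Adjusted means: predicted weight for each diet with height at its sample mean
margins diet, at((mean) height)
```

The predictions should match the xb column above (145.376, 130.076 and 237.547), because both approaches evaluate the same fitted model at the mean height of 63.13.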
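We can see that the coefficient (slope) between height and weight, adjusting for diet, is 1.76. Figure 1 below shows the scatterplot of weight against height with an overall line of best fit. Note that lfit regresses weight on height alone, ignoring diet, so the plotted line has a shallower slope (about 0.83) than the ANCOVA slope of 1.76.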
graph twoway (scatter weight height) (lfit weight height)
Figure 1. Scatterplot of weight by height with overall regression line
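The line drawn by lfit comes from a regression of weight on height alone, whereas the ANCOVA slope is estimated with diet in the model. A short sketch, assuming the diet data are in memory and Stata 11 or later, shows the two slopes side by side.

```stata
* Assumption: diet data in memory; Stata 11+ for i.diet.
* Slope ignoring diet (this is the line that lfit draws)
regress weight height
display "slope ignoring diet:      " %6.3f _b[height]

* Common within-group slope, adjusting for diet (the ANCOVA slope)
regress weight i.diet height
display "slope adjusting for diet: " %6.3f _b[height]
```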

3. Estimate slopes for each diet group

One assumption of ANCOVA is that the slope between height and weight is the same for the three diet groups. This is called the homogeneity of regression assumption. Below we show a scatterplot like the one above; however, this one shows the three diet groups in different colors and shows a separate regression line for each diet group (diet 1=blue, diet 2=yellow, diet 3=red). As you can see, the blue regression line looks like it has a very different slope from the other two regression lines.
quietly xi3: regress weight g.diet*height 
predict yhat
quietly separate yhat, by(diet)
graph twoway scatter weight yhat1 yhat2 yhat3 height, ///
	connect(i l l l) msymbol(o i i i) sort
Figure 2. Scatterplot of weight by height with separate regression lines for each group (diet 1=blue, diet 2=yellow, diet 3=red)

Below we perform an analysis that shows the slopes of each of the lines. Even if we found the slope between height and weight to be 0 in the prior analysis, this is still a useful analysis to perform. It is possible that the overall slope for the entire sample was 0, but the slopes for some groups were positive and the others were negative and they cancelled each other out. This analysis would help you see if such a pattern was occurring.

bysort diet: regress weight height
We indeed see below that the slopes seem very different. (Note that the output has been abbreviated.) The slope for diet 1 (-.377) is much lower than the slopes for diet 2 (2.096) and the control group, diet 3 (3.190). We need to look into this further and test whether these slopes are significantly different from each other.
_______________________________________________________________________________
-> diet = 1

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =    0.24
       Model |   42.657257     1   42.657257           Prob > F      =  0.6396
    Residual |  1440.94274     8  180.117843           R-squared     =  0.0288
-------------+------------------------------           Adj R-squared = -0.0927
       Total |     1483.60     9  164.844444           Root MSE      =  13.421

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |  -.3768309   .7743341    -0.49   0.640    -2.162449    1.408787
       _cons |   170.1664   49.43018     3.44   0.009     56.18024    284.1526
------------------------------------------------------------------------------

_______________________________________________________________________________
-> diet = 2

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  359.81
       Model |  1319.56112     1  1319.56112           Prob > F      =  0.0000
    Residual |  29.3388815     8  3.66736019           R-squared     =  0.9782
-------------+------------------------------           Adj R-squared =  0.9755
       Total |     1348.90     9  149.877778           Root MSE      =   1.915

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   2.095872    .110491    18.97   0.000      1.84108    2.350665
       _cons |   -2.39747   7.053272    -0.34   0.743    -18.66234     13.8674
------------------------------------------------------------------------------

_______________________________________________________________________________
-> diet = 3

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =  669.93
       Model |  3882.53627     1  3882.53627           Prob > F      =  0.0000
    Residual |  46.3637317     8  5.79546646           R-squared     =  0.9882
-------------+------------------------------           Adj R-squared =  0.9867
       Total |     3928.90     9  436.544444           Root MSE      =  2.4074

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   3.189727   .1232367    25.88   0.000     2.905543    3.473912
       _cons |   37.49895   7.703032     4.87   0.001     19.73573    55.26218
------------------------------------------------------------------------------
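The three slopes can also be read off a single interaction model. The following is a sketch assuming Stata 11 or later and the diet data in memory; it uses factor-variable notation and margins in place of the separate regressions.

```stata
* Assumption: diet data in memory; Stata 11+.
* One model with a separate intercept and slope for each diet
regress weight i.diet##c.height

* Slope of weight on height within each diet group
margins diet, dydx(height)
```

The slope estimates should equal those from the separate regressions (-.377, 2.096 and 3.190). The standard errors will differ, because this model pools the residual variance across the three groups.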

4. Test equality of slopes across diet groups

We can test to see if the slopes for the three diet groups are equal, as shown below. The diet*height effect tests if the three slopes are equal. We use the g. prefix with xi3 to indicate that we want diet to be coded using simple coding, so that diets 2 and 3 are each compared with diet 1 (the omitted group).
xi3: regress weight g.diet*height 

g.diet            _Idiet_1-3          (naturally coded; _Idiet_1 omitted)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
       Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
    Residual |  1516.64541    24  63.1935587           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
       Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Idiet_2 |  -172.5639   41.40619    -4.17   0.000    -258.0221   -87.10574
    _Idiet_3 |  -132.6675   38.78455    -3.42   0.002    -212.7149    -52.6201
      height |   1.636256   .2552408     6.41   0.000     1.109465    2.163047
    _Idi2Xhe |   2.472703   .6486365     3.81   0.001     1.133983    3.811423
    _Idi3Xhe |   3.566558   .6131609     5.82   0.000     2.301056     4.83206
       _cons |   68.42264   16.19835     4.22   0.000      34.9909    101.8544
------------------------------------------------------------------------------
The diet*height effect (coded as _Idi2Xhe and _Idi3Xhe) is indeed significant, as the joint test below shows (F(2, 24) = 17.29, p < .0001), indicating that the slopes differ across the three diet groups.
test _Idi2Xhe _Idi3Xhe

 ( 1)  _Idi2Xhe = 0.0
 ( 2)  _Idi3Xhe = 0.0

       F(  2,    24) =   17.29
            Prob > F =    0.0000
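The same joint test of equal slopes can be run without xi3. This sketch assumes Stata 11 or later and the diet data in memory; testparm tests all of the interaction coefficients together.

```stata
* Assumption: diet data in memory; Stata 11+.
regress weight i.diet##c.height

* Joint test that every diet-by-height interaction coefficient is zero,
* that is, that the three slopes are equal
testparm i.diet#c.height
```

With the same data this should return the F(2, 24) = 17.29 shown above, since the test does not depend on how diet is coded.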

5. Perform tests with separate slopes for all diet groups

Because the slopes for the three diet groups are not the same, we should not use a traditional ANCOVA model that assumes the slopes for the three diet groups are the same.  Instead, we can use a model that estimates separate slopes for all three diet groups. Because the diet groups will have different slopes, we must be very cautious in interpreting adjusted means. One way of thinking about this is to focus on the fact that we have a diet*height interaction. This means that we cannot interpret the relationship between height and weight without referring to diet. Likewise, if we want to talk about the effect of diet we need to specify what height we are talking about. For example, in comparing diets 1 and 2 (in Figure 2) it looks like there is no difference between diets 1 and 2 (blue and yellow) for tall people, but there may be a difference for shorter people. Below, we will see how to make these comparisons.

5.1 Comparing diet 1 with diet 2

Let us compare diet 1 versus diet 2 at three different levels of height: 59, 64 and 68 inches. These correspond to the 25th, 50th and 75th percentiles of height. We can then evaluate the difference between diet 1 and diet 2 separately at each height. The model used in this analysis is the same as the model from section 4 where we estimated separate slopes, although diet is coded differently. In addition, we use the lincom command to compare diets 1 and 2 at the three levels of height and to obtain the predicted weight at each height.
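The three heights can be checked against the data. A minimal sketch, assuming the diet data are in memory:

```stata
* Assumption: diet data in memory.
* summarize with the detail option stores the quartiles in r(p25), r(p50), r(p75)
summarize height, detail
display "25th: " r(p25) "   50th: " r(p50) "   75th: " r(p75)
```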

The first three lincom commands compare diet 1 with diet 2 at 59, 64, and 68 inches. The tablist command is used to display the values of _Idiet_1 and _Idiet_2. These are needed to construct the next six estimates. The next three lincom commands request the predicted weight for people on diet 1 who are 59, 64, and 68 inches tall. The final three lincom commands request the predicted weight for people on diet 2 at the same heights. Comments have been inserted in the code to help make clear what each set of lincom commands is doing.
char diet[user] (-1 1 0\.5 .5 -1)
xi3:  regress weight u.diet*height

u.diet            _Idiet_1-3          (naturally coded; _Idiet_3 omitted)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
       Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
    Residual |  1516.64539    24  63.1935578           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
       Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Idiet_1 |  -172.5639   41.40619    -4.17   0.000    -258.0221   -87.10574
    _Idiet_2 |   46.38553    32.7967     1.41   0.170    -21.30352    114.0746
      height |   1.636256   .2552408     6.41   0.000     1.109465    2.163047
    _Idi1Xhe |   2.472703   .6486366     3.81   0.001     1.133983    3.811423
    _Idi2Xhe |  -2.330207    .520369    -4.48   0.000    -3.404196   -1.256218
       _cons |   68.42264   16.19835     4.22   0.000      34.9909    101.8544
------------------------------------------------------------------------------

* diet 1 vs. 2 at height=59, 64, 68
lincom  _Idiet_1 + 59* _Idi1Xhe

 ( 1)  _Idiet_1 + 59.0 _Idi1Xhe = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -26.67443   4.641265    -5.75   0.000    -36.25354   -17.09533
------------------------------------------------------------------------------

lincom  _Idiet_1 + 64* _Idi1Xhe

 ( 1)  _Idiet_1 + 64.0 _Idi1Xhe = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -14.31092   3.564552    -4.01   0.001    -21.66779   -6.954046
------------------------------------------------------------------------------

lincom  _Idiet_1 + 68* _Idi1Xhe

 ( 1)  _Idiet_1 + 68.0 _Idi1Xhe = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -4.420107   4.558951    -0.97   0.342    -13.82932    4.989105
------------------------------------------------------------------------------
To obtain the next six estimates, which are the predicted weight for people on diet 1 at each height and for people on diet 2 at each height, you need to provide the equation to be used. First, include the constant (which is called _cons in Stata). Next, multiply _Idiet_1 and _Idiet_2 by the values shown in the tablist output for the diet group of interest. Then multiply height by the height of interest, and multiply each diet*height interaction term (_Idi1Xhe and _Idi2Xhe) by the product of that height and the corresponding tablist value. Hence, for the first estimate, _Idiet_1 and _Idiet_2 are multiplied by -.5 and .3333, the values from the first row of the tablist output (because we are looking at people on diet 1), height is multiplied by 59 (because we are looking only at people who are 59 inches tall), and _Idi1Xhe and _Idi2Xhe are multiplied by -.5*59 and .3333*59. The other five lincom commands are constructed using the same logic.
tablist diet _Idiet_1 _Idiet_2

diet   _Idiet_1   _Idiet_2   Freq
 1        -.5     .3333333     10
 2         .5     .3333333     10
 3          0    -.6666667     10
 
* weight for diet 1 at height=59, 64, 68
lincom _cons + -.5* _Idiet_1 + .3333* _Idiet_2 + 59* height + (-.5*59)* _Idi1Xhe + (.3333*59)* _Idi2Xhe

 ( 1) - .5 _Idiet_1 + .3333 _Idiet_2 + 59.0 height - 29.5 _Idi1Xhe + 19.6647 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   147.9365   3.281816    45.08   0.000     141.1631    154.7098
------------------------------------------------------------------------------

lincom _cons + -.5* _Idiet_1 + .3333* _Idiet_2 + 64* height + (-.5*64)* _Idi1Xhe + (.3333*64)* _Idi2Xhe

 ( 1) - .5 _Idiet_1 + .3333 _Idiet_2 + 64.0 height - 32.0 _Idi1Xhe + 21.3312 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   146.0527   2.520477    57.95   0.000     140.8507    151.2547
------------------------------------------------------------------------------

lincom _cons + -.5* _Idiet_1 + .3333* _Idiet_2 + 68* height + (-.5*68)* _Idi1Xhe + (.3333*68)* _Idi2Xhe

 ( 1) - .5 _Idiet_1 + .3333 _Idiet_2 + 68.0 height - 34.0 _Idi1Xhe + 22.6644 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   144.5457   3.223611    44.84   0.000     137.8925    151.1989
------------------------------------------------------------------------------

* weight for diet 2 at height=59, 64, 68
lincom _cons + .5* _Idiet_1 + .3333* _Idiet_2 + 59* height + (.5*59)* _Idi1Xhe + (.3333*59)* _Idi2Xhe

 ( 1)  .5 _Idiet_1 + .3333 _Idiet_2 + 59.0 height + 29.5 _Idi1Xhe + 19.6647 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    121.262   3.281816    36.95   0.000     114.4887    128.0354
------------------------------------------------------------------------------

lincom _cons + .5* _Idiet_1 + .3333* _Idiet_2 + 64* height + (.5*64)* _Idi1Xhe + (.3333*64)* _Idi2Xhe

 ( 1)  .5 _Idiet_1 + .3333 _Idiet_2 + 64.0 height + 32.0 _Idi1Xhe + 21.3312 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   131.7418   2.520477    52.27   0.000     126.5398    136.9438
------------------------------------------------------------------------------

lincom _cons + .5* _Idiet_1 + .3333* _Idiet_2 + 68* height + (.5*68)* _Idi1Xhe + (.3333*68)* _Idi2Xhe

 ( 1)  .5 _Idiet_1 + .3333 _Idiet_2 + 68.0 height + 34.0 _Idi1Xhe + 22.6644 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   140.1256   3.223611    43.47   0.000     133.4724    146.7788
------------------------------------------------------------------------------
Focusing on the comparison of diets 1 and 2, these results indicate a significant difference between diet 1 and diet 2 for those 59 inches tall (t = -5.75, p < .0001) and for those 64 inches tall (t = -4.01, p = .0005). For those who are tall (i.e., 68 inches), the difference between diet 1 and diet 2 is not significant (t = -0.97, p = .342). This corresponds with what we saw in Figure 2.

Because the first contrast is coded -1 1 0, each "diet 1 versus 2" estimate is the weight for diet 2 minus the weight for diet 1. If you take the estimate for "weight for diet 2 at 59 inches" minus the estimate for "weight for diet 1 at 59 inches", you get the estimate for "diet 1 versus 2 at 59 inches" (121.262 - 147.9365 = -26.6745, which matches -26.67443 up to rounding). Likewise, taking the estimate for "weight for diet 2 at 64 inches" minus the estimate for "weight for diet 1 at 64 inches" yields the estimate for "diet 1 versus 2 at 64 inches" (131.7418 - 146.0527 = -14.3109). You can do a similar computation for those 68 inches tall.
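The predicted weights and the diet 2 minus diet 1 differences at each height can also be obtained without working out the coding values by hand. The sketch below assumes Stata 11 or later and the diet data in memory; it uses indicator coding with diet 1 as the base rather than the user-defined coding above, so the coefficient names differ.

```stata
* Assumption: diet data in memory; Stata 11+.
* Separate-slopes model with diet 1 as the base category
regress weight i.diet##c.height

* Predicted weight for every diet at heights 59, 64 and 68
margins diet, at(height=(59 64 68))

* Diet 2 minus diet 1 at each height
lincom _b[2.diet] + 59*_b[2.diet#c.height]
lincom _b[2.diet] + 64*_b[2.diet#c.height]
lincom _b[2.diet] + 68*_b[2.diet#c.height]
```

The lincom differences should match the three comparisons above. The margins predictions may differ from the lincom predictions above in the third decimal place, because those commands use .3333 in place of exactly 1/3.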

5.2 Comparing diets 1 and 2 to the control group

The analysis below compares diets 1 and 2 to the control group (group 3) at the three different heights: 59 inches, 64 inches and 68 inches. The first three lincom commands compare diets 1 and 2 to the control group at these three different heights. The next three lincom commands estimate the weight for the diet 1 and diet 2 groups combined at the three heights. The following lincom commands estimate the weight for the control group at the three heights.
* diet 1 & 2 vs. 3 at height=59, 64, 68
lincom  _Idiet_2 + 59* _Idi2Xhe

 ( 1)  _Idiet_2 + 59.0 _Idi2Xhe = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -91.09666   3.660663   -24.89   0.000     -98.6519   -83.54143
------------------------------------------------------------------------------

lincom  _Idiet_2 + 64* _Idi2Xhe

 ( 1)  _Idiet_2 + 64.0 _Idi2Xhe = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -102.7477   3.167398   -32.44   0.000    -109.2849   -96.21051
------------------------------------------------------------------------------

lincom  _Idiet_2 + 68* _Idi2Xhe

 ( 1)  _Idiet_2 + 68.0 _Idi2Xhe = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -112.0685   4.133546   -27.11   0.000    -120.5997   -103.5373
------------------------------------------------------------------------------

* weight for (diet 1+diet2) / 2 at height=59, 64, 68
lincom _cons + .33333* _Idiet_2 + 59* height + 59*.33333* _Idi2Xhe

 ( 1)  .33333 _Idiet_2 + 59.0 height + 19.66647 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   134.5965   2.320625    58.00   0.000      129.807     139.386
------------------------------------------------------------------------------

lincom _cons + .33333* _Idiet_2 + 64* height + 64*.33333* _Idi2Xhe

 ( 1)  .33333 _Idiet_2 + 64.0 height + 21.33312 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   138.8942    1.78227    77.93   0.000     135.2157    142.5726
------------------------------------------------------------------------------

lincom _cons + .33333* _Idiet_2 + 68* height + 68*.33333* _Idi2Xhe

 ( 1)  .33333 _Idiet_2 + 68.0 height + 22.66644 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   142.3323   2.279468    62.44   0.000     137.6277    147.0369
------------------------------------------------------------------------------
 
* weight for diet3 at height=59, 64, 68
lincom _cons + -.66667* _Idiet_2 + 59* height + 59*-.66667* _Idi2Xhe

 ( 1) - .66667 _Idiet_2 + 59.0 height - 39.33353 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   225.6932   2.831107    79.72   0.000     219.8501    231.5363
------------------------------------------------------------------------------

lincom _cons + -.66667* _Idiet_2 + 64* height + 64*-.66667* _Idi2Xhe

 ( 1) - .66667 _Idiet_2 + 64.0 height - 42.66688 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   241.6419   2.618387    92.29   0.000     236.2378    247.0459
------------------------------------------------------------------------------

lincom _cons + -.66667* _Idiet_2 + 68* height + 68*-.66667* _Idi2Xhe

 ( 1) - .66667 _Idiet_2 + 68.0 height - 45.33356 _Idi2Xhe + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   254.4008   3.448227    73.78   0.000      247.284    261.5176
------------------------------------------------------------------------------
The output indicates that the difference in weight between diet groups 1 and 2 combined and the control group is -91.10 pounds at 59 inches, and this difference is significant. We could obtain that difference by taking 134.5965 (the predicted weight for diet groups 1 and 2 combined at 59 inches) minus 225.6932 (the predicted weight for diet group 3 at 59 inches), which gives -91.0967. Likewise, the difference between diet groups 1 and 2 versus diet group 3 is significant at 64 inches (with a difference of -102.75 pounds) and at 68 inches (with a difference of -112.07 pounds). Despite the interaction, the control group (diet 3) always weighs more than the two diet groups combined. This is consistent with what we saw in Figure 2.
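The comparison of the two diets combined against the control group can be written in the same way with the control group as the base category. This is a sketch assuming Stata 11 or later and the diet data in memory; the multipliers 29.5, 32 and 34 are half of 59, 64 and 68.

```stata
* Assumption: diet data in memory; Stata 11+.
* Separate-slopes model with the control group (diet 3) as the base
regress weight ib3.diet##c.height

* Average of diets 1 and 2 minus control, at height 59
lincom 0.5*_b[1.diet] + 0.5*_b[2.diet] + 29.5*_b[1.diet#c.height] + 29.5*_b[2.diet#c.height]

* The same comparison at height 64
lincom 0.5*_b[1.diet] + 0.5*_b[2.diet] + 32*_b[1.diet#c.height] + 32*_b[2.diet#c.height]

* The same comparison at height 68
lincom 0.5*_b[1.diet] + 0.5*_b[2.diet] + 34*_b[1.diet#c.height] + 34*_b[2.diet#c.height]
```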

6. Testing to pool slopes

You may have noticed that the slope for diet group 1 was quite different from those for groups 2 and 3, but the slopes for groups 2 and 3 were not so different from each other (see the graph in Figure 2 and the output in section 3). Rather than estimating three separate slopes, maybe it would be better to estimate one slope for diet group 1 and one combined slope for diet groups 2 and 3. Let's compare the slopes for diet groups 2 and 3 to see if they are different (if they are not, they can be combined), and also test whether the slope for diet group 1 is really different from the average of the slopes for diet groups 2 and 3.
char diet[user] (1 -.5 -.5\0 1 -1)
xi3:  regress weight u.diet*height

u.diet            _Idiet_1-3          (naturally coded; _Idiet_3 omitted)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
       Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
    Residual |   1516.6454    24  63.1935584           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
       Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Idiet_1 |   152.6157   35.11832     4.35   0.000     80.13504    225.0963
    _Idiet_2 |  -39.89642   38.78455    -1.03   0.314    -119.9438    40.15097
      height |   1.636256   .2552408     6.41   0.000     1.109465    2.163047
    _Idi1Xhe |  -3.019631   .5516849    -5.47   0.000    -4.158252   -1.881009
    _Idi2Xhe |  -1.093855   .6131609    -1.78   0.087    -2.359357    .1716465
       _cons |   68.42264   16.19835     4.22   0.000      34.9909    101.8544
------------------------------------------------------------------------------
As we expected, the test comparing the slope of diet group 1 with the slopes of groups 2 and 3 (_Idi1Xhe) was significant, and the test comparing the slopes for diet groups 2 and 3 (_Idi2Xhe, p = 0.087) was not significant. Because the slopes for diet groups 2 and 3 do not significantly differ, we can simplify our model by including one slope for diet group 1 and one combined slope for diet groups 2 and 3. This model has two benefits. First, the estimate of the slope for diet groups 2 and 3 will be more stable (because it is based on more cases) than slopes computed separately. Second, as we will see later, comparisons between diet groups 2 and 3 are greatly simplified since they will have a common slope.
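Because the two interaction coefficients are slope contrasts, the three individual slopes can be recovered from the output above. The following is a minimal sketch, assuming that under this user-defined coding the height coefficient is the unweighted average of the three group slopes; the scalar names are ours, and the numbers are copied from the regression table.

```stata
* Coefficients copied from the regression output in this section
* b_height : assumed to be the unweighted mean of the three slopes
* b_c1     : slope(diet 1) minus the mean of slope(diet 2) and slope(diet 3)
* b_c2     : slope(diet 2) minus slope(diet 3)
scalar b_height = 1.636256
scalar b_c1     = -3.019631
scalar b_c2     = -1.093855

* Solve the two contrasts for the individual slopes
scalar slope1 = b_height + (2/3)*b_c1
scalar slope2 = b_height - (1/3)*b_c1 + (1/2)*b_c2
scalar slope3 = b_height - (1/3)*b_c1 - (1/2)*b_c2

display "slope, diet 1: " %9.4f scalar(slope1)
display "slope, diet 2: " %9.4f scalar(slope2)
display "slope, diet 3: " %9.4f scalar(slope3)
```

Under that assumption, the slope for diet 1 comes out slightly negative (about -0.38), while the slopes for diets 2 and 3 are both clearly positive (about 2.10 and 3.19), which is the pattern the two tests describe.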

7. Perform tests with some pooled slopes

7.1 Overall analysis pooling slopes for diet groups 2 and 3

Let's see how we can estimate a model with one slope for diet group 1 and another slope for diet groups 2 and 3. First, we will make a dummy variable called diet1 that is 1 for diet group 1 and 0 for diet groups 2 and 3.
gen diet1 = 1 if diet == 1
replace diet1 = 0 if inlist(diet,2,3)
 
tab2 diet diet1
The tabulation below confirms that the diet1 variable has been created successfully.
-> tabulation of diet by diet1  

           |         diet1
      diet |         0          1 |     Total
-----------+----------------------+----------
         1 |         0         10 |        10 
         2 |        10          0 |        10 
         3 |        10          0 |        10 
-----------+----------------------+----------
     Total |        20         10 |        30
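The same indicator can also be built in a single step from a logical expression. This is a minimal sketch on a hypothetical three-row stand-in for the data; the variable name diet1_alt is ours and is not used elsewhere on this page.

```stata
* Hypothetical three-row stand-in for the diet data, for illustration only
clear
input id diet
1 1
2 2
3 3
end

* (diet == 1) evaluates to 1 when true and 0 when false;
* the if qualifier leaves the result missing when diet is missing
generate byte diet1_alt = (diet == 1) if !missing(diet)
list
```

The if !missing(diet) qualifier matters only when diet has missing values; without it, a missing diet would be coded 0 rather than missing.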
Now, we can use diet1 in our model. The variable diet (i.e., the variables _Idiet_1 and _Idiet_2) is included to indicate the mean differences among the three diet groups, and the product of diet1 and height (dt1ht) is included so that we estimate two slopes: one for diet group 1 and one shared by diet groups 2 and 3.
* generate coding scheme for diet using user defined coding
char diet[user] (1 -1 0\0 1 -1)
xi3 u.diet

u.diet            _Idiet_1-3          (naturally coded; _Idiet_3 omitted)

* create 2 level diet 1 vs 23 by height interaction
gen dt1ht = diet1*height

* run regression with diet height diet1vs23*height
regress weight _Idiet_1 _Idiet_2 height dt1ht
Notice that diet has 2 df (since it has three levels) but the interaction of diet1 with height (dt1ht) has only 1 df (since diet1 has only two levels), whereas in section 4 the diet*height interaction had 2 df (since diet has three levels).
      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  4,    25) =  252.49
       Model |    69394.24     4    17348.56           Prob > F      =  0.0000
    Residual |  1717.75999    25  68.7103995           R-squared     =  0.9758
-------------+------------------------------           Adj R-squared =  0.9720
       Total |    71112.00    29  2452.13793           Root MSE      =  8.2892

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Idiet_1 |     211.49   36.69425     5.76   0.000     135.9168    287.0632
    _Idiet_2 |  -108.7911    3.73357   -29.14   0.000    -116.4805   -101.1017
      height |   2.707918   .3174089     8.53   0.000     2.054202    3.361634
       dt1ht |  -3.084749   .5740018    -5.37   0.000    -4.266928    -1.90257
       _cons |   65.43679   16.80021     3.89   0.001     30.83611    100.0375
------------------------------------------------------------------------------
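The two slopes implied by this model can be read off the height and dt1ht coefficients. The sketch below simply repeats that arithmetic with numbers copied from the output above (the scalar names are ours); with the regression still in memory, lincom height + dt1ht should give the diet 1 slope together with a standard error.

```stata
* Coefficients copied from the regression output above
scalar b_height = 2.707918
scalar b_dt1ht  = -3.084749

* dt1ht is 0 for diets 2 and 3, so their common slope is the height coefficient
scalar slope23 = b_height

* dt1ht equals height for diet 1, so its slope adds the dt1ht coefficient
scalar slope1 = b_height + b_dt1ht

display "pooled slope, diets 2 and 3: " %9.4f scalar(slope23)
display "slope, diet 1:               " %9.4f scalar(slope1)
```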

7.2 Comparing diet groups 1 and 2 when pooling slopes for diet groups 2 and 3

Even though we have pooled the slopes for groups 2 and 3, comparing groups 1 and 2 still means comparing across groups with different slopes, so we still need to use lincom to compare the diets at different levels of height and to obtain the adjusted means. The first three lincom commands below compare diet group 1 with diet group 2 at three levels of height (59, 64 and 68 inches). The tablist command then shows the coded values of _Idiet_1 and _Idiet_2 for each diet, which are used in the remaining commands: the next three lincom commands obtain adjusted means for diet 1 at the three heights, and the final three obtain adjusted means for diet 2 at the three heights.
lincom  _Idiet_1 + 59* dt1ht

 ( 1)  _Idiet_1 + 59.0 dt1ht = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   29.48984   4.551245     6.48   0.000     20.11637     38.8633
------------------------------------------------------------------------------

lincom  _Idiet_1 + 64* dt1ht

 ( 1)  _Idiet_1 + 64.0 dt1ht = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   14.06609   3.714135     3.79   0.001     6.416691     21.7155
------------------------------------------------------------------------------

lincom  _Idiet_1 + 68* dt1ht

 ( 1)  _Idiet_1 + 68.0 dt1ht = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   1.727099   4.485619     0.39   0.703    -7.511207     10.9654
------------------------------------------------------------------------------

tablist  diet _Idiet_1 _Idiet_2

diet   _Idiet_1   _Idiet_2   Freq
 1    .6666667   .3333333     10
 2   -.3333333   .3333333     10
 3   -.3333333  -.6666667     10

* weight for diet 1 at height=59, 64, 68
lincom _cons + .6667* _Idiet_1 + .3333* _Idiet_2 + 59* height + (1*59)*dt1ht

 ( 1)  .6667 _Idiet_1 + .3333 _Idiet_2 + 59.0 height + 59.0 dt1ht + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   147.9441   3.422846    43.22   0.000     140.8946    154.9936
------------------------------------------------------------------------------

lincom _cons + .6667* _Idiet_1 + .3333* _Idiet_2 + 64* height + (1*64)*dt1ht

 ( 1)  .6667 _Idiet_1 + .3333 _Idiet_2 + 64.0 height + 64.0 dt1ht + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   146.0599   2.628252    55.57   0.000      140.647    151.4729
------------------------------------------------------------------------------

lincom _cons + .6667* _Idiet_1 + .3333* _Idiet_2 + 68* height + (1*68)*dt1ht

 ( 1)  .6667 _Idiet_1 + .3333 _Idiet_2 + 68.0 height + 68.0 dt1ht + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   144.5526   3.360869    43.01   0.000     137.6308    151.4745
------------------------------------------------------------------------------

* weight for diet 2 at height=59, 64, 68
lincom _cons + -.3333* _Idiet_1 + .3333* _Idiet_2 + 59* height + (0*59)*dt1ht

 ( 1) - .3333 _Idiet_1 + .3333 _Idiet_2 + 59.0 height + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   118.4543   2.999992    39.48   0.000     112.2757    124.6329
------------------------------------------------------------------------------

lincom _cons + -.3333* _Idiet_1 + .3333* _Idiet_2 + 64* height + (0*64)*dt1ht

 ( 1) - .3333 _Idiet_1 + .3333 _Idiet_2 + 64.0 height + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   131.9938   2.624199    50.30   0.000     126.5892    137.3985
------------------------------------------------------------------------------

lincom _cons + -.3333* _Idiet_1 + .3333* _Idiet_2 + 68* height + (0*68)*dt1ht

 ( 1) - .3333 _Idiet_1 + .3333 _Idiet_2 + 68.0 height + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |   142.8255   2.970275    48.08   0.000     136.7081    148.9429
------------------------------------------------------------------------------
We can compare the results here with those of section 5.1 (which also compared groups 1 and 2, but estimated separate slopes for all three groups). We see that the results are quite consistent, i.e., diet groups 1 and 2 differ significantly at 59 inches and at 64 inches, but not at 68 inches.
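The pattern across heights follows directly from the two coefficients involved. The sketch below, using numbers copied from the section 7.1 output (scalar names are ours), reproduces the three differences and also computes the height at which the two fitted lines cross. That crossing height is only a point estimate with its own uncertainty, and it is not part of the original analysis.

```stata
* Coefficients copied from the section 7.1 regression output
scalar b_d1    = 211.49
scalar b_dt1ht = -3.084749

* Difference (diet 1 minus diet 2) at each height of interest
foreach h in 59 64 68 {
    display "height `h': diet 1 - diet 2 = " %8.4f (scalar(b_d1) + `h'*scalar(b_dt1ht))
}

* Height at which the difference between the two fitted lines is zero
scalar h_cross = -scalar(b_d1)/scalar(b_dt1ht)
display "fitted lines cross near height = " %6.2f scalar(h_cross)
```

The fitted lines cross at roughly 68.6 inches, which helps explain why the difference is large at 59 inches, smaller at 64 inches, and not significant at 68 inches.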

7.3 Comparing diet groups 2 and 3 when pooling slopes for diet groups 2 and 3

Because we have estimated a common slope for diet groups 2 and 3, it is easier to compare these two groups. Since their regression lines are parallel, the difference between them is the same at every value of height. Hence, to compare diets 2 and 3, we only need the 0 1 -1 contrast (the _Idiet_2 coefficient) in the lincom command. To obtain traditional adjusted means for each diet, you would estimate the adjusted mean at the overall mean value of height (in this case 63.13) as shown below.
lincom  _Idiet_2

 ( 1)  _Idiet_2 = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -108.7911    3.73357   -29.14   0.000    -116.4805   -101.1017
------------------------------------------------------------------------------

lincom _cons + -.3333* _Idiet_1 + .3333* _Idiet_2 + 63.13* height + (0*63.13)*dt1ht

 ( 1) - .3333 _Idiet_1 + .3333 _Idiet_2 + 63.13 height + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    129.638   2.625295    49.38   0.000     124.2311    135.0449
------------------------------------------------------------------------------

lincom _cons + -.3333* _Idiet_1 + -.6667* _Idiet_2 + 63.13* height + (0*63.13)*dt1ht

 ( 1) - .3333 _Idiet_1 - .6667 _Idiet_2 + 63.13 height + _cons = 0.0

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |    238.429      2.638    90.38   0.000      232.996    243.8621
------------------------------------------------------------------------------
The comparison of diets 2 and 3 is significant, and this holds true across all levels of height. Those in diet group 2 weighed about 108.8 pounds less than those in diet group 3.  For those of average height, the adjusted mean for diet 2 was 129.6 and for diet 3 was 238.4 (and 129.6 - 238.4 = -108.8).
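To see that the difference really is constant, the adjusted means can be recomputed at several heights. This is a minimal sketch using coefficients copied from the section 7.1 output and the rounded coded values (.3333 and .6667) used in the lincom commands above; the scalar names are ours.

```stata
* Coefficients copied from the section 7.1 regression output
scalar b_cons   = 65.43679
scalar b_d1     = 211.49
scalar b_d2     = -108.7911
scalar b_height = 2.707918

* Coded values from tablist: diet 2 is (-.3333, .3333), diet 3 is (-.3333, -.6667);
* dt1ht is 0 for both groups, so it drops out of these adjusted means
foreach h in 59 63.13 68 {
    scalar m2 = b_cons - .3333*b_d1 + .3333*b_d2 + `h'*b_height
    scalar m3 = b_cons - .3333*b_d1 - .6667*b_d2 + `h'*b_height
    display "height `h': diet 2 = " %8.3f scalar(m2) "  diet 3 = " %8.3f scalar(m3) "  diff = " %8.3f (scalar(m2) - scalar(m3))
}
```

At every height the difference equals the _Idiet_2 coefficient, because the two groups' coded values differ by exactly 1 on _Idiet_2 and not at all on the other terms.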

8. Summary

We have seen that in ANCOVA it is important to test the homogeneity of regression assumption, and if this assumption is violated we then need to estimate models that have separate slopes across groups. This amounts to having an interaction between your covariate and your group variable, which means that when you estimate differences among the groups, you need to take the level of the covariate into consideration. One strategy, as illustrated here, is to look at the effect of your group variable at different levels of your covariate. In our example, when we compared the control group to diets 1 and 2, we found that the control group weighed more at all three levels of height examined (59 inches, 64 inches and 68 inches). However, when we compared diets 1 and 2, we found diet 2 to be more effective at 59 and 64 inches, but there was no significant difference at 68 inches. Had we not done this further investigation, we might have concluded that diet 2 was superior to diet 1 for people of all heights, not realizing that the relative effectiveness of the diets depended on height.

These pages are Copyrighted (c) by UCLA Academic Technology Services

