Regression with Stata
Chapter 6: More on interactions of categorical variables
Draft version

This is a draft version of this chapter.  Comments and suggestions to improve this draft are welcome.

Chapter outline
    6.1. Analysis with two categorical variables
    6.2. Simple effects
      6.2.1 Analyzing simple effects using xi3 and regress
      6.2.2 Coding of simple effects
    6.3. Simple comparisons
      6.3.1 Analyzing simple comparisons using xi3 and regress
      6.3.2 Coding of simple comparisons
    6.4. Partial interaction
      6.4.1 Analyzing partial interactions using xi3 and regress
      6.4.2 Coding of partial interactions
    6.5. Interaction contrasts
      6.5.1 Analyzing interaction contrasts using xi3 and regress
      6.5.2 Coding of interaction contrasts
    6.6. Computing adjusted means
      6.6.1 Computing adjusted means via anova
      6.6.2 Computing adjusted means via regress
      6.6.3 Computing adjusted means via postgr3
    6.7. More details on meaning of coefficients
    6.8. Simple effects via dummy coding versus effect coding
      6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat
      6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd

For this chapter we will use the elemapi2 data file that we have been using in prior chapters. We will focus on the variables mealcat and collcat as they relate to the outcome variable api00 (performance on the API in the year 2000). The variable mealcat is the variable meals broken up into three categories, and the variable collcat is the variable some_col broken into three categories. We can think of mealcat as the percentage of students receiving free meals, broken up into low, middle and high, and of collcat as the number of parents with some college education, also broken up into low, medium and high. We think that both mealcat and collcat may be related to api00, but it is also possible that the impact of mealcat depends on the level of collcat. In other words, there might be an interaction of these two categorical variables. In this chapter we will look at how these two categorical variables are related to API performance in the school, and at their interaction. We will see that there is an interaction, and will focus on different ways of further exploring it.

We will first use the elemapi2 data file.

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

We will modify the label for mealcat in order to more clearly see some of the points we will be demonstrating later in this chapter.

label define mealcat 1 "1" 2 "2" 3 "3", modify

6.1. Analysis with two categorical variables

One traditional way to analyze these data would be to perform a 3 by 3 factorial analysis of variance using the anova command, as shown below. The results show a main effect of collcat (F=4.5, p=0.0117), a main effect of mealcat (F=509.04, p=0.0000) and an interaction of collcat by mealcat (F=6.63, p=0.0000).

anova api00 collcat mealcat collcat*mealcat 

We can use the adjust command to show the adjusted means broken down by collcat and mealcat.

adjust, by(collcat mealcat)

We can show a graph of the adjusted means as shown below. We use the separate command to make three variables corresponding to the three levels of collcat (i.e., yhat1 corresponds to the predicted value when collcat is low). We can then show the graph with the three levels of collcat represented as three separate lines.

predict yhat
separate yhat, by(collcat)
graph twoway scatter yhat1 yhat2 yhat3 mealcat, connect(l l l) xlabel(1 2 3) sort

Now we drop the variables yhat yhat1 yhat2 yhat3 in case we wish to use these variable names later.

drop yhat yhat1 yhat2 yhat3

We can do these same analyses using the regress command. Below we use the regress command with xi3 to look at the effect of collcat, mealcat and the interaction of these two variables.

xi3: regress api00 g.collcat*g.mealcat

We use the test command to test the two terms associated with collcat to get the main effect of collcat.

test _Icollcat_2 _Icollcat_3

Likewise we use the test command to get the overall test of mealcat.

test _Imealcat_2 _Imealcat_3

Finally, we use the test command to test the interaction of collcat by mealcat.

test _Ico2Xme2 _Ico2Xme3 _Ico3Xme2 _Ico3Xme3

First, note that the results of the test commands correspond to those from the anova command above. This is because collcat and mealcat were coded using simple effect coding, a coding scheme where the contrasts sum to 0. We requested simple effect coding by specifying g.collcat and g.mealcat in the regress command with xi3 (see Chapter 5 for more information about the coding schemes available via the xi3 command). If we had used dummy coding instead, e.g., i.collcat, the results of the test commands for collcat and mealcat from the regress command would not have corresponded to the anova results. In addition to simple effect coding, we could have used e., h., r., a., b., or o. and the results of the test commands would still have matched the anova command, although the meaning of the individual tests would have been different. This point will be explored in more detail later in this chapter.
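
The idea that the contrasts "sum to 0" can be checked directly on the coding values. The sketch below is only an illustration: it assumes the usual simple-coding values for a three-level variable with group 1 as the reference, and the matrix names (simple, dummy, ones, sum_simple, sum_dummy) are hypothetical. The coding that xi3 actually created can be inspected with tablist, as shown in section 6.2.2.

```stata
* Rows are levels 1-3; columns are the two coded variables.
* Simple coding (assumed values, reference group 1)
matrix simple = (-1/3, -1/3 \ 2/3, -1/3 \ -1/3, 2/3)
* Dummy coding with group 1 omitted
matrix dummy  = (0, 0 \ 1, 0 \ 0, 1)
* Premultiplying by a row of ones gives the column sums
matrix ones = J(1, 3, 1)
matrix sum_simple = ones * simple
matrix sum_dummy  = ones * dummy
matrix list sum_simple
matrix list sum_dummy
```

Under these assumed values the columns of the simple coding sum to 0, while the columns of the dummy coding sum to 1.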

We can obtain the adjusted means by using the predict command to get the predicted values, calling them pred, and then looking at the mean of pred broken down by collcat and mealcat.

predict pred
table collcat mealcat, contents(mean pred)

We can show a graph of cell means as shown below. We use the same strategy as we did in making the graph above.

separate pred, by(collcat)
graph twoway scatter pred1 pred2 pred3 mealcat, c(l l l) xlabel(1 2 3) sort

Now we drop the variables pred pred1 pred2 pred3 in case we wish to use these variable names later.

drop pred pred1 pred2 pred3

Note that we could have produced the same graph and table of predicted values using the postgr3 command. You can download postgr3 from within Stata by typing findit postgr3 (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

postgr3 mealcat, by(collcat) table2 clpattern(solid dash dot)
Variables left asis: _Imealcat_2 _Imealcat_3 _Icollcat_2 _Icollcat_3 _IcolXmea_2_2 
  _IcolXmea_2_3 _IcolXmea_3_2 _IcolXmea_3_3
(option xb assumed; fitted values)
                          Means of Fitted values

           |  Percentage free meals in 3
           |          categories
   collcat |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 | 816.91431  589.34998  493.91891 | 596.34884
         2 | 825.65118  636.60468  508.83334 | 651.50002
         3 | 782.15094   655.6377  541.73334 |  692.1095
-----------+---------------------------------+----------
     Total | 805.71757  639.39395  504.37956 | 647.62251

The graph of the cell means illustrates the interaction between collcat and mealcat. The graph shows the three  levels of collcat as three different lines, and the three levels of mealcat as the three values on the x-axis of the graph. We can see that the effect of collcat differs based on the level of mealcat. For example, when mealcat is low, schools where collcat is 3 have the lowest api00 scores, as compared to schools that are medium or high on mealcat, where schools with collcat of 3 have the highest api00 scores.

Let's investigate this interaction further by looking at the simple effects of collcat at each level of mealcat.

6.2. Simple effects

We found that the main effect of collcat was significant, but because we have an interaction the effect of collcat depends on the level of mealcat. We might want to ask whether the effect of collcat is significant at each level of mealcat.

6.2.1 Analyzing simple effects using xi3 and regress

In order to look at the simple effects of collcat at the different levels of mealcat, we will use the @ symbol instead of * to indicate that we want the interaction terms to reflect the simple effects of collcat at each level of mealcat. We will use helmert coding for collcat, which will be discussed further later.

xi3: regress api00 h.collcat@g.mealcat
h.collcat         _Icollcat_1-3       (naturally coded; _Icollcat_3 omitted)
g.mealcat         _Imealcat_1-3       (naturally coded; _Imealcat_1 omitted)

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F(  8,   391) =  166.76
       Model |  6243714.81     8  780464.351           Prob > F      =  0.0000
    Residual |  1829957.19   391  4680.19741           R-squared     =  0.7733
-------------+------------------------------           Adj R-squared =  0.7687
       Total |     8073672   399  20234.7669           Root MSE      =  68.412

------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 _Imealcat_2 |  -181.0414   9.077126   -19.94   0.000    -198.8874   -163.1953
 _Imealcat_3 |  -293.4103   9.449459   -31.05   0.000    -311.9884   -274.8322
   _Ico1Wme1 |   13.01323     13.528     0.96   0.337    -13.58349    39.60995
   _Ico1Wme2 |  -56.77117   16.67866    -3.40   0.001    -89.56223    -23.9801
   _Ico1Wme3 |  -31.36441   12.86955    -2.44   0.015    -56.66658   -6.062246
   _Ico2Wme1 |   43.50022   14.04092     3.10   0.002     15.89507    71.10536
   _Ico2Wme2 |  -19.03303   13.29175    -1.43   0.153    -45.16528     7.09922
   _Ico2Wme3 |      -32.9   20.23653    -1.63   0.105    -72.68603    6.886029
       _cons |   650.0883   3.871885   167.90   0.000     642.4759    657.7006
------------------------------------------------------------------------------

We can obtain the simple effect of collcat when mealcat is low (i.e., 1) via the test command below. This shows that the effect of collcat when mealcat is low is significant.

test _Ico1Wme1 _Ico2Wme1

We use the describe command below to see the meaning of these terms and see that these two terms represent the two comparisons on collcat when mealcat is 1. For example, in the term _Ico2Wme1, the 2 means that this is the second comparison on collcat and the 1 means that it is when mealcat is 1.

describe _Ico1Wme1 _Ico2Wme1

We can test the simple effect of collcat when mealcat is 2 via the test command below. This shows that collcat is significant when mealcat is 2.

test  _Ico1Wme2 _Ico2Wme2

We can also test the simple effect of collcat when mealcat is 3 via the test command below. This shows that collcat is significant when mealcat is 3, if we use an alpha level of 0.05. Because we are performing a number of additional tests, you might want to consider a post hoc correction, such as a Bonferroni correction, to guard against Type I errors.

test  _Ico1Wme3 _Ico2Wme3

In summary, all three of the simple effects of collcat at each level of mealcat were significant. However, the effect of collcat when mealcat was 3 might not be significant if we used a post hoc criterion for evaluating its significance.
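
As a rough sketch of one such post hoc criterion, the loop below reruns the three simple effects tests and compares each p-value with a Bonferroni-adjusted alpha of 0.05/3. It assumes that the xi3: regress api00 h.collcat@g.mealcat model above is still the active estimation. The local macro names alpha and ntests are hypothetical, and dividing alpha by the number of tests is just one of several possible corrections.

```stata
* Assumes the h.collcat@g.mealcat model from section 6.2.1 is the active model.
local alpha  = 0.05
local ntests = 3
forvalues m = 1/3 {
    * Simple effect of collcat at mealcat = `m' (2 degrees of freedom)
    quietly test _Ico1Wme`m' _Ico2Wme`m'
    display "mealcat = `m': F = " %6.2f r(F) ", p = " %6.4f r(p) ///
        ", below alpha/ntests: " (r(p) < `alpha'/`ntests')
}
```

The last value displayed on each line is 1 when the test remains significant at the adjusted level and 0 otherwise.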

6.2.2 Coding of simple effects

While xi3 creates the coding for you, it is useful to see the coding it creates for these simple effects. mealcat was coded using simple coding, and its coding is just as we saw in Chapter 5. Below we use the tablist command to show the coding for mealcat. You can download tablist from within Stata by typing findit tablist (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

We see that the coding of mealcat is just as we would expect from chapter 5.

tablist mealcat  _Imealcat_2 _Imealcat_3, sort(v)

We requested helmert coding for collcat, and we can look at the coding of collcat to see that the terms _Icollcat_1 _Icollcat_2 are indeed coded using helmert coding. We should note that these terms are not used in the analysis, but are used by xi3 for creating the simple effects shown in the next section.

tablist collcat  _Icollcat_1 _Icollcat_2, sort(v)

Now that we have seen the helmert coding for collcat, we can see how this is used to create the simple effects of collcat at each level of mealcat. First, we look at the two comparisons of collcat at mealcat of 1. Note that the coding is the same as we saw above, but only when mealcat is 1, otherwise these variables are coded 0.

tablist  mealcat collcat _Ico1Wme1 _Ico2Wme1, sort(v)

Likewise, we look at the terms that form the effects of collcat when mealcat is 2, and we see that the variables are coded the same way when mealcat is 2, and otherwise 0.

tablist  mealcat collcat _Ico1Wme2 _Ico2Wme2, sort(v)

Finally, we see the same pattern for the terms that form the effect of collcat when mealcat is 3.

tablist  mealcat collcat _Ico1Wme3 _Ico2Wme3, sort(v)

This illustrates how xi3 codes the variables to allow the simple effects analysis. If you wished, you could manually create variables according to this strategy to perform a simple effects analysis.
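
A minimal sketch of that manual strategy is shown below. It assumes the elemapi2 data are in memory with no missing values on collcat or mealcat, and it assumes the Helmert values 2/3, -1/3, -1/3 and 0, 1/2, -1/2 (compare these with the tablist output above). All of the my_ variable names are hypothetical and are chosen so that they do not clash with the variables created by xi3.

```stata
* Helmert contrasts for collcat (assumed values; compare with tablist)
generate my_h1 = cond(collcat == 1, 2/3, -1/3)
generate my_h2 = cond(collcat == 1, 0, cond(collcat == 2, 0.5, -0.5))

* Each contrast keeps its Helmert value at one level of mealcat, 0 elsewhere
forvalues m = 1/3 {
    generate my_c1m`m' = my_h1 * (mealcat == `m')
    generate my_c2m`m' = my_h2 * (mealcat == `m')
}

* Simple coding for mealcat with group 1 as the reference
generate my_me2 = (mealcat == 2) - 1/3
generate my_me3 = (mealcat == 3) - 1/3

regress api00 my_me2 my_me3 my_c1m1 my_c1m2 my_c1m3 my_c2m1 my_c2m2 my_c2m3

* Simple effect of collcat when mealcat is 1
test my_c1m1 my_c2m1
```

If the assumed coding values match those used by xi3, the coefficients should agree with the xi3 output shown in section 6.2.1.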

6.3. Simple comparisons

In the analyses above we looked at the simple effect of collcat at each level of mealcat. For example, we looked at the overall effect of collcat when mealcat was 1. This is the simple effect of collcat at mealcat=1. Because collcat has more than two levels, we may wish to make further comparisons among the three levels of collcat within mealcat=1. Simple comparisons allow us to make such comparisons.

6.3.1 Analyzing simple comparisons using xi3 and regress

In the analyses above we used helmert coding for collcat. We chose this coding so we could compare group 1 with groups 2 and 3 and then compare groups 2 and 3. For example, if we wanted to compare collcat 1 versus 2 and 3 when mealcat is 1, we would look at the effect _Ico1Wme1, and if we wanted to compare collcat groups 2 and 3 when mealcat is 1, we would look at the effect _Ico2Wme1. Because xi3 creates labels for each term that it creates, we can use the describe command to verify that we are using the correct terms. Indeed, we see that these terms are as we expected.

describe _Ico1Wme1 _Ico2Wme1

We can use the regress command to see the effects for these terms.

regress

We see that collcat 1 is not significantly different from collcat 2 and 3 at mealcat 1 (t=0.96, p=0.337), but collcat 2 is significantly different from collcat 3 at mealcat 1 (t=3.10, p=0.002).

6.3.2 Coding of simple comparisons

The coding of simple comparisons is the same as the coding of simple effects. For example, we can see that _Icollcat_1 and _Icollcat_2 are coded using helmert coding.

tablist collcat _Icollcat_1 _Icollcat_2, sort(v)

The term _Ico1Wme1 then represents the comparison of collcat 1 versus collcat 2 and 3 when mealcat is 1. Hence, its coding is the same as the coding for _Icollcat_1 when mealcat is 1, and 0 otherwise, as shown below.

tablist  mealcat collcat _Ico1Wme1, sort(v)

6.4. Partial interaction

A partial interaction allows you to apply contrasts to one of the effects in an interaction term. For example, we can lay out the interaction of collcat by mealcat as shown below.

                  Collcat Low    Collcat Med    Collcat High
  Mealcat Low
  Mealcat Med
  Mealcat High

Say that we wanted to compare, in the context of this interaction, group 1 for collcat versus groups 2 and 3. The table of this partial interaction would look like this. The contrast coefficients of -2 1 1 applied to collcat indicate the comparison of group 1 for collcat versus groups 2 and 3.

                      -2              1              1
                  Collcat Low    Collcat Med    Collcat High
  Mealcat Low
  Mealcat Med
  Mealcat High

Likewise, we might also want to compare groups 2 and 3 of collcat by mealcat, and the table of this interaction would look like this.

                       0             -1              1
                  Collcat Low    Collcat Med    Collcat High
  Mealcat Low
  Mealcat Med
  Mealcat High

These are called partial interactions because contrast coefficients are applied to one of the terms involved in the interaction.

6.4.1 Analyzing partial interactions using xi3 and regress

As shown above, we wish to compare groups 1 versus 2 and 3 on collcat, and then compare groups 2 and 3 on collcat. This implies helmert coding on collcat, as shown below. The coding for mealcat is chosen as forward difference coding (for the purposes of later analyses) but could have been any form of effect coding.

xi3: regress api00 h.collcat*a.mealcat

Let's look at all of the terms created by the xi3 command using the describe command.

describe _I*

The partial interaction of collcat comparing groups 1 versus 2 and 3 by mealcat is composed of the interaction terms _Ico1Xme1 and _Ico1Xme2, because these are the terms from the interaction that compare groups 1 versus 2 and 3 on collcat. Below we use the test command to test this partial interaction. We find that this interaction is significant.

test _Ico1Xme1 _Ico1Xme2

Likewise to compare groups 2 and 3 on collcat by mealcat, we test the two terms of the interaction that involve the comparison of groups 2 and 3 on collcat. We find that this comparison is also significant.

test _Ico2Xme1 _Ico2Xme2

6.4.2 Coding of partial interactions

The terms _Ico1Xme1 and _Ico1Xme2 are just the product of their respective main effects. The coding for mealcat is really irrelevant, as long as some form of coding is used that sums to 0. Below you can see that _Ico1Xme1 is just _Icollcat_1 * _Imealcat_1.

tablist collcat mealcat _Icollcat_1 _Imealcat_1 _Ico1Xme1, sort(v)

And you can see that _Ico1Xme2 is just _Icollcat_1 * _Imealcat_2.

tablist collcat mealcat _Icollcat_1 _Imealcat_2 _Ico1Xme2, s(v)
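
The same check can be sketched with assert rather than by reading the tablist output. This assumes that xi3: regress api00 h.collcat*a.mealcat has just been run, so that the _I variables above exist. The names prod11 and prod12 are hypothetical scratch variables, and the tolerance of 1e-6 is an arbitrary allowance for rounding in the stored values.

```stata
* Assumes the h.collcat*a.mealcat model from section 6.4.1 has just been run.
generate double prod11 = _Icollcat_1 * _Imealcat_1
generate double prod12 = _Icollcat_1 * _Imealcat_2

* assert stops with an error if any observation violates the condition
assert abs(prod11 - _Ico1Xme1) < 1e-6
assert abs(prod12 - _Ico1Xme2) < 1e-6

drop prod11 prod12
```

If both assert commands run silently, the interaction terms are indeed the products of their main effects for every observation.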

6.5. Interaction contrasts

Above we saw that a partial interaction allows you to apply contrast coefficients to one of the terms in a two-way interaction. An interaction contrast allows you to apply contrast coefficients to both of the terms in a two-way interaction.

For example, with respect to collcat say that we wish to compare groups 2 and 3, and with respect to mealcat we wish to compare groups 1 and 2. The table of this looks like this below.

                           0             -1              1
                      Collcat Low    Collcat Med    Collcat High
  -1  Mealcat Low
   1  Mealcat Med
   0  Mealcat High

We also would like to form a second interaction contrast that also compares groups 2 and 3 with respect to collcat, and compares groups 2 and 3 on mealcat. A table of this comparison is shown below.

                           0             -1              1
                      Collcat Low    Collcat Med    Collcat High
   0  Mealcat Low
  -1  Mealcat Med
   1  Mealcat High

If we look at the graph of the predicted values (repeated below) we constructed before, it compares the dashed and dotted lines (collcat 2 versus 3) by mealcat 1 versus 2, and then again by mealcat 2 versus 3.

6.5.1 Analyzing interaction contrasts using xi3 and regress

Because we would like to compare groups 1 versus 2, and then groups 2 versus 3 on mealcat, this implies forward difference coding for mealcat (which will compare 1 versus 2, then 2 versus 3). For collcat we wish to compare groups 2 and 3, so we can use helmert coding for that comparison as we did above (since this will compare 1 versus 2 and 3, then 2 versus 3).

xi3: regress api00 h.collcat*a.mealcat

If we are not sure what term we want to use, we can use the describe command to show the labels for the interaction terms.

describe _Ico1Xme* _Ico2Xme* 

The first interaction comparison of interest is tested by _Ico2Xme1, and this term is significant. As we expect, the red and green lines are not parallel when we compare mealcat 1 and 2.

The second interaction comparison of interest is tested by _Ico2Xme2, and this term is not significant. Looking at the graph, we can see that the red and green lines are mostly parallel between mealcat 2 and 3.
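
As a rough check on what these two terms measure, the interaction contrasts can be computed by hand from the table of cell means in section 6.1. The sketch below assumes each contrast is taken as collcat 2 minus collcat 3 and as the earlier mealcat group minus the later one, which is what the coding table in section 6.5.2 suggests.

```stata
* Cell means are taken from the table of fitted values in section 6.1.
* (collcat 2 - collcat 3) at mealcat 1, minus the same difference at mealcat 2
display (825.651 - 782.151) - (636.605 - 655.638)

* (collcat 2 - collcat 3) at mealcat 2, minus the same difference at mealcat 3
display (636.605 - 655.638) - (508.833 - 541.733)
```

The two results are about 62.53 and 13.87. They are also the differences between successive simple comparisons from section 6.2.1, that is, 43.50 - (-19.03) and -19.03 - (-32.90). Under the assumed coding, they should match the coefficients for _Ico2Xme1 and _Ico2Xme2.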

6.5.2 Coding of interaction contrasts

The term _Ico2Xme1 is just the product of the respective main effects, as shown below.

tablist collcat mealcat  _Icollcat_2 _Imealcat_1 _Ico2Xme1 , sort(v)
    collcat   mealcat   _Icoll~2   _Imealca~1   _Ico2Xme1   Freq  
          1         1          0    .66666667           0     35  
          1         2          0   -.33333333           0     20  
          1         3          0   -.33333333           0     74  
          2         1         .5    .66666667    .3333333     43  
          2         2         .5   -.33333333   -.1666667     43  
          2         3         .5   -.33333333   -.1666667     48  
          3         1        -.5    .66666667   -.3333333     53  
          3         2        -.5   -.33333333    .1666667     69  
          3         3        -.5   -.33333333    .1666667     15  

6.6 Computing adjusted means

6.6.1 Computing adjusted means via anova

First, we show how you can compute adjusted means using the anova command. We use the same model that we have been using, including mealcat, collcat and the interaction of these two variables, and we add emer as a continuous covariate.

anova api00 collcat mealcat collcat*mealcat emer, contin(emer)

After performing the anova, we can use the adjust command to get adjusted means broken down by collcat and mealcat. These adjusted means are the means that would be expected if every school in the sample were at the mean of the variable emer. Note that it is possible to compute adjusted means with emer at values other than the mean. For example, if we had specified emer=50, the means would have been adjusted as though every school had a value of 50 on emer.

adjust emer , by(collcat mealcat)

6.6.2 Computing adjusted means via regress

Now we illustrate how to get the same adjusted means if you were to do the analysis via the regress command. First, we perform the regression analysis that is equivalent to the anova command above.

xi3: regress api00 g.collcat*g.mealcat emer

To create the adjusted means we wish to assume that all of the schools are at the average on the variable emer. We do this by assigning the average of emer to the variable emer, but first making a copy of emer as temer so we don't destroy the contents of this variable.

rename emer temer
egen emer = mean(temer)

Now we create yhat as the predicted value. Since the value of emer is set to the mean of emer, this will be the predicted value assuming that all schools are at the average for emer.

predict yhat

Now, we can look at the average of yhat broken down by collcat and mealcat, which you can see corresponds to the adjusted means that we found with the adjust command following the anova command above. 

table collcat mealcat, contents(mean yhat)

We then drop the variables emer and yhat since we no longer need them, and rename temer back to emer so the emer variable is back to the way it was before this process.

drop yhat emer
rename temer emer
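
A variation on these steps, offered as a sketch rather than as the method used in this chapter, is to wrap the change to emer in preserve and restore so that the original data come back automatically. It assumes that xi3: regress api00 g.collcat*g.mealcat emer has just been run and that no variable named yhat exists.

```stata
* Assumes the model with emer from this section is the active model.
preserve

* Set every school to the sample mean of emer
quietly summarize emer
replace emer = r(mean)

* Predicted values now hold emer constant at its mean
predict yhat
table collcat mealcat, contents(mean yhat)

* Bring back the original data, including the original emer
restore
```

Because restore reloads the data as they were at preserve, there is nothing to drop or rename afterwards.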

6.6.3 Computing adjusted means via postgr3

The postgr3 command can be used to simplify the process of computing adjusted means (i.e., predicted values when holding other variables constant). Let's assume that you have run the same regression as shown above.

. xi3: regress api00 g.collcat*g.mealcat emer 
<output omitted to save space>

You can then show the graph of adjusted means and the table of adjusted means using postgr3 as shown below. Below we show just the table of adjusted means, and you can see that they correspond to those computed above. We should stress that it is important to use the xi3 command (rather than xi) before using postgr3, because then postgr3 knows which variables should be held constant (in this example, emer) and which variables should not be held constant (in this example, _Imealcat_2 through _Ico3Xme3).

. postgr3 mealcat, by(collcat) connect(solid dash dot) table2
Variables left asis: _Imealcat_2 _Imealcat_3 _Icollcat_2 _Icollcat_3
> _Ico2Xme2 _Ico2Xme3 _Ico3Xme2 _Ico3Xme3
Holding emer constant at 12.6575

----------------------------------------------------------------------
          |           Percentage free meals in 3 categories           
  collcat |   0-46% free meals   47-80% free meals  81-100% free meals
----------+-----------------------------------------------------------
        1 |           797.5604            596.9728            509.8723
        2 |           812.5502             636.405            523.8846
        3 |           767.9352            652.9761            550.4616
----------------------------------------------------------------------

6.7 More details on meaning of coefficients

So far we have discussed a variety of techniques that you can use to help interpret interactions of categorical variables in regression, but we have not gone into great detail about the meaning of the coefficients in these analyses. Let's consider this further. Consider the analysis below using collcat and mealcat, using simple contrasts on both of these variables.

xi3: regress api00 g.collcat*g.mealcat

We can produce the adjusted means as shown below. These will be useful for interpreting the meaning of the coefficients.

predict yhat
table collcat mealcat, contents(mean yhat)

We drop the variable yhat since we no longer need it, so that we can use this variable name again later.

drop yhat 

Let's consider the meaning of the coefficient for _Icollcat_2. The coding for this variable compares group 2 versus group 1; hence, this coefficient corresponds to mean(collcat2) - mean(collcat1). Note that these are the unweighted means, so we compute the mean for collcat2 as the mean of the three cells corresponding to collcat2, i.e., (825.651+636.605+508.833)/3 . If we compare the result below to the coefficient for _Icollcat_2 we see that they are the same.

display (825.651+636.605+508.833)/3 - (816.914+589.35+493.919)/3

Likewise, the coefficient for _Icollcat_3 is mean(collcat3) - mean(collcat1), computed below. The value below corresponds to the coefficient for _Icollcat_3.

display (782.151+655.638+541.733)/3 - (816.914+589.35+493.919)/3

Likewise, the coefficient for _Imealcat_2 works out to be mean(mealcat2) - mean(mealcat1), see below.

display (589.35+636.605+655.638)/3 - (816.914+825.651+782.151)/3

And the coefficient for _Imealcat_3 is mean(mealcat3) - mean(mealcat1), see below.

display (493.919+508.833+541.733)/3 - (816.914+825.651+782.151)/3

To get the meaning of the coefficients for the interaction terms, we need to multiply the contrast coding of the main effects that created the interaction terms. For example, the term _Ico2Xme2 is the product of _Icollcat_2 and _Imealcat_2. We can form a 3 by 3 table showing the coding for _Icollcat_2 along the top and the coding for _Imealcat_2 on the left, and then multiply these terms together and place the products in the cells of the table, as shown below.

                          -1              1              0
                      Collcat Low    Collcat Med    Collcat High
  -1  Mealcat Low          1             -1              0
   1  Mealcat Med         -1              1              0
   0  Mealcat High         0              0              0

We can then multiply the values in the cells by the corresponding cell means and add them up to get the coefficient for _Ico2Xme2. In other words, this coefficient corresponds to the sum of the means of cells (1,1) and (2,2) minus the sum of the means of cells (1,2) and (2,1).

display ( 816.914 - 589.35 -  825.651 +  636.605 )

We can go through the same process to verify the meaning of the coefficients for the other three interaction terms. We verify that _Ico2Xme3 is 6.177.

display ( 816.914 - 493.919 -  825.651 + 508.833)

We also verify that _Ico3Xme2 is 101.051.

display ( 816.914 - 589.35 -  782.151 +  655.638 )

And we verify that _Ico3Xme3 is 82.577.

display ( 816.914 - 493.919 -  782.151 + 541.733 )
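
The four calculations above can also be sketched in matrix form, which makes the "multiply the contrasts, then apply them to the cell means" logic explicit. The matrix names (M, c2, c3, m2, m3 and the b matrices) are hypothetical. M holds the cell means with collcat in the rows and mealcat in the columns, and each contrast vector compares one group with group 1.

```stata
* Cell means: rows are collcat 1-3, columns are mealcat 1-3
matrix M = (816.914, 589.350, 493.919 \ ///
            825.651, 636.605, 508.833 \ ///
            782.151, 655.638, 541.733)

* Contrasts on collcat (row vectors) and on mealcat (column vectors)
matrix c2 = (-1, 1, 0)
matrix c3 = (-1, 0, 1)
matrix m2 = (-1 \ 1 \ 0)
matrix m3 = (-1 \ 0 \ 1)

* Each interaction coefficient is (collcat contrast) * M * (mealcat contrast)
matrix b22 = c2 * M * m2
matrix b23 = c2 * M * m3
matrix b32 = c3 * M * m2
matrix b33 = c3 * M * m3

display b22[1,1]
display b23[1,1]
display b32[1,1]
display b33[1,1]
```

The four values displayed are the same as those from the display commands above, for _Ico2Xme2, _Ico2Xme3, _Ico3Xme2 and _Ico3Xme3 respectively.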

6.8 Simple effects via dummy coding versus effect coding

You may wonder why we have gone to the effort of using xi3 for creating and testing these effects instead of just using dummy coding like we would get with the xi command. Let's compare how to get simple effects using the xi3 command via effect coding to how we would get simple effects using xi with dummy coding. We hope to show that it is much easier to use effect coding via xi3 and that the interpretation of the coefficients is much more intuitive.

6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat

Let's use an example from Chapter 3 (section 3.5). In that example we looked at an analysis using mealcat and yr_rnd and the interaction of these two variables. First, we look at how to do a simple effects analysis looking at the simple effects of yr_rnd at each level of mealcat using the xi3 command with effect coding. To make our results correspond to those from Chapter 3, we will make group 3 of mealcat the reference category.

char mealcat[omit] 3
xi3 : regress api00 g.yr_rnd@g.mealcat

Now we can obtain the simple effect of yr_rnd at mealcat=1 by inspecting the coefficient for _Iyr1Wme1, the simple effect of yr_rnd at mealcat=2 by inspecting the coefficient for _Iyr1Wme2 and the simple effect of yr_rnd at mealcat=3 by inspecting the coefficient for _Iyr1Wme3.

Now let's perform the same analysis using xi with dummy coding. Again, we will explicitly make the third group for mealcat to be the omitted category.

char mealcat[omit] 3
xi : regress api00 i.mealcat*yr_rnd

In order to form a test of simple main effects we need to make a table like the one shown below that relates the means of the cells to the coefficients in the regression. Please see Chapter 3, section 3.5 for information on how this table was constructed.

            mealcat=1           mealcat=2         mealcat=3
            -------------------------------------------------
  yr_rnd=0  _cons               _cons             _cons    
            +BImealcat1         +BImealcat2 
            -------------------------------------------------
  yr_rnd=1  _cons               _cons             _cons    
            +Byr_rnd            +Byr_rnd          +Byr_rnd
            +BImealcat1         +BImealcat2           
            +B_ImeaXyr_rn_1     +B_ImeaXyr_rn_2 

Let's start by looking at how to get the simple effect of yr_rnd when mealcat is 3. Looking at the table above, we can see that we would want to compare _cons with _cons + Byr_rnd. We can do this with the lincom command as shown below.

lincom _cons - (_cons + yr_rnd)

We see that _cons drops out, leaving just the coefficient for yr_rnd (with its sign reversed, since the yr_rnd=1 cell was subtracted from the yr_rnd=0 cell). Equivalently, we can use the test command to test whether the coefficient for yr_rnd is 0. Note that this result corresponds to the result we found with the xi3 command, which also tested the simple effect of yr_rnd when mealcat is 3.

test yr_rnd=0

Note that the coefficient for yr_rnd corresponds to the test of the effect of yr_rnd when all other variables are set to 0 (the reference category), in other words, when mealcat is set to the reference category. You may be tempted to interpret the coefficient for yr_rnd as the overall difference between year round schools and non-year round schools, but in this example we see that it really corresponds to the simple effect of yr_rnd. When using dummy coding people commonly misinterpret the lower order effects to refer to overall effects rather than simple effects.

Now let's look at the simple effect of yr_rnd when mealcat=1. Looking at the table above we see that this involves the comparison of the coefficients for yr_rnd=1 versus yr_rnd=0 when mealcat=1, i.e., comparing _cons + yr_rnd + _Imealcat_1 + _ImeaXyr_rn_1 versus _cons + _Imealcat_1. Removing the terms that drop out we can do the test command below.

test yr_rnd + _ImeaXyr_rn_1=0

We can likewise obtain the effect of yr_rnd when mealcat is 2, as shown below.

test yr_rnd + _ImeaXyr_rn_2=0

These examples illustrate that it is more complicated to form simple effects when using dummy coding, and also that the interpretation of lower order effects when using dummy coding may not have the meaning that you would expect.
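
If the size of each simple effect is wanted as well as its test, one option (a sketch, assuming the xi: regress api00 i.mealcat*yr_rnd model with mealcat 3 omitted is still the active model) is to pass the same linear combinations to lincom, which reports the estimate, standard error and confidence interval.

```stata
* Assumes xi: regress api00 i.mealcat*yr_rnd (mealcat 3 omitted) is the active model.
* Simple effect of yr_rnd when mealcat is 3 (the reference category)
lincom yr_rnd
* Simple effect of yr_rnd when mealcat is 1
lincom yr_rnd + _ImeaXyr_rn_1
* Simple effect of yr_rnd when mealcat is 2
lincom yr_rnd + _ImeaXyr_rn_2
```

Each estimate should be comparable to the corresponding _Iyr1Wme coefficient from the xi3 analysis, provided the two analyses contrast the yr_rnd groups in the same direction.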

6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd

Example 1 looked at simple effects for yr_rnd, a variable with only two levels. In this example, let's consider the simple effects of mealcat at each level of yr_rnd. Because mealcat has more than two levels, we can see what is required for doing tests of simple effects for variables with more than two levels.

First, let's show how to get these simple effects using the xi3 command using effect coding.

xi3 : regress api00 g.mealcat@g.yr_rnd

We can get the simple effect of mealcat at yr_rnd = 0 just as we did earlier in this chapter.

test _Ime1Wyr0 _Ime2Wyr0

And we likewise get the simple effect of mealcat at yr_rnd = 1 as shown below.

test _Ime1Wyr1 _Ime2Wyr1

We can now test the simple effects of mealcat at each level of yr_rnd via dummy coding.

xi : regress api00 i.mealcat*yr_rnd

The simple effect of mealcat when yr_rnd is 0 requires two test statements since it is a 2 degree of freedom test. We can do this by testing mean(mealcat1) = mean(mealcat2) and also testing mean(mealcat2) = mean(mealcat3). We can look at the table above and see that mean(mealcat1) = mean(mealcat2) is _Imealcat_1- _Imealcat_2 (after _cons drops out) and mean(mealcat2) = mean(mealcat3) is _Imealcat_2 after _cons drops out. So, we can perform this test using the two test commands below.

test  _Imealcat_1- _Imealcat_2=0
test  _Imealcat_2, accum

Note that the effects _Imealcat_1 and _Imealcat_2 do not correspond to overall effects of the variable mealcat but are the simple effects when yr_rnd is set to 0, the reference level. Again we see that the terms that we might be tempted to call main effects and think of as overall effects really are simple effects when dummy coding is used.

The second test command uses the accum option to accumulate the tests to get the 2 degree of freedom test that corresponds to the simple effect of mealcat when yr_rnd is 0.

Likewise, we can look at the table above to form the comparisons needed to obtain the simple effects of mealcat when yr_rnd is 1.

test _Imealcat_1+ _ImeaXyr_rn_1- _Imealcat_2- _ImeaXyr_rn_2=0
test  _Imealcat_2+ _ImeaXyr_rn_2=0, accum
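
As an alternative to accum, the test command accepts several hypotheses in parentheses and tests them jointly. The sketch below restates the two simple effects of mealcat in that form, assuming the xi: regress api00 i.mealcat*yr_rnd model with mealcat 3 omitted is the active model.

```stata
* Assumes xi: regress api00 i.mealcat*yr_rnd (mealcat 3 omitted) is the active model.
* Simple effect of mealcat when yr_rnd is 0 (2 degrees of freedom)
test (_Imealcat_1 - _Imealcat_2 = 0) (_Imealcat_2 = 0)

* Simple effect of mealcat when yr_rnd is 1 (2 degrees of freedom)
test (_Imealcat_1 + _ImeaXyr_rn_1 - _Imealcat_2 - _ImeaXyr_rn_2 = 0) ///
     (_Imealcat_2 + _ImeaXyr_rn_2 = 0)
```

Writing both constraints in one command keeps each 2 degree of freedom test together, which may make it easier to check the terms against the table of cell means.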

With this example we hoped to illustrate that testing simple effects for a variable with more than two levels can be quite tricky under dummy coding, because it requires constructing multiple test commands, one for every degree of freedom in the simple effect. As you can see, constructing these terms is error prone. Without a method for double checking results, it is quite possible to make a mistake when constructing terms and form the wrong comparison. By comparison, with effect coding via xi3, forming comparisons can be much easier and the interpretation of the lower order effects is much more intuitive. The lower order effects do correspond to the overall effects of the variable. For example, when using effect coding, the effect of yr_rnd corresponds to the difference between the overall unweighted mean for the year round schools and that for the non-year round schools.


These pages are Copyrighted (c) by UCLA Academic Technology Services

