
Chapter 6 - More on Interactions of Categorical Predictors

6.1. Analysis with two categorical variables

6.2. Simple effects

6.2.1 Analyzing simple effects using PROC GLM

6.2.2 Analyzing Simple Effects Using PROC REG

6.3. Simple comparisons

6.3.1 Analyzing simple comparisons using PROC REG

6.3.2 Analyzing simple comparisons using PROC GLM

6.4. Partial Interaction

6.4.1 Analyzing partial interactions using PROC GLM

6.4.2 Analyzing partial interactions using PROC REG

6.5. Interaction contrasts

6.5.1 Analyzing interaction contrasts using PROC GLM

6.5.2 Analyzing interaction contrasts using PROC REG

6.6. Computing adjusted means

6.6.1 Computing adjusted means via PROC GLM

6.6.2 Computing adjusted means via PROC REG

6.7. More details on meaning of coefficients

6.8. Simple effects via dummy coding versus effect coding

6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat

6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd

**6.0 Introduction**

This chapter uses the **elemapi2** data file that you have seen in prior chapters. We assume that you have put the data files in the "c:\sasreg\" directory.

data elemapi2; set 'c:\sasreg\elemapi2'; run;

We will focus on the variables **mealcat** and **collcat** as they relate to the outcome variable **api00** (performance on the API in the year 2000). The variable **mealcat** is the variable **meals** broken up into three categories, and the variable **collcat** is the variable **some_col** broken into three categories. We can think of **mealcat** as the percentage of students receiving free meals, grouped into **low**, **middle** and **high**, and of **collcat** as the percentage of parents with some college education, grouped into **low**, **medium** and **high**. We think that both **mealcat** and **collcat** may be related to **api00**, but it is also possible that the impact of **mealcat** depends on the level of **collcat**. In other words, there might be an interaction of these two categorical variables. In this chapter we will look at how these two variables are related to API performance in the school. We will see that they do interact, and we will focus on different ways of further exploring the interaction. Let's have a quick look at these variables.

    proc tabulate data=elemapi2;
      class collcat mealcat;
      var api00;
      table mealcat='mealcat',
            mean=' '*api00='API Index for 2000'*collcat='collcat'*F=10.2 / RTS=13.;
    run;

    ----------------------------------------------
    |           |       API Index for 2000       |
    |           |--------------------------------|
    |           |            collcat             |
    |           |--------------------------------|
    |           |    1     |    2     |    3     |
    |-----------+----------+----------+----------|
    |mealcat    |          |          |          |
    |-----------|          |          |          |
    |1          |    816.91|    825.65|    782.15|
    |-----------+----------+----------+----------|
    |2          |    589.35|    636.60|    655.64|
    |-----------+----------+----------+----------|
    |3          |    493.92|    508.83|    541.73|
    ----------------------------------------------
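Since later sections distinguish unweighted means from weighted ones, it can also help to look at how many schools fall in each cell. The following is a minimal sketch, assuming **elemapi2** has been read in as above.

```sas
/* Cell counts for the 3 x 3 design; assumes elemapi2 was read in as above */
proc freq data = elemapi2;
  tables collcat*mealcat / norow nocol nopercent;
run;
```

The N Obs column of the **proc means** output later in this chapter shows cell sizes ranging from 15 to 74, so the design is unbalanced.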

**6.1. Analysis with two categorical variables**

One traditional way to analyze this would be to perform a 3 by 3 factorial analysis of variance using **proc glm**, as shown below. The results show a main effect of **collcat** (F=4.50, p=0.0117), a main effect of **mealcat** (F=509.04, p<.0001) and an interaction of **collcat** by **mealcat** (F=6.63, p<.0001). We also use the **lsmeans** and **output** statements to output the predicted means for each group so that we can graph the cell means.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat | mealcat / ss3;
      lsmeans collcat*mealcat;
      output out = pred p = pred;
    run;
    quit;

    The GLM Procedure

    Class Level Information

    Class      Levels    Values
    collcat         3    1 2 3
    mealcat         3    1 2 3

    Number of observations    400

    Dependent Variable: api00   api 2000

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               8    6243714.810     780464.351    166.76   <.0001
    Error             391    1829957.187       4680.197
    Corrected Total   399    8073671.998

    R-Square    Coeff Var    Root MSE    api00 Mean
    0.773343     10.56356    68.41197      647.6225

    Source             DF    Type III SS    Mean Square   F Value   Pr > F
    collcat             2      42140.566      21070.283      4.50   0.0117
    mealcat             2    4764843.563    2382421.781    509.04   <.0001
    collcat*mealcat     4     124167.809      31041.952      6.63   <.0001

    Least Squares Means

    collcat    mealcat    api00 LSMEAN
    1          1            816.914286
    1          2            589.350000
    1          3            493.918919
    2          1            825.651163
    2          2            636.604651
    2          3            508.833333
    3          1            782.150943
    3          2            655.637681
    3          3            541.733333

We can now create the graph of cell means of **api00** using the dataset **pred**.

    proc sort data = pred;
      by mealcat;
    run;

    symbol1 v=circle i=join ci=blue h=2;
    symbol2 v=triangle i=join ci=red h=2;
    symbol3 v=square i=join ci=black h=2;

    proc gplot data = pred;
      plot pred*mealcat=collcat;
    run;
    quit;

We can do the same analysis using the regression approach via **proc reg**. We use simple coding for both **collcat** and **mealcat**, and we create interaction terms as their products. The first **test** statement tests the main effect of **collcat**, the second tests the main effect of **mealcat**, and the last tests the overall interaction.

    data reg1;
      set elemapi2;
      s2 = -1/3; s3 = -1/3;
      if collcat = 2 then s2 = 2/3;
      if collcat = 3 then s3 = 2/3;
      m2 = -1/3; m3 = -1/3;
      if mealcat = 2 then m2 = 2/3;
      if mealcat = 3 then m3 = 2/3;
      sm22 = s2*m2; sm23 = s2*m3;
      sm32 = s3*m2; sm33 = s3*m3;
    run;

    proc reg data = reg1;
      model api00 = s2 s3 m2 m3 sm22 sm23 sm32 sm33;
      Collcat: test s2=s3=0;
      Mealcat: test m2=m3=0;
      Interaction: test sm22=sm23=sm32=sm33=0;
      output out = pred2 p = pred;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: api00 api 2000

    Analysis of Variance

                                Sum of          Mean
    Source             DF      Squares        Square   F Value   Pr > F
    Model               8      6243715        780464    166.76   <.0001
    Error             391      1829957    4680.19741
    Corrected Total   399      8073672

    Root MSE           68.41197    R-Square    0.7733
    Dependent Mean    647.62250    Adj R-Sq    0.7687
    Coeff Var          10.56356

    Parameter Estimates

                                 Parameter     Standard
    Variable    Label      DF     Estimate        Error   t Value   Pr > |t|
    Intercept   Intercept   1    650.08826      3.87189    167.90     <.0001
    s2                      1     23.63531      9.10533      2.60     0.0098
    s3                      1     26.44625      9.99513      2.65     0.0085
    m2                      1   -181.04135      9.07713    -19.94     <.0001
    m3                      1   -293.41027      9.44946    -31.05     <.0001
    sm22                    1     38.51777     24.19532      1.59     0.1122
    sm23                    1      6.17754     20.08262      0.31     0.7585
    sm32                    1    101.05102     22.88808      4.42     <.0001
    sm33                    1     82.57776     24.43941      3.38     0.0008

    Test Collcat Results for Dependent Variable API00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2        21070      4.50   0.0117
    Denominator    391   4680.19741

    Test Mealcat Results for Dependent Variable API00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2      2382422    509.04   <.0001
    Denominator    391   4680.19741

    Test Interaction Results for Dependent Variable API00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        4        31042      6.63   <.0001
    Denominator    391   4680.19741

First, note that the results of the **test** statements correspond to the results from **proc glm**. This is because **collcat** and **mealcat** were coded using simple coding, a coding scheme where the contrasts sum to 0. If dummy coding had been used, the results of the **test** statements for **mealcat** and **collcat** from **proc reg** would not have corresponded to the **proc glm** results. Instead of simple coding, we could have used deviation or Helmert coding and the results of the **test** statements would still have matched those from **proc glm**, although the meaning of the individual tests would have been different. This point will be explored in more detail later in this chapter.
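To make the claim about alternative coding schemes concrete, here is a sketch using deviation coding instead of simple coding. The data set name **reg1dev** and the variable names **d2**, **d3**, **e2**, **e3** and their products are hypothetical, and the choice of level 1 as the omitted level is an assumption; any level could be omitted.

```sas
data reg1dev;                       /* hypothetical data set name */
  set elemapi2;
  /* Deviation coding with level 1 as the omitted level (an assumption).
     A logical expression evaluates to 1 or 0, so level 1 is coded -1.
     Assumes collcat and mealcat have no missing values. */
  d2 = (collcat = 2) - (collcat = 1);
  d3 = (collcat = 3) - (collcat = 1);
  e2 = (mealcat = 2) - (mealcat = 1);
  e3 = (mealcat = 3) - (mealcat = 1);
  de22 = d2*e2; de23 = d2*e3;
  de32 = d3*e2; de33 = d3*e3;
run;

proc reg data = reg1dev;
  model api00 = d2 d3 e2 e3 de22 de23 de32 de33;
  Collcat:     test d2 = 0, d3 = 0;
  Mealcat:     test e2 = 0, e3 = 0;
  Interaction: test de22 = 0, de23 = 0, de32 = 0, de33 = 0;
run;
quit;
```

If the point above holds, the three tests should reproduce the **proc glm** F values (4.50, 509.04 and 6.63), while the individual coefficients take on a different meaning than under simple coding.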

The graph of the cell means we obtained before illustrates the interaction between **collcat** and **mealcat**. The graph shows the 3 levels of **collcat** as 3 different lines, and the 3 levels of **mealcat** as the 3 values on the x axis. We can see that the effect of **collcat** differs based on the level of **mealcat**. For example, when **mealcat** is low, schools where **collcat** is 3 have the lowest **api00** scores, whereas among schools that are medium or high on **mealcat**, those with **collcat** of 3 have the highest **api00** scores.

Let's investigate this interaction further by looking at the simple effects
of **collcat** at each level of **mealcat**.

**6.2. Simple effects**

**6.2.1 Analyzing simple effects using PROC GLM**

This analysis looks at the simple effects of **collcat** at the different levels of **mealcat** using **proc glm**. The **lsmeans** statement with the option **slice = mealcat** tests the effect of **collcat** at each level of **mealcat**.

    proc glm data= elemapi2;
      class collcat mealcat;
      model api00 = mealcat|collcat;
      lsmeans mealcat*collcat / slice = mealcat;
    run;
    quit;

    The GLM Procedure

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               8    6243714.810     780464.351    166.76   <.0001
    Error             391    1829957.187       4680.197
    Corrected Total   399    8073671.998

    R-Square    Coeff Var    Root MSE    API00 Mean
    0.773343     10.56356    68.41197      647.6225

    Source             DF    Type III SS    Mean Square   F Value   Pr > F
    MEALCAT             2    4764843.563    2382421.781    509.04   <.0001
    COLLCAT             2      42140.566      21070.283      4.50   0.0117
    COLLCAT*MEALCAT     4     124167.809      31041.952      6.63   <.0001

    COLLCAT    MEALCAT    API00 LSMEAN
    1          1            816.914286
    1          2            589.350000
    1          3            493.918919
    2          1            825.651163
    2          2            636.604651
    2          3            508.833333
    3          1            782.150943
    3          2            655.637681
    3          3            541.733333

    COLLCAT*MEALCAT Effect Sliced by MEALCAT for API00

                          Sum of
    MEALCAT    DF        Squares    Mean Square   F Value   Pr > F
    1           2          50909          25455      5.44   0.0047
    2           2          68629          34314      7.33   0.0007
    3           2          29979          14990      3.20   0.0417

**6.2.2 Analyzing Simple Effects Using PROC REG**

In the previous section we tested the simple effect of **collcat** at each level of **mealcat** using **PROC GLM**, that is, through the ANOVA approach. We can obtain the same analysis through the regression approach, since ANOVA is a special case of regression. In the regression approach, we create coded variables for **collcat**, **mealcat** and their interaction, and the coding scheme depends on the effect we want to test. In this section we run an analysis parallel to the previous one: the simple effect of **collcat** at each level of **mealcat**. We use simple coding for **mealcat**, though here the type of coding for **mealcat** does not really matter. The scheme for simple coding is shown in Chapter 5. The reference group for **mealcat** is group 1. We use **helmert** coding for **collcat**, stored in **ccat1** and **ccat2**. Note that **ccat1** and **ccat2** are not themselves entered in the model; they are used to build the terms for the simple effects of **collcat** at each level of **mealcat**.

    data reg2;
      set elemapi2;
      mcat1 = 1/3; mcat2 = 1/3;
      if mealcat = 3 then mcat1 = -2/3;
      if mealcat = 2 then mcat2 = -2/3;
      ccat1 = -1/3;
      if collcat = 1 then do; ccat1 = 2/3; ccat2 = 0; end;
      if collcat = 2 then ccat2 = .5;
      if collcat = 3 then ccat2 = -.5;
      c1m1 = 0; c2m1 = 0; c1m2 = 0; c2m2 = 0; c1m3 = 0; c2m3 = 0;
      if (mealcat = 1) then do; c1m1 = ccat1; c2m1 = ccat2; end;
      if (mealcat = 2) then do; c1m2 = ccat1; c2m2 = ccat2; end;
      if (mealcat = 3) then do; c1m3 = ccat1; c2m3 = ccat2; end;
    run;

Now that we have seen the **helmert** coding for **collcat**, we can see how it is used to create the simple effects of **collcat** at each level of **mealcat**. First, consider the two comparisons of **collcat** when **mealcat** is 1: **c1m1** and **c2m1** carry the helmert coding when **mealcat** is 1 and are 0 otherwise. Likewise, **c1m2** and **c2m2** carry the same coding when **mealcat** is 2 and are 0 otherwise, and **c1m3** and **c2m3** do so when **mealcat** is 3. The following table shows the coding used for all of these terms.

| collcat | mealcat | c1m1 | c2m1 | c1m2 | c2m2 | c1m3 | c2m3 |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 2/3 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | -1/3 | 1/2 | 0 | 0 | 0 | 0 |
| 3 | 1 | -1/3 | -1/2 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 2/3 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | -1/3 | 1/2 | 0 | 0 |
| 3 | 2 | 0 | 0 | -1/3 | -1/2 | 0 | 0 |
| 1 | 3 | 0 | 0 | 0 | 0 | 2/3 | 0 |
| 2 | 3 | 0 | 0 | 0 | 0 | -1/3 | 1/2 |
| 3 | 3 | 0 | 0 | 0 | 0 | -1/3 | -1/2 |

Now we are ready for our regression analysis. The **test** statements used below test the simple effect of **collcat** at each level of **mealcat**.

    proc reg data = reg2;
      model api00 = mcat1 mcat2 c1m1 c2m1 c1m2 c2m2 c1m3 c2m3;
      mealcat1: test c1m1 = c2m1 = 0;
      mealcat2: test c1m2 = c2m2 = 0;
      mealcat3: test c1m3 = c2m3 = 0;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: API00 api 2000

    Analysis of Variance

                                Sum of          Mean
    Source             DF      Squares        Square   F Value   Pr > F
    Model               8      6243715        780464    166.76   <.0001
    Error             391      1829957    4680.19741
    Corrected Total   399      8073672

    Root MSE           68.41197    R-Square    0.7733
    Dependent Mean    647.62250    Adj R-Sq    0.7687
    Coeff Var          10.56356

    Parameter Estimates

                                 Parameter     Standard
    Variable    Label      DF     Estimate        Error   t Value   Pr > |t|
    Intercept   Intercept   1    650.08826      3.87189    167.90     <.0001
    MCAT1                   1    293.41027      9.44946     31.05     <.0001
    MCAT2                   1    181.04135      9.07713     19.94     <.0001
    C1M1                    1     13.01323     13.52800      0.96     0.3367
    C2M1                    1     43.50022     14.04092      3.10     0.0021
    C1M2                    1    -56.77117     16.67866     -3.40     0.0007
    C2M2                    1    -19.03303     13.29175     -1.43     0.1530
    C1M3                    1    -31.36441     12.86955     -2.44     0.0153
    C2M3                    1    -32.90000     20.23653     -1.63     0.1048

    Test mealcat1 Results for Dependent Variable API00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2        25455      5.44   0.0047
    Denominator    391   4680.19741

    Test mealcat2 Results for Dependent Variable API00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2        34314      7.33   0.0007
    Denominator    391   4680.19741

    Test mealcat3 Results for Dependent Variable API00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2        14990      3.20   0.0417
    Denominator    391   4680.19741

**6.3 Simple Comparisons**

In the analyses above we looked at the simple effect of **collcat** at each level of **mealcat**. For example, we looked at the overall effect of **collcat** when **mealcat** was 1. This is the simple effect of **collcat** at **mealcat**=1. Because **collcat** has more than 2 levels, we may wish to make further comparisons among the 3 levels of **collcat** within **mealcat**=1. Simple comparisons allow us to make such comparisons.

**6.3.1 Analyzing Simple Comparisons Using PROC REG**

In the previous regression analysis, we used helmert coding for **collcat**. We chose this coding scheme so that, within each level of **mealcat**, we could compare group 1 with groups 2 and 3 and then compare group 2 with group 3. For example, to compare **collcat** 1 vs. 2 and 3 when **mealcat** is 1, we look at the effect **c1m1**, and to compare **collcat** groups 2 and 3 when **mealcat** is 1, we look at the effect **c2m1**. The coefficient for **c1m1** is not significant (t = 0.96, p = 0.3367). That is, when **mealcat** = 1, the difference between group 1 of **collcat** and the average of groups 2 and 3 is not significant. The coefficient for **c2m1**, in contrast, is significant (t = 3.10, p = 0.0021), so groups 2 and 3 of **collcat** differ when **mealcat** = 1.

**6.3.2 Analyzing Simple Comparisons Using PROC GLM**

We can also obtain simple comparisons using **PROC GLM**. For example, to compare group 1 vs. groups 2 and 3 of **collcat** within **mealcat** = 1, we can do the following. The **estimate** statement below specifies that the comparison on **collcat** is between group 1 and the average of groups 2 and 3, and that the comparison is restricted to **mealcat** = 1.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat / ss3;
      estimate 'collcat 1 vs 2+ within mealcat = 1'
               collcat 1 -.5 -.5
               collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0;
    run;
    quit;

    The GLM Procedure
    Dependent Variable: API00 api 2000

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               8    6243714.810     780464.351    166.76   <.0001
    Error             391    1829957.187       4680.197
    Corrected Total   399    8073671.998

    R-Square    Coeff Var    Root MSE    API00 Mean
    0.773343     10.56356    68.41197      647.6225

    Source             DF    Type III SS    Mean Square   F Value   Pr > F
    COLLCAT             2      42140.566      21070.283      4.50   0.0117
    MEALCAT             2    4764843.563    2382421.781    509.04   <.0001
    COLLCAT*MEALCAT     4     124167.809      31041.952      6.63   <.0001

                                                          Standard
    Parameter                                Estimate        Error   t Value   Pr > |t|
    collcat 1 vs 2+ within mealcat = 1     13.0132326   13.5279998      0.96     0.3367
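The second helmert comparison, group 2 vs. group 3 of **collcat** within **mealcat** = 1, can be sketched the same way. The label text is our own; the coefficient layout follows the same ordering as the **estimate** statement above.

```sas
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
  /* Interaction coefficients are ordered (collcat, mealcat) =
     (1,1) (1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3) */
  estimate 'collcat 2 vs 3 within mealcat = 1'
           collcat 0 1 -1
           collcat*mealcat 0 0 0  1 0 0  -1 0 0;
run;
quit;
```

This contrast is the difference between the cell means for **collcat** 2 and 3 at **mealcat** = 1 (825.651 - 782.151), so the estimate should agree with the coefficient for **c2m1** (43.50) from the regression in section 6.2.2.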

**6.4 Partial Interaction**

A partial interaction allows you to apply contrasts to one of the effects in an interaction term. For example, we can lay out the interaction of **collcat** by **mealcat** as in the table below.

| | Collcat Low | Collcat Med | Collcat High |
|---|---|---|---|
| Mealcat Low | | | |
| Mealcat Med | | | |
| Mealcat High | | | |

Say that we wanted to compare, in the context of this interaction, group 1 for **collcat** vs. groups 2 and 3. The table of this partial interaction would look like this. The contrast coefficients of -2 1 1 applied to **collcat** indicate the comparison of group 1 for **collcat** vs. groups 2 and 3.

| | -2 | 1 | 1 |
|---|---|---|---|
| | Collcat Low | Collcat Med | Collcat High |
| Mealcat Low | | | |
| Mealcat Med | | | |
| Mealcat High | | | |

Likewise, we also might want to compare groups 2 and 3 of **collcat** by **mealcat**, and the table of this interaction would look like this.

| | 0 | -1 | 1 |
|---|---|---|---|
| | Collcat Low | Collcat Med | Collcat High |
| Mealcat Low | | | |
| Mealcat Med | | | |
| Mealcat High | | | |

These are called partial interactions because contrast coefficients are applied to one of the terms involved in the interaction.

**6.4.1 Analyzing partial interactions using PROC GLM**

We wish to compare group 1 versus groups 2 and 3 on **collcat**, and also to compare groups 2 and 3 on **collcat**, each in interaction with **mealcat**. For example, to test the partial interaction of **collcat** (group 1 vs. groups 2 and 3) by **mealcat**, we can use the first **contrast** statement below. Because **mealcat** has 2 degrees of freedom, the test of the partial interaction also has 2 degrees of freedom, which can be broken down into 2 comparisons. Within a **contrast** statement these two interaction contrasts are separated by a comma, which tells SAS to join them into a single test with 2 degrees of freedom.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat;
      contrast 'test of sm11 and sm12'
               collcat*mealcat 1 -1 0 -.5 .5 0 -.5 .5 0,
               collcat*mealcat 0 1 -1 0 -.5 .5 0 -.5 .5;
      contrast 'test of sm21 and sm22'
               collcat*mealcat 0 0 0 1 -1 0 -1 1 0,
               collcat*mealcat 0 0 0 0 1 -1 0 -1 1;
    run;
    quit;

    The GLM Procedure
    <output omitted>

    Contrast                  DF    Contrast SS    Mean Square   F Value   Pr > F
    test of sm11 and sm12      2    54141.40962    27070.70481      5.78   0.0033
    test of sm21 and sm22      2    66511.60133    33255.80067      7.11   0.0009

**6.4.2 Analyzing partial interactions using PROC REG**

With regression analysis, we can also compare group 1 vs. groups 2 and 3 on **collcat**, or compare groups 2 and 3 on **collcat**. This implies Helmert coding on **collcat**, as we did before.

    data reg3;
      set elemapi2;
      if mealcat = 1 then m1 = 2/3;
      if mealcat = 2 then m1 = -1/3;
      if mealcat = 3 then m1 = -1/3;
      if mealcat = 1 then m2 = 1/3;
      if mealcat = 2 then m2 = 1/3;
      if mealcat = 3 then m2 = -2/3;
      if collcat = 1 then s1 = 2/3;
      if collcat = 2 then s1 = -1/3;
      if collcat = 3 then s1 = -1/3;
      if collcat = 1 then s2 = 0;
      if collcat = 2 then s2 = 1/2;
      if collcat = 3 then s2 = -1/2;
      sm11 = s1*m1; sm12 = s1*m2;
      sm21 = s2*m1; sm22 = s2*m2;
    run;

    proc reg data = reg3;
      model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22;
      test sm11 = sm12 = 0;
      test sm21 = sm22 = 0;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: api00 api 2000

    Analysis of Variance

                                Sum of          Mean
    Source             DF      Squares        Square   F Value   Pr > F
    Model               8      6243715        780464    166.76   <.0001
    Error             391      1829957    4680.19741
    Corrected Total   399      8073672

    Root MSE           68.41197    R-Square    0.7733
    Dependent Mean    647.62250    Adj R-Sq    0.7687
    Coeff Var          10.56356

    Parameter Estimates

                                 Parameter     Standard
    Variable    Label      DF     Estimate        Error   t Value   Pr > |t|
    Intercept   Intercept   1    650.08826      3.87189    167.90     <.0001
    s1                      1    -25.04078      8.34539     -3.00     0.0029
    s2                      1     -2.81094      9.32938     -0.30     0.7633
    m1                      1    181.04135      9.07713     19.94     <.0001
    m2                      1    112.36892      9.90759     11.34     <.0001
    sm11                    1     69.78440     21.47520      3.25     0.0013
    sm12                    1    -25.40675     21.06663     -1.21     0.2285
    sm21                    1     62.53325     19.33438      3.23     0.0013
    sm22                    1     13.86697     24.21132      0.57     0.5671

    Test 1 Results for Dependent Variable api00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2        27071      5.78   0.0033
    Denominator    391   4680.19741

    Test 2 Results for Dependent Variable api00

                               Mean
    Source          DF       Square   F Value   Pr > F
    Numerator        2        33256      7.11   0.0009
    Denominator    391   4680.19741
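As a check on what **sm11** and **sm12** measure, the sketch below recomputes them from the cell means reported by **lsmeans** in section 6.1. The array names are our own, and the cell means are typed in by hand rather than read from a data set.

```sas
data _null_;
  /* Cell means from the lsmeans output; cell{i,j}: collcat = i, mealcat = j */
  array cell{3,3} _temporary_
    (816.914286 589.350000 493.918919
     825.651163 636.604651 508.833333
     782.150943 655.637681 541.733333);
  array h{3} _temporary_;   /* collcat 1 vs. average of 2 and 3, per mealcat */
  do j = 1 to 3;
    h{j} = cell{1,j} - (cell{2,j} + cell{3,j})/2;
  end;
  sm11 = h{1} - h{2};   /* does that comparison differ for mealcat 1 vs. 2? */
  sm12 = h{2} - h{3};   /* does that comparison differ for mealcat 2 vs. 3? */
  put sm11= 10.4 sm12= 10.4;
run;
```

The two values written to the log should agree, up to rounding, with the coefficients for **sm11** (69.78440) and **sm12** (-25.40675) in the parameter estimates above.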

**6.5. Interaction Contrasts**

Above we saw that a partial interaction allows you to apply contrast coefficients to one of the terms in a 2 way interaction. An interaction contrast allows you to apply contrast coefficients to both of the terms in a two way interaction.

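For example, with respect to **collcat**, let's say that we wish to compare groups 2 and 3, and with respect to **mealcat** we wish to compare groups 1 and 2. A table of this comparison is shown below.

| | | 0 | -1 | 1 |
|---|---|---|---|---|
| | | Collcat Low | Collcat Med | Collcat High |
| -1 | Mealcat Low | | | |
| 1 | Mealcat Med | | | |
| 0 | Mealcat High | | | |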

We also would like to form a second interaction contrast that also compares groups 2 and 3 with respect to **collcat**, and compares groups 2 and 3 on **mealcat**. A table of this comparison is shown below.

| | | 0 | -1 | 1 |
|---|---|---|---|---|
| | | Collcat Low | Collcat Med | Collcat High |
| 0 | Mealcat Low | | | |
| -1 | Mealcat Med | | | |
| 1 | Mealcat High | | | |

If we look at the graph of the predicted values we constructed before, the first contrast compares lines 2 and 3 (**collcat** 2 vs. 3) across **mealcat** 1 vs. 2, and the second compares them again across **mealcat** 2 vs. 3.

**6.5.1 Analyzing Interaction Contrasts Using PROC GLM**

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat;
      contrast 'collcat 2v3 with mealcat 1v2'
               collcat*mealcat 0 0 0 1 -1 0 -1 1 0;
      contrast 'collcat 2v3 with mealcat 2v3'
               collcat*mealcat 0 0 0 0 1 -1 0 -1 1;
    run;
    quit;

    The GLM Procedure
    <output omitted>

    Contrast                        DF    Contrast SS    Mean Square   F Value   Pr > F
    collcat 2v3 with mealcat 1v2     1    48958.23687    48958.23687     10.46   0.0013
    collcat 2v3 with mealcat 2v3     1     1535.28987     1535.28987      0.33   0.5671
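The **contrast** statement reports only an F test. If the size and sign of each interaction contrast are also of interest, one option is to pass the same coefficients to **estimate** statements, as in this sketch.

```sas
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat;
  /* Same coefficients as the contrast statements; estimate also reports
     the value of the contrast with its standard error and t test */
  estimate 'collcat 2v3 with mealcat 1v2'
           collcat*mealcat 0 0 0  1 -1 0  -1 1 0;
  estimate 'collcat 2v3 with mealcat 2v3'
           collcat*mealcat 0 0 0  0 1 -1  0 -1 1;
run;
quit;
```

Worked out by hand from the cell means, the first contrast is (825.651 - 636.605) - (782.151 - 655.638) = 62.533 and the second is (636.605 - 508.833) - (655.638 - 541.733) = 13.867, so the estimates should come out close to these values.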

**6.5.2 Analyzing interaction contrasts using PROC REG**

In regression analysis, we have seen that different coding schemes give us different contrasts and comparisons. Because we would like to compare groups 1 vs. 2 and then groups 2 vs. 3 on **mealcat**, we use forward difference coding for **mealcat**. For **collcat** we keep the Helmert coding, whose second term compares groups 2 and 3. The coefficients for **sm21** and **sm22** are then the two interaction contrasts of interest.

    data reg4;
      set elemapi2;
      if mealcat = 1 then m1 = 2/3;
      if mealcat = 2 then m1 = -1/3;
      if mealcat = 3 then m1 = -1/3;
      if mealcat = 1 then m2 = 1/3;
      if mealcat = 2 then m2 = 1/3;
      if mealcat = 3 then m2 = -2/3;
      if collcat = 1 then s1 = 2/3;
      if collcat = 2 then s1 = -1/3;
      if collcat = 3 then s1 = -1/3;
      if collcat = 1 then s2 = 0;
      if collcat = 2 then s2 = 1/2;
      if collcat = 3 then s2 = -1/2;
      sm11 = s1*m1; sm12 = s1*m2;
      sm21 = s2*m1; sm22 = s2*m2;
    run;

    proc reg data = reg4;
      model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: api00 api 2000

    Analysis of Variance

                                Sum of          Mean
    Source             DF      Squares        Square   F Value   Pr > F
    Model               8      6243715        780464    166.76   <.0001
    Error             391      1829957    4680.19741
    Corrected Total   399      8073672

    Root MSE           68.41197    R-Square    0.7733
    Dependent Mean    647.62250    Adj R-Sq    0.7687
    Coeff Var          10.56356

    Parameter Estimates

                                 Parameter     Standard
    Variable    Label      DF     Estimate        Error   t Value   Pr > |t|
    Intercept   Intercept   1    650.08826      3.87189    167.90     <.0001
    s1                      1    -25.04078      8.34539     -3.00     0.0029
    s2                      1     -2.81094      9.32938     -0.30     0.7633
    m1                      1    181.04135      9.07713     19.94     <.0001
    m2                      1    112.36892      9.90759     11.34     <.0001
    sm11                    1     69.78440     21.47520      3.25     0.0013
    sm12                    1    -25.40675     21.06663     -1.21     0.2285
    sm21                    1     62.53325     19.33438      3.23     0.0013
    sm22                    1     13.86697     24.21132      0.57     0.5671

The t-tests for **sm21** (t = 3.23, p = 0.0013) and **sm22** (t = 0.57, p = 0.5671) give the same p-values as the two 1 degree of freedom contrasts from **proc glm**, since for a single degree of freedom contrast F is the square of t (3.23 squared is about 10.46, and 0.57 squared is about 0.33).

**6.6 Computing Adjusted Means**

Our model is almost the same as before, except that we now include a covariate, **emer**. We want to obtain the means of **api00** adjusted for **emer**. These adjusted means are the means that would be expected if every school in the sample were at the mean of **emer**.

**6.6.1 Computing Adjusted Means via PROC GLM**

The syntax to get the adjusted means using **proc glm** is as follows. By default the adjustment is made at the mean of the covariate; this can be changed with the **at variable = value** option of the **lsmeans** statement.

    proc glm data = elemapi2;
      class collcat mealcat;
      model api00 = collcat mealcat collcat*mealcat emer / ss3;
      lsmeans collcat*mealcat;
    run;
    quit;

    The GLM Procedure

                                  Sum of
    Source             DF        Squares    Mean Square   F Value   Pr > F
    Model               9    6402428.265     711380.918    166.01   <.0001
    Error             390    1671243.733       4285.240
    Corrected Total   399    8073671.998

    R-Square    Coeff Var    Root MSE    api00 Mean
    0.793001     10.10801    65.46175      647.6225

    Source             DF    Type III SS    Mean Square   F Value   Pr > F
    collcat             2      34730.090      17365.045      4.05   0.0181
    mealcat             2    3017331.845    1508665.923    352.06   <.0001
    collcat*mealcat     4      96789.116      24197.279      5.65   0.0002
    emer                1     158713.455     158713.455     37.04   <.0001

    collcat    mealcat    api00 LSMEAN
    1          1            797.560428
    1          2            596.972811
    1          3            509.872241
    2          1            812.550248
    2          2            636.404940
    2          3            523.884659
    3          1            767.935241
    3          2            652.976146
    3          3            550.461628
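To adjust at a covariate value other than the mean, the **at** option can be added after the slash on the **lsmeans** statement. The sketch below is illustrative only; the value 10 for **emer** is an arbitrary choice and not a value used elsewhere in this chapter.

```sas
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat emer / ss3;
  lsmeans collcat*mealcat;                 /* default: emer held at its mean */
  lsmeans collcat*mealcat / at emer = 10;  /* 10 is an arbitrary illustrative value */
run;
quit;
```

The first **lsmeans** statement should reproduce the adjusted means above, and the second shows the cell means expected if every school had **emer** equal to 10.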

**6.6.2 Computing Adjusted Means via PROC REG**

Now we illustrate how to get the same adjusted means using **proc reg**. First, we need to create the coded variables for the categorical variables. The choice of coding scheme does not matter for the purpose of obtaining the adjusted means; below we use the same kind of coding as before for both **mealcat** and **collcat**. After coding our variables, we run **proc reg** with the **outest=** option to save the regression equation, which **proc score** uses later to generate predicted values. The **proc sql** step below simply creates a new variable **meanemer** holding the mean of **emer**.

    data reg6;
      set elemapi2;
      if collcat = 1 then s2 = 2/3;
      if collcat = 2 then s2 = -1/3;
      if collcat = 3 then s2 = -1/3;
      if collcat = 1 then s3 = -1/3;
      if collcat = 2 then s3 = 2/3;
      if collcat = 3 then s3 = -1/3;
      if mealcat = 1 then m2 = 2/3;
      if mealcat = 2 then m2 = -1/3;
      if mealcat = 3 then m2 = -1/3;
      if mealcat = 1 then m3 = -1/3;
      if mealcat = 2 then m3 = 2/3;
      if mealcat = 3 then m3 = -1/3;
      sm22 = s2*m2; sm23 = s2*m3;
      sm32 = s3*m2; sm33 = s3*m3;
    run;

    proc reg data = reg6 outest = pred6 noprint;
      yhat: model api00 = s2 s3 m2 m3 sm22 sm23 sm32 sm33 emer;
    run;
    quit;

    proc sql;
      create table xy as
      select *, mean(emer) as meanemer
      from reg6;
    quit;

NOTE: Before running **proc score**, you need to replace **emer** with **meanemer**, as the data step below does, or else **proc score** will not work. The variables listed on the **var** statement in **proc score** must be the same as the IVs in the regression. If they are not, you get a cryptic message about not finding a variable, even though you can see the variable in the data set.

    data xyz;
      set xy;
      emer = meanemer;
    run;

    proc score data = xyz score = pred6 out = ep type = parms;
      var s2 s3 m2 m3 sm22 sm23 sm32 sm33 emer;
    run;

    proc means data = ep mean;
      class collcat mealcat;
      var yhat;
    run;

    The MEANS Procedure

    Analysis Variable : yhat

                 Percentage
                 free meals
                       in 3      N
    collcat      categories    Obs           Mean
    ---------------------------------------------
    1                     1     35    797.5629402
                          2     20    596.9753239
                          3     74    509.8747538
    2                     1     43    812.5527606
                          2     43    636.4074521
                          3     48    523.8871715
    3                     1     53    767.9377531
                          2     69    652.9786583
                          3     15    550.4641407
    ---------------------------------------------

**6.7 More Details on Meaning of the Coefficients**

So far we have discussed a variety of techniques that you can use to help interpret interactions of categorical variables in regression, but we have not gone into great detail about the meaning of the coefficients in these analyses. Let's consider this further. Consider the analysis below using **collcat** and **mealcat**, with simple contrasts on both of these variables. The reference group for both variables will be group 1.

    data reg7;
      set elemapi2;
      if collcat = 1 then s1 = -1/3;
      if collcat = 2 then s1 = 2/3;
      if collcat = 3 then s1 = -1/3;
      if collcat = 1 then s2 = -1/3;
      if collcat = 2 then s2 = -1/3;
      if collcat = 3 then s2 = 2/3;
      if mealcat = 1 then m1 = -1/3;
      if mealcat = 2 then m1 = 2/3;
      if mealcat = 3 then m1 = -1/3;
      if mealcat = 1 then m2 = -1/3;
      if mealcat = 2 then m2 = -1/3;
      if mealcat = 3 then m2 = 2/3;
      sm11 = s1*m1; sm12 = s1*m2;
      sm21 = s2*m1; sm22 = s2*m2;
    run;

    proc reg data = reg7;
      model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22;
      output out = predreg7 p = yhat;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: api00 api 2000

    Analysis of Variance

                                Sum of          Mean
    Source             DF      Squares        Square   F Value   Pr > F
    Model               8      6243715        780464    166.76   <.0001
    Error             391      1829957    4680.19741
    Corrected Total   399      8073672

    Root MSE           68.41197    R-Square    0.7733
    Dependent Mean    647.62250    Adj R-Sq    0.7687
    Coeff Var          10.56356

    Parameter Estimates

                                 Parameter     Standard
    Variable    Label      DF     Estimate        Error   t Value   Pr > |t|
    Intercept   Intercept   1    650.08826      3.87189    167.90     <.0001
    s1                      1     23.63531      9.10533      2.60     0.0098
    s2                      1     26.44625      9.99513      2.65     0.0085
    m1                      1   -181.04135      9.07713    -19.94     <.0001
    m2                      1   -293.41027      9.44946    -31.05     <.0001
    sm11                    1     38.51777     24.19532      1.59     0.1122
    sm12                    1      6.17754     20.08262      0.31     0.7585
    sm21                    1    101.05102     22.88808      4.42     <.0001
    sm22                    1     82.57776     24.43941      3.38     0.0008

We can produce the predicted cell means as shown below. These will be useful for interpreting the meaning of the coefficients.

    proc means data = predreg7 mean;
      class collcat mealcat;
      var yhat;
    run;

    The MEANS Procedure

    Analysis Variable : yhat Predicted Value of api00

                 Percentage
                 free meals
                       in 3      N
    collcat      categories    Obs           Mean
    ---------------------------------------------
    1                     1     35    816.9142857
                          2     20    589.3500000
                          3     74    493.9189189
    2                     1     43    825.6511628
                          2     43    636.6046512
                          3     48    508.8333333
    3                     1     53    782.1509434
                          2     69    655.6376812
                          3     15    541.7333333
    ---------------------------------------------

Let's consider the meaning of the coefficient for **s1**. The coding for this variable compares group 2 vs. group 1, hence this coefficient corresponds to mean(collcat = 2) - mean(collcat = 1). Note that these are unweighted means, so we compute the mean for **collcat = 2** as the mean of the 3 cells corresponding to **collcat = 2**, i.e. (825.651+636.605+508.833)/3. The result below matches the coefficient for **s1**.

(825.651+636.605+508.833)/3 - (816.914+589.35+493.919)/3 = 23.635333

Likewise, the coefficient for **s2** is mean(collcat = 3) - mean(collcat = 1), computed below. The value below corresponds to the coefficient for **s2**.

(782.151+655.638+541.733)/3 - (816.914+589.35+493.919)/3 = 26.446333

Likewise, the coefficient for **m1** works out to be
mean(mealcat = 2) - mean(mealcat = 1), computed below.

(589.35+636.605+655.638)/3 - (816.914+825.651+782.151)/3 = -181.041.

And the coefficient for **m2** is mean(mealcat = 3) -
mean(mealcat = 1), computed below.

(493.919+508.833+541.733)/3 - (816.914+825.651+782.151)/3 = -293.41033
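The four hand calculations above can be collected into one small data step. This is a sketch with our own array names; the cell means are typed in from the **proc means** output above. It also computes the unweighted mean of the nine cell means, which works out to about 650.088 and so matches the intercept.

```sas
data _null_;
  /* Cell means from proc means; cell{i,j}: collcat = i, mealcat = j */
  array cell{3,3} _temporary_
    (816.914286 589.350000 493.918919
     825.651163 636.604651 508.833333
     782.150943 655.637681 541.733333);
  array cmean{3} _temporary_;   /* unweighted mean for each collcat level */
  array mmean{3} _temporary_;   /* unweighted mean for each mealcat level */
  do k = 1 to 3;
    cmean{k} = mean(cell{k,1}, cell{k,2}, cell{k,3});
    mmean{k} = mean(cell{1,k}, cell{2,k}, cell{3,k});
  end;
  s1 = cmean{2} - cmean{1};
  s2 = cmean{3} - cmean{1};
  m1 = mmean{2} - mmean{1};
  m2 = mmean{3} - mmean{1};
  intercept = (cmean{1} + cmean{2} + cmean{3})/3;   /* unweighted grand mean */
  put intercept= 10.4 s1= 10.4 s2= 10.4 m1= 10.4 m2= 10.4;
run;
```

The values written to the log can be compared directly with the parameter estimates for **Intercept**, **s1**, **s2**, **m1** and **m2**.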

To get the meaning of the coefficients for the interaction terms, let's write out the regression equation and take a closer look at the coefficients. From the parameter estimates, we have the following linear equation for predicted values:

yhat = 650.088 + 23.635*s1 + 26.446*s2 - 181.041*m1 - 293.410*m2 + 38.518*s1*m1 + 6.178*s1*m2 + 101.051*s2*m1 + 82.578*s2*m2.

Because of the simple coding scheme we use for both variables, moving from **collcat** = 1 to **collcat** = 2 increases **s1** by 1 and leaves **s2** unchanged, so from the above equation we have

yhat(collcat = 2) - yhat(collcat = 1) = 23.635 + 38.518*m1 + 6.178*m2.

One way to think about this equation is that, at any level of **mealcat**, comparing group 2 vs. group 1 on **collcat** involves only the terms containing **s1**. Since moving from **mealcat** = 1 to **mealcat** = 2 increases **m1** by 1 and leaves **m2** unchanged, the coefficient for **sm11** compares the difference of group 2 vs. 1 on **collcat** when **mealcat** is 2 with the difference of group 2 vs. 1 on **collcat** when **mealcat** is 1. In other words, **sm11** is

[cell(2,2)-cell(1,2)] - [cell(2,1)-cell(1,1)].

Plugging the corresponding cell means into the above formula, we get

(636.6047 - 589.3500) - (825.6512 - 816.9143) = 38.5178,

which is the coefficient for **sm11**. Using the same argument, we have the following:

**sm11 :** [cell(2,2)-cell(1,2)] - [cell(2,1)-cell(1,1)],

**sm12 :** [cell(2,3)-cell(1,3)] - [cell(2,1)-cell(1,1)],

**sm21 :** [cell(3,2)-cell(1,2)] - [cell(3,1)-cell(1,1)],

**sm22 :** [cell(3,3)-cell(1,3)] - [cell(3,1)-cell(1,1)].

We can go through the same process to verify the meaning of the coefficients for the other 3 interaction terms. We verify that
**sm12** is 6.1775.

(508.8333 - 493.9189) - (825.6512 - 816.9143) = 6.1775.

We also verify that **sm21** is 101.051.

(655.6377 - 589.3500) - (782.1509 - 816.9143) = 101.0511.

Lastly, we verify that **sm22** is 82.5778.

(541.7333 - 493.9189) - (782.1509 - 816.9143) = 82.5778.
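All four interaction contrasts follow the same pattern, so they can be checked in a single loop. The sketch below is an illustration only: it assumes the cell means are entered by hand from the PROC MEANS table (rounded to four decimals), and the names `cell`, `c`, `m` and `contrast` are hypothetical.

```sas
/* Interaction contrasts from the cell means, cell{collcat, mealcat} */
data _null_;
  array cell{3,3} _temporary_
    (816.9143 589.3500 493.9189     /* collcat = 1, mealcat = 1 2 3 */
     825.6512 636.6047 508.8333     /* collcat = 2                  */
     782.1509 655.6377 541.7333);   /* collcat = 3                  */

  /* c and m run over the non-reference levels of collcat and mealcat */
  do c = 2 to 3;
    do m = 2 to 3;
      /* (collcat c vs. 1 at mealcat m) minus (collcat c vs. 1 at mealcat 1) */
      contrast = (cell{c,m} - cell{1,m}) - (cell{c,1} - cell{1,1});
      put c= m= contrast= 9.4;
    end;
  end;
run;
```

The log lines for (c=2, m=2), (c=2, m=3), (c=3, m=2) and (c=3, m=3) correspond to **sm11**, **sm12**, **sm21** and **sm22**, and should agree with the coefficients in the regression equation up to rounding.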

**6.8 Simple Effects via Dummy Coding vs. Effect Coding**

We have used different types of coding schemes in this chapter. You may wonder why we have gone to the effort of creating and testing these effects instead of just using dummy coding, how the coding schemes differ, and how to choose among them. In this section, let's compare how we get **simple effects** using effect coding with how we would get them using dummy coding. We hope to show that effect coding is easier to use and that the interpretation of its coefficients is more intuitive.

**6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat**

Let's use an example from Chapter 3 (section 3.5). In that example we looked at an analysis using **mealcat** and **yr_rnd** and the interaction of these two variables. First, we look at how to do a simple effects analysis of **yr_rnd** at each level of **mealcat** using effect coding. To make our results correspond to those from Chapter 3, we will make category 3 of **mealcat** the reference category.

    data reg8;
      set elemapi2;
      /* coding for mealcat with category 3 as the reference */
      if mealcat = 1 then do; ms1 = 2/3;  ms2 = -1/3; end;
      if mealcat = 2 then do; ms1 = -1/3; ms2 = 2/3;  end;
      if mealcat = 3 then do; ms1 = -1/3; ms2 = -1/3; end;
      /* yr_rnd coded -1/2 vs. 1/2 */
      if yr_rnd = 0 then yr1 = -1/2; else yr1 = 1/2;
      /* one yr_rnd term per level of mealcat */
      ym1 = 0; ym2 = 0; ym3 = 0;
      if mealcat = 1 then ym1 = yr1;
      if mealcat = 2 then ym2 = yr1;
      if mealcat = 3 then ym3 = yr1;
    run;

    proc reg data = reg8;
      model api00 = ms1 ms2 ym1 ym2 ym3;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: API00 api 2000

                            Analysis of Variance

                                   Sum of          Mean
    Source             DF         Squares        Square    F Value    Pr > F
    Model               5         6204728       1240946     261.61    <.0001
    Error             394         1868944    4743.51314
    Corrected Total   399         8073672

    Root MSE           68.87317    R-Square    0.7685
    Dependent Mean    647.62250    Adj R-Sq    0.7656
    Coeff Var          10.63477

                            Parameter Estimates

                                Parameter      Standard
    Variable    Label      DF    Estimate         Error    t Value    Pr > |t|
    Intercept   Intercept   1   632.23557       5.80048     109.00      <.0001
    MS1                     1   267.81076      14.61559      18.32      <.0001
    MS2                     1   114.65715      11.12812      10.30      <.0001
    ym1                     1   -74.25691      26.75629      -2.78      0.0058
    ym2                     1   -51.74017      18.88854      -2.74      0.0064
    ym3                     1   -33.49254      11.77129      -2.85      0.0047

Now we can obtain the simple effect of **yr_rnd** at each level of **mealcat** directly from the coefficients for **ym1**, **ym2** and **ym3**.

Now let's perform the same analysis using dummy coding. Again, we will explicitly make the 3rd category of **mealcat** the omitted category.

    data reg9;
      set elemapi2;
      /* dummy variables for mealcat with category 3 omitted */
      if mealcat = 1 then do; md1 = 1; md2 = 0; end;
      if mealcat = 2 then do; md1 = 0; md2 = 1; end;
      if mealcat = 3 then do; md1 = 0; md2 = 0; end;
      /* interaction terms */
      ymd1 = yr_rnd*md1;
      ymd2 = yr_rnd*md2;
    run;

    proc reg data = reg9;
      model api00 = yr_rnd md1 md2 ymd1 ymd2;
    run;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: API00

                        Parameter Estimates

                     Parameter      Standard
    Variable    DF    Estimate         Error    t Value    Pr > |t|
    Intercept    1   521.49254       8.41420      61.98      <.0001
    YR_RND       1   -33.49254      11.77129      -2.85      0.0047
    MD1          1   288.19295      10.44284      27.60      <.0001
    MD2          1   123.78097      10.55185      11.73      <.0001
    ymd1         1   -40.76438      29.23118      -1.39      0.1639
    ymd2         1   -18.24763      22.25624      -0.82      0.4128

In order to form a test of simple main effects we need to make a table like the one shown below that relates the cell means to the coefficients in the regression. Please see Chapter 3, section 3.5 for information on how this table was constructed.

                  mealcat=1      mealcat=2      mealcat=3
    -------------------------------------------------------
    yr_rnd=0      const          const          const
                  + md1          + md2
    -------------------------------------------------------
    yr_rnd=1      const          const          const
                  + yr_rnd       + yr_rnd       + yr_rnd
                  + md1          + md2
                  + ymd1         + ymd2

Let's start by looking at how to get the simple effect of **yr_rnd** when **mealcat** is 3. Looking at the table above, we can see that we would want to compare const with const + **yr_rnd**, which is the same as testing whether the coefficient for **yr_rnd** is zero. This is a single parameter test and is shown in the output above. The t-value is -2.85 and the p-value is .0047.

Note that the coefficient for **yr_rnd** corresponds to the test of the
effect of **yr_rnd** when all other variables are set to 0 (the reference
category), i.e. when **mealcat** is set to the reference category. You may
be tempted to interpret the coefficient for **yr_rnd** as the overall
difference between year round schools and non-year round schools, but in this
example we see that it really corresponds to the simple effect of **yr_rnd**.
When using dummy coding people commonly misinterpret the lower order effects
to refer to overall effects rather than simple effects.

Now let's look at the simple effect of **yr_rnd** when **mealcat**=1. Looking at the table above we see that this involves comparing the cells for **yr_rnd**=1 vs. **yr_rnd**=0 when **mealcat**=1, i.e. comparing const + yr_rnd + md1 + ymd1 vs. const + md1. Removing the terms that drop out, we see that testing the simple effect of **yr_rnd** when **mealcat** = 1 is the same as testing yr_rnd + ymd1 = 0. We need a **test** statement for this, submitted after the previous **proc reg**.

    test yr_rnd + ymd1 = 0;
    run;
    quit;

    Test 1 Results for Dependent Variable API00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          1       36536       7.70    0.0058
    Denominator      394  4743.51314

These examples illustrate that it is more complicated to form simple effects when using dummy coding, and also that the interpretation of lower order effects when using dummy coding may not have the meaning that you would expect.
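The same reasoning applies to the remaining level of **mealcat**. From the table, the simple effect of **yr_rnd** when **mealcat**=2 compares const + yr_rnd + md2 + ymd2 with const + md2, i.e. it tests yr_rnd + ymd2 = 0. The sketch below collects all three simple effects in one run. It assumes the **reg9** data set created above; the statement labels `at_meal1`, `at_meal2` and `at_meal3` are illustrative names, not part of the original analysis.

```sas
/* Simple effects of yr_rnd at each level of mealcat, dummy coding */
proc reg data = reg9;
  model api00 = yr_rnd md1 md2 ymd1 ymd2;
  at_meal1: test yr_rnd + ymd1 = 0;   /* yr_rnd effect when mealcat = 1 */
  at_meal2: test yr_rnd + ymd2 = 0;   /* yr_rnd effect when mealcat = 2 */
  at_meal3: test yr_rnd = 0;          /* yr_rnd effect when mealcat = 3 */
run;
quit;
```

As a check on the logic, the sums of the dummy-coded estimates, -33.49254 + (-40.76438) = -74.25692 and -33.49254 + (-18.24763) = -51.74017, agree with the effect-coded coefficients for **ym1** and **ym2** up to rounding, so these tests should lead to the same conclusions as the t-tests for **ym1**, **ym2** and **ym3**.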

**6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd**

Example 1 looked at simple effects for **yr_rnd**, a variable with only 2 levels, and showed how to use the test statement in SAS for it. In this example, let's consider the simple effects of **mealcat** at each level of **yr_rnd**. Because **mealcat** has more than 2 levels, we will see what is required for tests of simple effects for variables with more than 2 levels. We will show both the **proc glm** and the **proc reg** approach here.

    proc glm data = elemapi2;
      class yr_rnd mealcat;
      model api00 = yr_rnd mealcat yr_rnd*mealcat;
      /* simple effect of mealcat when yr_rnd = 0 */
      contrast '1' mealcat 1 0 -1 yr_rnd*mealcat 1 0 -1 0 0 0,
                   mealcat 0 1 -1 yr_rnd*mealcat 0 1 -1 0 0 0;
      /* simple effect of mealcat when yr_rnd = 1 */
      contrast '2' mealcat 1 0 -1 yr_rnd*mealcat 0 0 0 1 0 -1,
                   mealcat 0 1 -1 yr_rnd*mealcat 0 0 0 0 1 -1;
    run;
    quit;

    The GLM Procedure
    <output omitted>

    Contrast    DF     Contrast SS     Mean Square    F Value    Pr > F
    1            2     3903569.804     1951784.902     411.46    <.0001
    2            2      476157.455      238078.727      50.19    <.0001

Here is how to do it with proc reg. The first test statement below looks at **mealcat** at **yr_rnd** = 0
and the second test statement looks at **mealcat** at **yr_rnd** = 1.

    data reg10;
      set elemapi2;
      /* yr_rnd coded -.5 vs. .5 */
      if yr_rnd = 0 then yrrnd = -.5;
      if yr_rnd = 1 then yrrnd = .5;
      /* coding for mealcat with category 3 as the reference */
      if mealcat = 1 then m1 = 2/3;
      if mealcat = 2 then m1 = -1/3;
      if mealcat = 3 then m1 = -1/3;
      if mealcat = 1 then m2 = -1/3;
      if mealcat = 2 then m2 = 2/3;
      if mealcat = 3 then m2 = -1/3;
      /* mealcat terms within yr_rnd = 0 */
      if yr_rnd = 0 then my11 = m1; else my11 = 0;
      if yr_rnd = 0 then my21 = m2; else my21 = 0;
      /* mealcat terms within yr_rnd = 1 */
      if yr_rnd = 1 then my12 = m1; else my12 = 0;
      if yr_rnd = 1 then my22 = m2; else my22 = 0;
    run;

    proc reg data = reg10;
      model api00 = yrrnd my11 my21 my12 my22;
      test my11 = my21 = 0;
      test my12 = my22 = 0;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: api00 api 2000

                            Parameter Estimates

                                Parameter      Standard
    Variable    Label      DF    Estimate         Error    t Value    Pr > |t|
    Intercept   Intercept   1   632.23557       5.80048     109.00      <.0001
    yrrnd                   1   -53.16321      11.60095      -4.58      <.0001
    my11                    1   288.19295      10.44284      27.60      <.0001
    my21                    1   123.78097      10.55185      11.73      <.0001
    my12                    1   247.42857      27.30218       9.06      <.0001
    my22                    1   105.53333      19.59588       5.39      <.0001

    Test 1 Results for Dependent Variable api00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          2     1951785     411.46    <.0001
    Denominator      394  4743.51314

    Test 2 Results for Dependent Variable api00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          2      238079      50.19    <.0001
    Denominator      394  4743.51314

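We can also test the simple effects of **mealcat** at each level of **yr_rnd** via dummy coding. In the SAS **test** statement, each equals sign contributes one degree of freedom. Because there are two equals signs in the second test statement below, it is a two degree-of-freedom test, which is what we want: it is the simple effect of **mealcat** when **yr_rnd**=0. The same logic holds for the fourth test statement, which is the simple effect of **mealcat** when **yr_rnd**=1. The first and third test statements have a single equals sign, so they are one degree-of-freedom tests; they compare **mealcat** 1 with **mealcat** 2 when **yr_rnd**=0 and when **yr_rnd**=1, respectively.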

    data reg11;
      set elemapi2;
      /* dummy variables for mealcat with category 3 omitted */
      m1 = 0; if mealcat = 1 then m1 = 1;
      m2 = 0; if mealcat = 2 then m2 = 1;
      /* interaction terms */
      m1y = m1*yr_rnd;
      m2y = m2*yr_rnd;
    run;

    proc reg data = reg11;
      model api00 = m1 m2 yr_rnd m1y m2y;
      test m1 - m2 = 0;
      test m1 = m2 = 0;
      test m1 + m1y - m2 - m2y = 0;
      test m1 + m1y = m2 + m2y = 0;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: api00 api 2000

    Test 1 Results for Dependent Variable api00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          1     1627262     343.05    <.0001
    Denominator      394  4743.51314

    Test 2 Results for Dependent Variable api00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          2     1951785     411.46    <.0001
    Denominator      394  4743.51314

    Test 3 Results for Dependent Variable api00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          1       96095      20.26    <.0001
    Denominator      394  4743.51314

    Test 4 Results for Dependent Variable api00

                                Mean
    Source            DF      Square    F Value    Pr > F
    Numerator          2      238079      50.19    <.0001
    Denominator      394  4743.51314
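The two degree-of-freedom tests can also be written as a comma-separated list of equations, which is the form the PROC REG **test** statement uses for joint hypotheses. The sketch below restates the simple effects of **mealcat** this way. It assumes the **reg11** data set created above; the statement labels `meal_at_yr0` and `meal_at_yr1` are illustrative names.

```sas
/* Simple effects of mealcat at each level of yr_rnd, dummy coding, */
/* with each joint hypothesis written as separate equations.        */
proc reg data = reg11;
  model api00 = m1 m2 yr_rnd m1y m2y;
  /* mealcat when yr_rnd = 0: both dummy coefficients are zero */
  meal_at_yr0: test m1 = 0, m2 = 0;
  /* mealcat when yr_rnd = 1: dummy plus interaction is zero for each level */
  meal_at_yr1: test m1 + m1y = 0, m2 + m2y = 0;
run;
quit;
```

These are the same hypotheses as the second and fourth test statements above, so each should again be a two degree-of-freedom test.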
