R FAQ
How can I test contrasts in R?

After fitting a model with categorical predictors, especially interacted categorical predictors, one may wish to compare different levels of the variables than those presented in the table of coefficients. We can start with a simple linear model with a continuous predictor and two interacted categorical predictors.


library(multcomp)
hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")

m1 <- lm(read ~ socst + factor(ses) * factor(female), data = hsb2)
summary(m1)

Call:
lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.8443  -5.5811   0.2377   4.7538  18.4295 

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
(Intercept)                  23.91791    3.18443   7.511 2.13e-12 ***
socst                         0.58651    0.05626  10.425  < 2e-16 ***
factor(ses)2                 -1.53491    2.39003  -0.642   0.5215    
factor(ses)3                 -2.12448    2.65133  -0.801   0.4240    
factor(female)1              -4.98561    2.50454  -1.991   0.0479 *  
factor(ses)2:factor(female)1  2.37096    2.97517   0.797   0.4265    
factor(ses)3:factor(female)1  7.37483    3.26625   2.258   0.0251 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 7.934 on 193 degrees of freedom
Multiple R-squared: 0.4192,	Adjusted R-squared: 0.4011 
F-statistic: 23.22 on 6 and 193 DF,  p-value: < 2.2e-16 

The coefficients listed above provide contrasts between the indicated level and the omitted reference level. For example, we can see that, if female = 0, the difference between ses = 1 (the reference level) and ses = 2 is -1.53491 (holding socst constant). For this value, the model output includes a test of the null hypothesis that it is equal to zero. But we may be interested in comparing other combinations of ses and female. With a bit of arithmetic, we can arrive at the difference between two levels. But what if we want to test this difference?

We can do so by defining a contrast of interest and testing it with the glht (generalized linear hypothesis test) command in the multcomp package. To define the contrast, we can look at the order in which the coefficients are presented in the output, then create a vector the length of the coefficient list (including the intercept). To start, we can compare levels 2 and 3 of ses for female = 0. Thus, we want to test the difference in the third and fourth coefficients in our output. After we create our contrast vector, we pass it along with the model object to glht. Then, to see the result, we look at a summary of our glht object.


# difference between ses = 2 and ses =3 when female = 0
K <- matrix(c(0, 0, 1, -1, 0, 0, 0), 1)
t <- glht(m1, linfct = K)
summary(t)

	 Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)
1 == 0   0.5896     1.9146   0.308    0.758
(Adjusted p values reported -- single-step method)

The estimate we see in this output is consistent with the model output:


m1$coef[4] - m1$coef[3]
factor(ses)3 
  -0.5895688 

We can look at a slightly more complicated contrast, comparing levels 2 and 3 of ses for female = 1:


# difference between ses = 2 and ses =3 when female = 1
K <- matrix(c(0, 0, 1, -1, 0, 1, -1), 1)
t <- glht(m1, linfct = K)
summary(t)

	 Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)  
1 == 0   -4.414      1.875  -2.354   0.0196 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
(Adjusted p values reported -- single-step method)

To test "differences of differences"--how the difference between ses = 2 and ses = 3 for female = 0 vs. female = 1-- we can define our contrast as the difference in the vectors we defined above and test this using glht:


# looking at the difference of differences
# ses = 2 vs. 3 for female = 0
K1 <- matrix(c(0, 0, 1, -1, 0, 0, 0), 1)
# ses = 2 vs. 3 for female = 1
K2 <- matrix(c(0, 0, 1, -1, 0, 1, -1), 1)
# difference of differences
(K <- K2 - K1)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    0    0    0    0    -1   1

t <- glht(m1, linfct = K)
summary(t)


	 Simultaneous Tests for General Linear Hypotheses

Fit: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2)

Linear Hypotheses:
       Estimate Std. Error t value Pr(>|t|)  
1 == 0   -5.004      2.653  -1.886   0.0608 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
(Adjusted p values reported -- single-step method)

We can also test all possible pairwise combinations. To make this easier, we will first create an "interaction" variable (using the function, interaction) whose levels uniquely identify all the levels from ses and female. All pairwise comparisons can then be calculated automatically.

# all pairwise comparsions
# creating a BIG group variable
hsb2$tall <- with(hsb2, interaction(female, ses, sep = "x"))
m2 <- lm(read ~ socst + tall, data = hsb2)
l2 <- glht(m2, linfct = mcp(tall = "Tukey"))
summary(l2)

	 Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: lm(formula = read ~ socst + tall, data = hsb2)

Linear Hypotheses:
               Estimate Std. Error t value Pr(>|t|)
1x1 - 0x1 == 0  -4.9856     2.5045  -1.991    0.344
0x2 - 0x1 == 0  -1.5349     2.3900  -0.642    0.987
1x2 - 0x1 == 0  -4.1496     2.4124  -1.720    0.513
0x3 - 0x1 == 0  -2.1245     2.6513  -0.801    0.966
1x3 - 0x1 == 0   0.2647     2.6297   0.101    1.000
0x2 - 1x1 == 0   3.4507     1.8207   1.895    0.401
1x2 - 1x1 == 0   0.8361     1.8251   0.458    0.997
0x3 - 1x1 == 0   2.8611     2.0911   1.368    0.740
1x3 - 1x1 == 0   5.2504     2.0752   2.530    0.117
1x2 - 0x2 == 0  -2.6146     1.6341  -1.600    0.592
0x3 - 0x2 == 0  -0.5896     1.9146  -0.308    1.000
1x3 - 0x2 == 0   1.7997     1.9008   0.947    0.932
0x3 - 1x2 == 0   2.0251     1.8835   1.075    0.888
1x3 - 1x2 == 0   4.4143     1.8750   2.354    0.173
1x3 - 0x3 == 0   2.3892     2.0850   1.146    0.858
(Adjusted p values reported -- single-step method)

A few notes

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.