Help the Stat Consulting Group by giving a gift

How can I test contrasts in R?

After fitting a model with categorical predictors, especially interacted categorical predictors, one may wish to compare different levels of the variables than those presented in the table of coefficients. We can start with a simple linear model with a continuous predictor and two interacted categorical predictors.

library(multcomp) hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv") m1 <- lm(read ~ socst + factor(ses) * factor(female), data = hsb2) summary(m1)Call: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2) Residuals: Min 1Q Median 3Q Max -20.8443 -5.5811 0.2377 4.7538 18.4295 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 23.91791 3.18443 7.511 2.13e-12 *** socst 0.58651 0.05626 10.425 < 2e-16 *** factor(ses)2 -1.53491 2.39003 -0.642 0.5215 factor(ses)3 -2.12448 2.65133 -0.801 0.4240 factor(female)1 -4.98561 2.50454 -1.991 0.0479 * factor(ses)2:factor(female)1 2.37096 2.97517 0.797 0.4265 factor(ses)3:factor(female)1 7.37483 3.26625 2.258 0.0251 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 7.934 on 193 degrees of freedom Multiple R-squared: 0.4192, Adjusted R-squared: 0.4011 F-statistic: 23.22 on 6 and 193 DF, p-value: < 2.2e-16

The coefficients listed above provide contrasts between the indicated level
and the omitted reference level. For example, we can see that, if **female**
= 0, the difference between **ses** = 1 (the reference level) and **ses**
= 2 is -1.53491 (holding **socst** constant). For this value, the model
output includes a test of the null hypothesis that it is equal to zero. But we
may be interested in comparing other combinations of **ses** and **female**.
With a bit of arithmetic, we can arrive at the difference between two levels.
But what if we want to test this difference?

We can do so by defining a contrast of interest and testing it with the **
glht** (generalized linear hypothesis test) command in the **multcomp**
package. To define the contrast, we can look at the order in which the
coefficients are presented in the output, then create a vector the length of the
coefficient list (including the intercept). To start, we can compare levels 2
and 3 of **ses** for **female** = 0. Thus, we want to test the difference
in the third and fourth coefficients in our output. After we create our contrast
vector, we pass it along with the model object to **glht**. Then, to see the
result, we look at a summary of our **glht** object.

# difference between ses = 2 and ses =3 when female = 0 K <- matrix(c(0, 0, 1, -1, 0, 0, 0), 1) t <- glht(m1, linfct = K) summary(t)Simultaneous Tests for General Linear Hypotheses Fit: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2) Linear Hypotheses: Estimate Std. Error t value Pr(>|t|) 1 == 0 0.5896 1.9146 0.308 0.758 (Adjusted p values reported -- single-step method)

The estimate we see in this output is consistent with the model output:

m1$coef[4] - m1$coef[3]factor(ses)3 -0.5895688

We can look at a slightly more complicated contrast, comparing levels 2 and 3
of **ses** for **female** = 1:

# difference between ses = 2 and ses =3 when female = 1 K <- matrix(c(0, 0, 1, -1, 0, 1, -1), 1) t <- glht(m1, linfct = K) summary(t)Simultaneous Tests for General Linear Hypotheses Fit: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2) Linear Hypotheses: Estimate Std. Error t value Pr(>|t|) 1 == 0 -4.414 1.875 -2.354 0.0196 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Adjusted p values reported -- single-step method)

To test "differences of differences"--how the difference between **ses** =
2 and **ses** = 3 for **female** = 0 vs. **female** = 1-- we can define
our contrast as the difference in the vectors we defined above and test this
using **glht**:

# looking at the difference of differences # ses = 2 vs. 3 for female = 0 K1 <- matrix(c(0, 0, 1, -1, 0, 0, 0), 1) # ses = 2 vs. 3 for female = 1 K2 <- matrix(c(0, 0, 1, -1, 0, 1, -1), 1) # difference of differences (K <- K2 - K1)[,1] [,2] [,3] [,4] [,5] [,6] [,7] [1,] 0 0 0 0 0 -1 1t <- glht(m1, linfct = K) summary(t)Simultaneous Tests for General Linear Hypotheses Fit: lm(formula = read ~ socst + factor(ses) * factor(female), data = hsb2) Linear Hypotheses: Estimate Std. Error t value Pr(>|t|) 1 == 0 -5.004 2.653 -1.886 0.0608 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Adjusted p values reported -- single-step method)

We can also test all possible pairwise combinations. To make this easier,
we will first create an "interaction" variable (using the function, **interaction**)
whose levels uniquely identify all the levels from **ses** and **female**.
All pairwise comparisons can then be calculated automatically.

# all pairwise comparsions # creating a BIG group variable hsb2$tall <- with(hsb2, interaction(female, ses, sep = "x")) m2 <- lm(read ~ socst + tall, data = hsb2) l2 <- glht(m2, linfct = mcp(tall = "Tukey")) summary(l2)Simultaneous Tests for General Linear Hypotheses Multiple Comparisons of Means: Tukey Contrasts Fit: lm(formula = read ~ socst + tall, data = hsb2) Linear Hypotheses: Estimate Std. Error t value Pr(>|t|) 1x1 - 0x1 == 0 -4.9856 2.5045 -1.991 0.344 0x2 - 0x1 == 0 -1.5349 2.3900 -0.642 0.987 1x2 - 0x1 == 0 -4.1496 2.4124 -1.720 0.513 0x3 - 0x1 == 0 -2.1245 2.6513 -0.801 0.966 1x3 - 0x1 == 0 0.2647 2.6297 0.101 1.000 0x2 - 1x1 == 0 3.4507 1.8207 1.895 0.401 1x2 - 1x1 == 0 0.8361 1.8251 0.458 0.997 0x3 - 1x1 == 0 2.8611 2.0911 1.368 0.740 1x3 - 1x1 == 0 5.2504 2.0752 2.530 0.117 1x2 - 0x2 == 0 -2.6146 1.6341 -1.600 0.592 0x3 - 0x2 == 0 -0.5896 1.9146 -0.308 1.000 1x3 - 0x2 == 0 1.7997 1.9008 0.947 0.932 0x3 - 1x2 == 0 2.0251 1.8835 1.075 0.888 1x3 - 1x2 == 0 4.4143 1.8750 2.354 0.173 1x3 - 0x3 == 0 2.3892 2.0850 1.146 0.858 (Adjusted p values reported -- single-step method)

- There are other ways in which the contrasts to be tested can be expressed in
**glht**. For the details of these other matrix-less methods, see this glht vignette. - This approach works for other types of model objects, including
**glm**and**lme**. However, for non-linear models, keep in mind that the tested coefficients are in the scale defined by the link function.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.