WHAT KIND OF CONTRASTS ARE THESE?

David P. Nichols
Senior Support Statistician
SPSS, Inc.
From SPSS Keywords, Number 63, 1997

Interpretation of parameter estimates is an essential part of the predictive
modeling process. Estimates of interest often represent contrasts among the
levels of a categorical predictor variable. A contrast is defined by a set of
coefficients that sum to 0 over the levels of the categorical variable of
interest.

In SPSS, issues of interpretation of contrast results arise in several
procedures, including LOGISTIC REGRESSION and COX REGRESSION. Both procedures
have facilities for automatically treating predictors (or covariates) as
categorical variables. When a covariate with K levels is declared to be
categorical in either one of these procedures, a set of K-1 variables is
produced internally, and these variables are used as a set in the analysis.

The values of the K-1 variables are determined by the choice of contrasts
made by the user. The default contrasts in the current 7.5 release of SPSS
for Windows have been changed in both procedures to INDICATOR, with the
last category as the reference group. These contrasts produce estimates
comparing each other group to the reference group.

A point of considerable confusion among SPSS users is the relationship between
the values of the internally created variables and the interpretation of the
resulting parameter estimates. The output for the LOGISTIC REGRESSION and COX
REGRESSION procedures provides the values of the internal variables used to
estimate the desired contrasts. For example, suppose we have a three-level
categorical covariate. The new default INDICATOR contrasts would produce a
set of "parameter codings" like those in Figure 1.

Figure 1: Parameter codings for INDICATOR contrasts
-------------------------------------------------------------------------------
                     Parameter
     Value   Freq    Coding
                     (1)     (2)
GROUP
       1     106    1.000    .000
       2     116     .000   1.000
       3     107     .000    .000
-------------------------------------------------------------------------------
End Figure 1

The predictor here is called simply GROUP. It takes on the values 1-3, with
frequencies listed in the "Freq" column. The columns on the right (what are
being called parameter codings) give the values of the internal variables
created to represent the original categorical covariate. In this case there
are two internal variables created. For the first variable, cases with a
value of 1 for GROUP get a 1, while all other cases get a 0. For the second,
cases with a 2 for GROUP get a 1, with all other cases getting a 0.
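
The coding rule just described can be sketched in a few lines of Python. This
is only an illustration of the rule, not SPSS code; the function name
indicator_coding and its reference argument are hypothetical names introduced
here for the example.

```python
def indicator_coding(levels, reference=None):
    """Map each level of a categorical covariate to its K-1 indicator values.

    `levels` lists the distinct category values in order; `reference` is the
    level that gets all zeros (assumed here to default to the last level).
    """
    if reference is None:
        reference = levels[-1]
    non_reference = [lv for lv in levels if lv != reference]
    # One internal variable per non-reference level: 1 if the case is in
    # that level, 0 otherwise.
    return {lv: [1 if lv == other else 0 for other in non_reference]
            for lv in levels}


codings = indicator_coding([1, 2, 3])
for level, row in codings.items():
    print(level, row)
```

For GROUP with values 1-3 this reproduces the two coding columns of Figure 1,
with the last group coded 0 on both internal variables.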

The question that this output often elicits from SPSS users is: how does this
coding produce the contrasts claimed in our documentation? The answer is that
one must distinguish between the values of the contrast coefficients defining
contrasts of interest and the values of the variables in the data that will
produce such a set of contrasts. The columns in the data that produce certain
contrasts will resemble the contrast coefficients only when the matrix of
contrast coefficients is orthogonal (the inner product of any two distinct row
vectors in the contrast matrix is 0). INDICATOR contrasts are not orthogonal, nor are
the other most commonly used types in logistic or Cox regression models. Thus
it is important to understand the following relationship between the columns
of the data and the contrast results.
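
As a quick check of the non-orthogonality claim, the following Python sketch
takes the two coefficient vectors for "each group minus the last group" in the
three-level case and computes their inner product. The helper name inner and
the dictionary labels are illustrative only.

```python
from itertools import combinations


def inner(u, v):
    """Inner product of two coefficient vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


# Contrast coefficients over groups (1, 2, 3): each group versus the last.
contrasts = {
    "group 1 - group 3": [1, 0, -1],
    "group 2 - group 3": [0, 1, -1],
}

# Each set of coefficients sums to 0, so each is a contrast.
sums = {name: sum(coefs) for name, coefs in contrasts.items()}

# Inner products of every pair of distinct contrasts; a nonzero value
# means the pair is not orthogonal.
products = {(a, b): inner(contrasts[a], contrasts[b])
            for a, b in combinations(contrasts, 2)}
print(sums)
print(products)
```

The single inner product is 1 rather than 0, because both contrasts share the
-1 coefficient on the reference group.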

If we append a constant unit (1) column onto the beginning of the two columns
given above, we get what we call a basis or design matrix for generating the
desired contrasts. If we call this matrix X, then for any model that uses a
linear combination of the predictors in generating its prediction function,
we can compute C, the matrix of contrast coefficients, as:

    C = (X'X)^{-1} X'

For the example given here, Figure 2 shows the basis matrix for INDICATOR
contrasts together with the contrast matrix that it produces.

Figure 2: Basis and contrast matrices for INDICATOR contrasts
-------------------------------------------------------------------------------
Basis:   1  1  0           Contrast:   0  0  1
         1  0  1                       1  0 -1
         1  0  0                       0  1 -1
-------------------------------------------------------------------------------

The first row of the contrast matrix gives the coefficients for the constant
or intercept term, which with INDICATOR contrasts estimates the predicted
value for the reference group (here, the last one). The other two rows give
the contrasts estimated by the GROUP(1) and GROUP(2) parameter estimates,
which are, respectively, the first group minus the last and the second minus
the last.
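
The computation of the contrast matrix from the basis matrix can be sketched
in Python as follows. This is a minimal illustration using exact fractions
from the standard library, not the code SPSS uses; the helper names
(transpose, matmul, inverse, contrast_matrix) are introduced here for the
example, and the inverse assumes a nonsingular matrix.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(m):
    """Invert a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(m)
    # Augment m with the identity matrix, using exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Swap in a row with a nonzero pivot, then scale the pivot to 1.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the pivot column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def contrast_matrix(basis):
    """Return C = (X'X)^{-1} X' for a basis (design) matrix X."""
    xt = transpose(basis)
    return matmul(inverse(matmul(xt, basis)), xt)


# Basis for INDICATOR contrasts: constant column plus the two codings.
indicator_basis = [[1, 1, 0],
                   [1, 0, 1],
                   [1, 0, 0]]
C = contrast_matrix(indicator_basis)
for row in C:
    print([str(v) for v in row])
```

Applied to the INDICATOR basis, this reproduces the contrast matrix shown in
Figure 2. Because the basis matrix here is square and nonsingular, C is simply
the inverse of X.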

Earlier releases of SPSS used DEVIATION as the default contrast type, with the
last category as the reference or excluded category. DEVIATION contrasts
compare each group other than the excluded group to the unweighted average of
all groups. The value for the left out group is then by definition the
negative of the sum of the given parameter estimates. Considerable confusion
has resulted from the fact that the basis or design matrix for DEVIATION
contrasts resembles the contrast matrix for SIMPLE contrasts, which compare
each group to a reference category (like INDICATOR contrasts). It turns out
that DEVIATION and SIMPLE contrasts are in a sense mirror images of one
another, in that the variable codings required to produce one type of
contrasts look like the transpose of the contrast matrix for the other type
of contrasts.

These relationships are illustrated for the three-level case in Figures 3 and
4 (using fractions for precision; SPSS output shows decimal values). Note that
the contrasts estimated for GROUP(1) and GROUP(2) are the same for SIMPLE
contrasts as for INDICATOR, but that the intercept is now an unweighted
average of all levels rather than the value for the last (or more generally,
the reference) group.

Figure 3: Basis and contrast matrices for DEVIATION contrasts
-------------------------------------------------------------------------------
Basis:   1  1  0           Contrast:   1/3  1/3  1/3
         1  0  1                       2/3 -1/3 -1/3
         1 -1 -1                      -1/3  2/3 -1/3
-------------------------------------------------------------------------------

Figure 4: Basis and contrast matrices for SIMPLE contrasts
-------------------------------------------------------------------------------
Basis:   1  2/3 -1/3       Contrast:   1/3  1/3  1/3
         1 -1/3  2/3                    1    0   -1
         1 -1/3 -1/3                    0    1   -1
-------------------------------------------------------------------------------
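
The matrices in Figures 3 and 4 can be checked with a short Python sketch.
Since C = (X'X)^{-1} X' implies that C times X is the identity matrix, and the
basis matrices here are square, it is enough to multiply each contrast matrix
by its basis. The sketch also checks the "mirror image" relationship and the
coefficients implied for the left out group under DEVIATION contrasts. All
variable and function names are illustrative.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def drop_constant(basis):
    """Remove the leading constant column, leaving the variable codings."""
    return [row[1:] for row in basis]


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

deviation_basis = [[1, 1, 0], [1, 0, 1], [1, -1, -1]]
deviation_contrast = [[F(1, 3), F(1, 3), F(1, 3)],
                      [F(2, 3), F(-1, 3), F(-1, 3)],
                      [F(-1, 3), F(2, 3), F(-1, 3)]]

simple_basis = [[1, F(2, 3), F(-1, 3)],
                [1, F(-1, 3), F(2, 3)],
                [1, F(-1, 3), F(-1, 3)]]
simple_contrast = [[F(1, 3), F(1, 3), F(1, 3)],
                   [1, 0, -1],
                   [0, 1, -1]]

# Each contrast matrix times its basis matrix should be the identity.
deviation_ok = matmul(deviation_contrast, deviation_basis) == identity
simple_ok = matmul(simple_contrast, simple_basis) == identity

# Mirror image: the codings for one type equal the transpose of the
# non-intercept rows of the other type's contrast matrix.
mirror_1 = drop_constant(deviation_basis) == transpose(simple_contrast[1:])
mirror_2 = drop_constant(simple_basis) == transpose(deviation_contrast[1:])

# DEVIATION: the left out group's coefficients are the negative of the
# sum of the GROUP(1) and GROUP(2) contrast rows.
left_out = [-(a + b)
            for a, b in zip(deviation_contrast[1], deviation_contrast[2])]

print(deviation_ok, simple_ok, mirror_1, mirror_2)
print([str(v) for v in left_out])
```

The left out group's coefficients come out as -1/3, -1/3, 2/3, that is, the
last group minus the unweighted average of all three groups.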

