An overview of the GLM procedure
General linear modeling in SPSS for Windows
The general linear model (GLM) is a flexible statistical model that incorporates
normally distributed dependent variables and categorical or continuous independent
variables. The GLM procedure in SPSS allows you to specify general linear models through
syntax or dialog boxes, and presents the results in pivot tables so you can easily edit
the output. Among the many features available, GLM enables you to accommodate designs with
empty cells, more readily interpret the results using profile plots of estimated means,
and customize the linear model so that it directly addresses the research questions you
ask. Anyone who regularly fits linear models, whether univariate, multivariate or repeated
measures, will find the GLM procedure to be very useful. In this paper, we give you a more
in-depth understanding of the various options within the GLM procedure and how to use them,
describe the key features of GLM, and discuss in detail the four types of sums of squares,
estimated marginal means, profile plots and custom hypothesis tests.
Highlights of GLM
- Covers a variety of linear models, such as univariate and multivariate regression, ANOVA
and ANCOVA, mixed models, MANOVA and MANCOVA, repeated measures and doubly multivariate
repeated measures models.
- For repeated measures models, GLM offers many commonly used contrasts for the
within-subjects factors, including deviation, simple, difference, Helmert, repeated and
polynomial contrasts. In addition, GLM provides both univariate and multivariate analyses
for repeated measures.
- Fits repeated measures models with constant covariates.
- Uses the full-parameterization approach, with indicator variables created for every
category of a factor, to construct the design matrix for a model. With this approach, GLM
can handle empty cell problems encountered in a reparameterization approach.
- Uses weighted least squares to estimate model parameters.
- Offers four types of sums of squares for the effects in a model. (See the detailed
section on sums of squares in the following pages.)
- For mixed models, GLM automatically searches for the correct error term for each effect
in the model and displays the expected mean squares for all effects.
- Assesses the homogeneity of the variance and covariance structure of the dependent
variables by Levene and Box M tests. In addition, it offers Bartlett's sphericity test of
the residual covariance matrix in the case of a multivariate model, and Mauchly's
sphericity test of the residual covariance matrix in the case of a repeated measures model.
- Allows you to specify an error term for any between-subjects effect in a model. The
error term may be another between-subjects effect in the model, a linear combination of
other between-subjects effects in the model, or a specific value.
- Lets you specify custom hypothesis tests via the LBM=K equation, with convenient
subcommands to easily specify common custom tests. (See the detailed section on custom
hypothesis tests in the following pages.)
- Provides estimated marginal means of the dependent variables, with covariates held at
their mean value, for specified factors. (See the detailed section on estimated marginal
means in the following pages.)
- Offers 18 post-hoc tests of observed means. Depending on the test, GLM performs pairwise
comparisons among all levels of specified factors, or determines homogeneous subsets among
the group means. The tests offered are SNK, Tukey's HSD, Tukey's b, Duncan, Scheffe,
Dunnett, Bonferroni, LSD, Sidak, GT2 (Hochberg, 1974), Gabriel (1978), FREGW and QREGW
(Ryan, 1959; Ryan, 1960; Einot & Gabriel, 1975), T2 (Tamhane, 1977), T3 (Dunnett,
1980), GH (Games & Howell, 1976), C (Dunnett, 1980) and Waller. Or, you can specify
another between-subjects effect in the model to be used as the error term in the post-hoc
tests.
- Produces three types of plots: spread vs. level, residual and profile plots. For each
dependent variable, the spread vs. level plot shows observed cell means vs. standard
deviations, and vs. variances, across the level combinations of all factors. The residual
plot produces an observed by predicted by standardized residual plot. The profile plot
produces line plots of the estimated means of a dependent variable across levels of one,
two or three factors. (See the detailed section on profile plots in the following pages.)
- Creates a set of new variables and saves them in the working data file. The new
variables include the predicted value, raw residual, standardized residual, studentized
residual, deleted residual, standard error of the predicted value, Cook's distance, and
uncentered leverage value. GLM also allows users to save the design matrix in the working
data file.
- Offers options for saving three kinds of external (Windows®) files containing model fit
results, with which users can do follow-up analyses. Users can save parameter estimates,
standard errors, significance levels, and either a parameter covariance or correlation
matrix. In addition, users can save an effect file which contains the sum of squares,
degrees of freedom, mean squares, F statistics, significance levels, noncentrality
parameters and observed power levels for between-subjects effects in the model.
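The full-parameterization approach described above can be sketched in a few lines of NumPy
(an illustration of the idea, not SPSS's internal code): every category of a factor receives
its own indicator column alongside the intercept, so the design matrix is deliberately
rank-deficient.

```python
import numpy as np

# Hypothetical factor with four categories, observed on five cases.
levels = ["18-29", "30-39", "40-49", "50+"]
obs = ["18-29", "30-39", "40-49", "50+", "30-39"]

# Full parameterization: an intercept column plus one indicator column
# per category (not per non-reference category).
indicators = [[1.0 if o == lv else 0.0 for o in obs] for lv in levels]
X = np.column_stack([np.ones(len(obs))] + [np.array(c) for c in indicators])

# The category indicators sum to the intercept column, so X is not of
# full column rank; estimation then proceeds via generalized inverses.
print(X.shape, np.linalg.matrix_rank(X))  # (5, 5) 4
```

Because no reference category is dropped, an empty cell simply produces an all-zero
indicator column rather than a misdefined reparameterization, which is why this approach
copes with empty-cell designs.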
Four types of sums of squares in GLM
GLM gives you four convenient methods for computing sums of squares (SS), and you can
request any of the four types easily. The Type I SS method calculates the
reduction in error SS obtained by adding each effect to the model sequentially. The Type I
SS method is useful in balanced design models, polynomial regression models and nested
models. It is also useful when some effects (such as blocking effects) must be adjusted for
before the other effects of interest in the model are analyzed. Comparing Type I SS with
the other types of SS also reveals the effect of lack of balance in the data. To determine
whether some effect combinations would be useful in building a model, you can use the Type
II SS method. The Type II
SS method calculates the SS of an effect in the model adjusted for all other appropriate
effects. An appropriate effect is an effect that does not contain the effect being
examined. Suppose F1 and F2 are effects in a model. Then, we say that F2 contains F1 if:
- F1 and F2 involve the same continuous variables, if any, and
- F2 involves more between-subjects factors than F1, and any between-subjects factors
involved in F1 also appear in F2.
The intercept effect, if any, is contained in all effects involving only
between-subjects factors, but not in any effect involving continuous variables. Also, the
intercept effect does not contain any other effects. Type II SS is useful in balanced
design models, regression models and nested models.
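The containment rule can be made concrete with a small Python sketch (a hypothetical helper
for illustration, not part of SPSS):

```python
def contains(f2, f1):
    """True if effect f2 contains effect f1 under the rule above.

    Each effect is described by the set of continuous variables and the
    set of between-subjects factors it involves.
    """
    return (f2["covariates"] == f1["covariates"]          # same continuous variables
            and len(f2["factors"]) > len(f1["factors"])   # f2 involves more factors
            and f1["factors"] <= f2["factors"])           # all of f1's factors appear in f2

# Example: AGECAT*SEX contains AGECAT, but not vice versa.
agecat = {"covariates": set(), "factors": {"AGECAT"}}
agecat_sex = {"covariates": set(), "factors": {"AGECAT", "SEX"}}
intercept = {"covariates": set(), "factors": set()}
```

Under this rule the intercept effect (no factors, no continuous variables) is contained in
every factor-only effect but contains no other effect, matching the description above.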
Type III SS method is designed especially to deal with unbalanced models with no empty
cells (i.e., all factor combinations are observed at least once). Type III SS method
calculates the reduction in error SS by adding the effect after all the other effects are
adjusted. In a factorial design model with no missing cells, this method is equivalent to
Yates' weighted-squares-of-means technique. Type III SS is useful in any balanced or
unbalanced model with no empty cells. The hypotheses being tested by Type III SS involve
only marginal averages of population cell means, which are easy to interpret.
When missing cells are present, Type I, II and III SS rarely have any reasonable
interpretation. In these situations, GLM offers several tools for customizing your
hypothesis tests, and it provides Type IV SS, which is designed to handle the empty-cell
situation. The Type IV SS method calculates the SS for an effect and generates a
corresponding testable and interpretable hypothesis in which the cell mean coefficients
are balanced. More specifically, a hypothesis matrix L for an effect F is constructed so
that, within each row of L, the coefficients in the columns corresponding to F are
distributed equitably across the columns of the effects containing F. This distribution
depends on the availability and pattern of the nonmissing cells.
Suppose we own a musical CD-of-the-month club and want to add a new big band musical
category. We have big band music preference ratings (BIGBAND), age category (AGECAT),and
sex (SEX) variables for a listwise sample of 1,337 individuals. (The data used here are
actually a subset of the 1993 U.S. General Social Survey data set.) We want to determine
the age and sex categories at which to direct our big band marketing efforts.
We can approach this problem using the GLM procedure, treating BIGBAND as our dependent
variable, and AGECAT and SEX as factors. The design is unbalanced, and no empty cells
exist, so we will obtain Type III SS. The syntax in Figure 1 gives us the results we need.
Alternatively, you can specify the preceding GLM command using the dialog boxes.
The Between-Subjects Factors information table in Figure 2 is an example of GLM's
output. This table displays any value labels defined for levels of the between-subjects
factors, and is a useful reference when interpreting GLM output. In this table, we see
that SEX = 1 and 2 correspond to males and females, respectively. (Other selected output
produced by the preceding syntax is described below.)
The ANOVA table in Figure 3 shows that the AGECAT by SEX interaction effect is
significant at p = .010. In our discussion of the four types of sums of squares available
in GLM, we said Type II SS are useful in balanced designs. To give an idea of how the four
types of sums of squares can differ using the same data and the same model, the following
ANOVA table is based on the Type II SS method. Note that the Type II SS were computed for
comparative purposes only, and that the Type III SS displayed above should be used for
interpretation.
Comparing the ANOVA table based on Type II SS in Figure 4 (on the following page) with
the Type III SS table in Figure 3 shows that the sums of squares and other statistics
differ for most
effects. For the AGECAT and SEX effects, the Type II SS are larger than the Type III SS
because the former are not adjusted for the AGECAT by SEX interaction effect, whereas the
latter are. Note also that for the SEX effect, the significance level is lower, and the
observed power is higher, for Type II than for Type III SS. It is possible that with other
data or models the final results may differ more drastically (e.g., effects found to be
significant using Type II SS may be insignificant using Type III SS), and invalid
conclusions might be reached by using an inappropriate SS method.
Estimated marginal means
GLM will compute estimated marginal means of the dependent variables, with covariates
held at their mean value, for specified between- or within-subjects factors in the model.
These means are predicted means, not observed, and are based on the specified linear
model. Standard errors are also provided.
GLM will also perform pairwise comparisons of the estimated marginal means of the
dependent variables. These comparisons are performed among levels of a specified between-
or within-subjects factor, and may be performed separately within each level combination
of other specified between- or within-subjects factors. Where applicable, omnibus
univariate or multivariate tests (associated with the pairwise comparisons) are also
provided. In addition, the estimated marginal means can be plotted (see 'Profile plots'
below) for interpretation of additive and interactive effects among factors.
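The idea that estimated marginal means are model-based predicted means can be sketched with
statsmodels on simulated stand-in data (the variable names mirror the example that follows,
but the data are invented):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated stand-in for the BIGBAND ratings (not the real GSS subset).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "agecat": rng.choice(["18-29", "30-39", "40-49", "50+"], size=n),
    "sex": rng.choice(["female", "male"], size=n),
})
df["bigband"] = rng.normal(3.0, 1.0, size=n)

fit = ols("bigband ~ C(agecat) * C(sex)", data=df).fit()

# Estimated marginal means: model predictions at every factor-level
# combination (covariates, if present, would be held at their means).
grid = pd.DataFrame(
    [(a, s) for a in ["18-29", "30-39", "40-49", "50+"]
            for s in ["female", "male"]],
    columns=["agecat", "sex"])
grid["emmean"] = fit.predict(grid)
```

Because this model is saturated in the two factors and has no covariates, these estimated
means coincide with the observed cell means; with covariates or a non-saturated model they
generally would not.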
Continuing with the musical CD-of-the-month club example, we now examine the estimated
marginal means output requested by the EMMEANS = TABLES (AGECAT * SEX) subcommand. The
table in Figure 5 displays the estimated means for all AGECAT by SEX level combinations.
This table reveals that for the younger age categories, males' preference ratings for big
band music, as predicted by the model, were higher than those of females. However, for
older age categories, females' ratings were higher than those of males.
Profile plots
GLM will produce line plots of the estimated means of a dependent variable across
levels of one, two or three between- or within-subjects factors. Profile plots of two or
three factors are typically referred to as interaction plots.
It is common to see interaction plots with observed means. However, when you plot
observed means, the resulting picture shows both the effect being studied and the error.
By plotting the estimated or predicted means, you get a picture of the effect without the
error. (Recall that in the GLM, the dependent variable is equal to a linear combination of
the parameters plus an error term. Plotting the observed means of the dependent variable
across levels of a factor is the same as plotting the predicted values plus the errors.)
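This decomposition (observed cell means equal predicted cell means plus the cell means of
the residuals) can be verified directly. The sketch below uses simulated data and a
deliberately non-saturated additive model, so the residual cell means are not all zero:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(2)
n = 90
df = pd.DataFrame({
    "a": rng.choice(["a1", "a2", "a3"], size=n),
    "b": rng.choice(["b1", "b2"], size=n),
})
df["y"] = rng.normal(size=n)

# Additive model (no interaction), so the predicted values do not
# reproduce the observed cell means exactly.
fit = ols("y ~ C(a) + C(b)", data=df).fit()

cells = (df.assign(pred=fit.fittedvalues, resid=fit.resid)
           .groupby(["a", "b"])[["y", "pred", "resid"]].mean())
# For every cell: observed mean == predicted mean + mean residual.
```

Plotting the `pred` column instead of the `y` column is exactly the difference between a
profile plot of estimated means and one of observed means.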
The profile plot of big band preference ratings by AGECAT by SEX is given in Figure 6.
It was produced by the PLOT = PROFILE(AGECAT * SEX) syntax. Note that the plotted means
are the same as those shown in the estimated marginal means table above. The interaction
pattern is more apparent in this plot, which clearly displays a cross-over interaction
between AGECAT and SEX. We also see that ratings generally increased with age, with
a slight deviation from this pattern for males between the 30-39 and 40-49 age categories.
Custom hypothesis tests
GLM lets you define your own contrasts and perform custom hypothesis tests on them. More
particularly, you may compare specific level combinations of between-subjects effects
and/or linear combinations of dependent variables. Custom hypothesis tests are denoted
LBM=K, where L is a matrix of contrasts among the between-subjects effects, B is the
matrix of parameter estimates, M is a matrix of contrasts among dependent variables and K
is a matrix of hypothesized constants.
Custom hypothesis tests must be specified via syntax, but there are shortcut
subcommands for some common tests. For example, in univariate models, the TEST subcommand
allows you to specify the error term for any between-subjects effect in a model. This
error term may be another between-subjects effect in the model, a linear combination of
other between-subjects effects in the model, or a specific value. The results are
displayed in an ANOVA table.
The CONTRAST subcommand creates an L matrix which corresponds to several commonly used
contrasts, including deviation, simple, difference, Helmert, repeated and polynomial
contrasts. Alternatively, you can take full advantage of the custom hypothesis testing
functionality by specifying your own L, M or K matrices using the LMATRIX, MMATRIX or
KMATRIX subcommands, respectively.
The MMATRIX subcommand provides much flexibility to define new dependent variables as
linear combinations of the original dependent variables. This flexibility is useful in
multivariate or repeated measures models when the conventional hypothesis tests do not
directly address your research questions. With doubly multivariate repeated measures
models, the MMATRIX subcommand even lets you define linear combinations of dependent
variables across different measures.
For tests specified using the CONTRAST, LMATRIX, MMATRIX or KMATRIX subcommands, the
results include the contrast estimate, the difference between the estimate and the
hypothesized value of the contrast, the standard error, and a confidence interval for the
difference. Also provided are omnibus univariate and multivariate tests.
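The LBM = K machinery can be imitated outside SPSS. The sketch below (statsmodels and patsy
on simulated stand-in data; an illustration of the idea, not SPSS's implementation) builds
an L matrix whose rows contrast males and females within each age category and then tests
LB = 0 jointly:

```python
import numpy as np
import pandas as pd
from patsy import build_design_matrices
from statsmodels.formula.api import ols

# Simulated stand-in data (not the real GSS subset).
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "agecat": rng.choice(["18-29", "30-39", "40-49", "50+"], size=n),
    "sex": rng.choice(["female", "male"], size=n),
})
df["bigband"] = rng.normal(3.0, 1.0, size=n)

fit = ols("bigband ~ C(agecat) * C(sex)", data=df).fit()
design_info = fit.model.data.design_info

# Each row of L is the design-vector difference (male - female) at one
# age category, so L @ beta is the male-minus-female predicted mean there.
rows = []
for a in ["18-29", "30-39", "40-49", "50+"]:
    cell = {}
    for s in ["female", "male"]:
        (m,) = build_design_matrices(
            [design_info], pd.DataFrame({"agecat": [a], "sex": [s]}))
        cell[s] = np.asarray(m)[0]
    rows.append(cell["male"] - cell["female"])
L = np.vstack(rows)

result = fit.f_test(L)  # joint test of L @ beta = 0 (K defaults to zero)
```

The rows can also be tested one at a time with `fit.t_test(L)`, paralleling the individual
contrasts L1 through L4 in the example that follows.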
The profile plot shown in the previous section suggests several interesting
comparisons. Namely, at each age category, are the ratings of males significantly
different from those of females? The null hypothesis of interest can be expressed as:
H0: (Mean Rating for Males Aged 18-29) - (Mean Rating for Females Aged 18-29) = 0 and
(Mean Rating for Males Aged 30-39) - (Mean Rating for Females Aged 30-39) = 0 and
(Mean Rating for Males Aged 40-49) - (Mean Rating for Females Aged 40-49) = 0 and
(Mean Rating for Males Aged 50+) - (Mean Rating for Females Aged 50+) = 0
We can test this hypothesis using GLM's custom hypothesis testing tools by defining the
L matrix. (The full custom hypothesis test equation is LBM=K, but the M matrix in this
example contains the single value 1 because the model is univariate, and the K matrix is a
4-vector of zeros. Both of these matrices are GLM defaults in this example.) The L matrix
is defined using the LMATRIX subcommand — please refer to the SPSS Advanced Statistics
7.0 Update manual for details — and is printed using the PRINT = TEST(LMATRIX)
subcommand. The syntax is shown in Figure 7.
The transpose of the L matrix corresponding to our comparisons is shown in Figure 8.
As shown in the ANOVA table in Figure 9 below, the overall contrast was significant at
p = .017. We reject the null hypothesis of equal ratings for males and females at each age
category.
The contrast results table in Figure 10 below allows us to further examine the
individual contrasts in our hypothesis. Contrasts L1, L2, L3 and L4 compare males and
females at each of the respective age categories.
The 95 percent confidence intervals reveal that the male vs. female difference is
significant only for the 40-49 age category. For individuals aged 40-49, the mean
preference rating for big band music was significantly higher for females than for males.
Note that the custom hypothesis test discussed in this example is one of many which
could be done. Depending on the nature of the research questions you ask, you would
construct a different L matrix in each case.
On the basis of these results, however, we might consider directing our big band
marketing efforts at individuals in the 40-49 and 50+ age categories, with the possible
exception of males in the 40-49 age category.