SPSS Library
MANOVA and GLM


This page was adapted from a page on the SPSS web site. We thank SPSS for their permission to adapt and distribute this page via our web site.


 

 


An overview of the GLM procedure

General linear modeling in SPSS for Windows

The general linear model (GLM) is a flexible statistical model that incorporates normally distributed dependent variables and categorical or continuous independent variables. The GLM procedure in SPSS allows you to specify general linear models through syntax or dialog boxes, and presents the results in pivot tables so you can easily edit the output. Among the many features available, GLM enables you to accommodate designs with empty cells, more readily interpret the results using profile plots of estimated means, and customize the linear model so that it directly addresses the research questions you ask. Anyone who regularly fits linear models, whether univariate, multivariate or repeated measures, will find the GLM procedure very useful. In this paper, we describe the key features of GLM and give you a more in-depth understanding of the options within the procedure and how you can use them. We discuss four topics in detail: the four types of sums of squares, estimated marginal means, profile plots and custom hypothesis tests.

Highlights of GLM

  • Covers a variety of linear models, such as univariate and multivariate regression, ANOVA and ANCOVA, mixed, MANOVA and MANCOVA, repeated measures and doubly multivariate repeated measures models.
  • For repeated measures models, GLM offers many commonly used contrasts for the within-subjects factors, including deviation, simple, difference, Helmert, repeated and polynomial contrasts. In addition, GLM provides both univariate and multivariate analyses for repeated measures.
  • Fits repeated measures models with constant covariates.
  • Uses the full-parameterization approach, with indicator variables created for every category of a factor, to construct the design matrix for a model. With this approach, GLM can handle empty cell problems encountered in a reparameterization approach.
  • Uses weighted least squares to estimate model parameters.
  • Offers four types of sums of squares for the effects in a model. (See the detailed section on sums of squares below.)
  • For mixed models, GLM automatically searches for the correct error term for each effect in the model and displays the expected mean squares for all effects.
  • Assesses the homogeneity of the variance and covariance structure of the dependent variables by Levene and Box M tests. In addition, it offers Bartlett's sphericity test of the residual covariance matrix in the case of a multivariate model, and Mauchly's sphericity test of the residual covariance matrix in the case of a repeated measures model.
  • Allows you to specify an error term for any between-subjects effect in a model. The error term may be another between-subjects effect in the model, a linear combination of other between-subjects effects in the model, or a specific value.
  • Lets you specify custom hypothesis tests via the LBM=K equation, with convenient subcommands to easily specify common custom tests. (See the detailed section on custom hypothesis tests below.)
  • Provides estimated marginal means of the dependent variables, with covariates held at their mean value, for specified factors. (See the detailed section on estimated marginal means below.)
  • Offers 18 post-hoc tests of observed means. Depending on the test, GLM performs pairwise comparisons among all levels of specified factors, or determines homogeneous subsets among the group means. The tests offered are SNK, Tukey's HSD, Tukey's b, Duncan, Scheffe, Dunnett, Bonferroni, LSD, Sidak, GT2 (Hochberg, 1974), Gabriel (1978), FREGW and QREGW (Ryan, 1959; Ryan, 1960; Einot & Gabriel, 1975), T2 (Tamhane, 1977), T3 (Dunnett, 1980), GH (Games & Howell, 1976), C (Dunnett, 1980) and Waller. Or, you can specify another between-subjects effect in the model to be used as the error term in the post-hoc tests.
  • Produces three types of plots: spread vs. level, residual and profile plots. For each dependent variable, the spread vs. level plot shows observed cell means vs. standard deviations, and vs. variances, across the level combinations of all factors. The residual plot produces an observed by predicted by standardized residual plot. The profile plot produces line plots of the estimated means of a dependent variable across levels of one, two or three factors. (See the detailed section on profile plots below.)
  • Creates a set of new variables and saves them in the working data file. The new variables include the predicted value, raw residual, standardized residual, studentized residual, deleted residual, standard error of the predicted value, Cook's distance, and uncentered leverage value. GLM also allows users to save the design matrix in the working data file.
  • Offers options for saving three kinds of external (Windows®) files containing model fit results, with which users can do follow-up analyses. Users can save parameter estimates, standard errors, significance levels, and either a parameter covariance or correlation matrix. In addition, users can save an effect file which contains the sum of squares, degrees of freedom, mean squares, F statistics, significance levels, noncentrality parameters and observed power levels for between-subjects effects in the model.
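The full-parameterization item in the list above can be illustrated with a small sketch. It builds a design matrix with an indicator column for every category of two factors and of their interaction, for a toy data set in which one cell is empty. The data, the function name `full_param_design`, and the use of NumPy's pseudoinverse as the generalized inverse are all assumptions made for illustration; the source does not say which generalized inverse GLM uses internally.

```python
import numpy as np

def full_param_design(a, b, a_levels, b_levels):
    """Intercept plus one indicator column per level of A, B and A*B."""
    cols = [np.ones(len(a))]
    cols += [(a == i).astype(float) for i in a_levels]
    cols += [(b == j).astype(float) for j in b_levels]
    cols += [((a == i) & (b == j)).astype(float)
             for i in a_levels for j in b_levels]
    return np.column_stack(cols)

# Hypothetical toy data: the cell (A=2, B=2) is never observed.
a = np.array([1, 1, 1, 2, 2, 2, 2])
b = np.array([1, 1, 2, 1, 1, 1, 1])
y = np.array([3.0, 5.0, 6.0, 2.0, 4.0, 3.0, 3.0])

X = full_param_design(a, b, [1, 2], [1, 2])

# The matrix is deliberately overparameterized: 9 columns, but its rank
# equals the number of non-empty cells.
rank = np.linalg.matrix_rank(X)

# A generalized inverse still yields a least squares solution; the fitted
# values are unique even though the parameter vector is not.
beta = np.linalg.pinv(X) @ y
fitted = X @ beta

print("columns:", X.shape[1], "rank:", rank)
print("fitted values:", np.round(fitted, 6))
```

Because the empty cell simply contributes a column of zeros, nothing special has to be done to the design matrix; the rank deficiency is absorbed by the generalized inverse.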

Four types of sums of squares in GLM

GLM offers four methods for computing sums of squares (SS), and you can request any of them easily. The Type I SS method calculates the reduction in error SS as each effect is added to the model sequentially. It is useful in balanced design models, polynomial regression models and nested models, and when some effects (such as blocking effects) must be adjusted for before the effects of interest are analyzed. Comparing the Type I SS with the other types of SS also reveals the effect of lack of balance in the data.

To judge whether particular effect combinations would be useful in building a model, you can use the Type II SS method, which calculates the SS of an effect adjusted for all other appropriate effects in the model. An appropriate effect is one that does not contain the effect being examined. Suppose F1 and F2 are effects in a model. Then we say that F2 contains F1 if:

  1. F1 and F2 involve the same continuous variables, if any, and
  2. F2 involves more between-subjects factors than F1, and any between-subjects factors involved in F1 also appear in F2.

The intercept effect, if any, is contained in all effects involving only between-subjects factors, but not in any effect involving continuous variables. Also, the intercept effect does not contain any other effects. Type II SS is useful in balanced design models, regression models and nested models.
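The difference between the sequential (Type I) and adjusted (Type II) definitions can be sketched as differences in error SS between nested models. The example below is a minimal illustration on a hypothetical, unbalanced two-factor data set; the helper names `indicators` and `sse` are invented for this sketch and are not part of SPSS.

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data (cell counts 2, 1, 1, 3).
a = np.array([1, 1, 1, 2, 2, 2, 2])
b = np.array([1, 1, 2, 1, 2, 2, 2])
y = np.array([3.0, 5.0, 6.0, 2.0, 7.0, 8.0, 9.0])

def indicators(codes):
    """One indicator column per distinct code."""
    return np.column_stack([(codes == lev).astype(float)
                            for lev in np.unique(codes)])

def sse(*blocks):
    """Error SS of the model: intercept plus the given column blocks."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    fitted = X @ (np.linalg.pinv(X) @ y)
    return float(np.sum((y - fitted) ** 2))

A, B, AB = indicators(a), indicators(b), indicators(10 * a + b)

# Type I: each effect is adjusted only for the effects entered before it.
type1 = {
    "A": sse() - sse(A),
    "B": sse(A) - sse(A, B),
    "A*B": sse(A, B) - sse(A, B, AB),
}

# Type II: each effect is adjusted for all effects that do not contain it.
type2 = {
    "A": sse(B) - sse(A, B),
    "B": sse(A) - sse(A, B),
    "A*B": sse(A, B) - sse(A, B, AB),
}

for name in type1:
    print(f"{name:4s} Type I = {type1[name]:8.4f}   Type II = {type2[name]:8.4f}")
```

In this sketch the Type I SS add up to the corrected total SS minus the error SS, while the Type II SS for A differs from its Type I value because the data are unbalanced and A is entered first.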

Type III SS method is designed especially to deal with unbalanced models with no empty cells (i.e., all factor combinations are observed at least once). Type III SS method calculates the reduction in Error SS by adding the effect after all the other effects are adjusted. In a factorial design model with no missing cells, this method is equivalent to Yates' weighted-squares-of-means technique. Type III SS is useful in any balanced or unbalanced model with no empty cells. The hypotheses being tested by Type III SS involve only marginal averages of population cell means, which are easy to interpret.
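The statement that Type III hypotheses involve only marginal averages of cell means can be checked numerically. The sketch below, on hypothetical unbalanced 2 x 2 data, computes the Type III SS for factor A in two ways: as a reduction in error SS under effect (sum-to-zero) coding, and as a contrast among unweighted averages of the observed cell means. The coding scheme and all variable names are assumptions of this sketch, not a description of how GLM computes the result.

```python
import numpy as np

# Hypothetical unbalanced 2 x 2 data with no empty cells.
a = np.array([1, 1, 1, 2, 2, 2, 2])
b = np.array([1, 1, 2, 1, 2, 2, 2])
y = np.array([3.0, 5.0, 6.0, 1.0, 7.0, 8.0, 9.0])

# Effect (sum-to-zero) coding for A, B and the interaction.
a_e = np.where(a == 1, 1.0, -1.0)
b_e = np.where(b == 1, 1.0, -1.0)
ab_e = a_e * b_e

def sse(*cols):
    """Error SS of the model: intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Route 1: reduction in error SS when A is added after B and A*B.
ss3_a = sse(b_e, ab_e) - sse(a_e, b_e, ab_e)

# Route 2: contrast between the unweighted marginal means of A,
# built from the observed cell means and cell counts.
cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
means = np.array([y[(a == i) & (b == j)].mean() for i, j in cells])
counts = np.array([np.sum((a == i) & (b == j)) for i, j in cells])
c = np.array([0.5, 0.5, -0.5, -0.5])   # mean of A=1 cells minus mean of A=2 cells
ss_contrast = float((c @ means) ** 2 / np.sum(c ** 2 / counts))

print("Type III SS(A) via model comparison:", round(ss3_a, 6))
print("Type III SS(A) via cell-mean contrast:", round(ss_contrast, 6))
```

The two routes agree, which shows that the hypothesis being tested weights every cell mean equally regardless of how many observations each cell contains.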

When missing cells are present, Type I, II and III SS rarely have a reasonable interpretation. In these situations, GLM offers several tools for customizing your hypothesis tests, and it provides Type IV SS to deal with empty cells directly. The Type IV SS method calculates the SS for an effect and generates a corresponding testable and interpretable hypothesis in which the cell mean coefficients are balanced. More specifically, a hypothesis matrix L for an effect F is constructed so that, in each row of L, the coefficients in the columns corresponding to F are distributed equitably across the columns of the effects containing F. This distribution is affected by the availability and pattern of the nonmissing cells.

Example

Suppose we own a musical CD-of-the-month club and want to add a new big band musical category. We have big band music preference ratings (BIGBAND), age category (AGECAT), and sex (SEX) variables for a listwise-complete sample of 1,337 individuals. (The data used here are actually a subset of the 1993 U.S. General Social Survey data set.) We want to determine the age and sex categories at which to direct our big band marketing efforts.

figure 1

We can approach this problem using the GLM procedure, treating BIGBAND as our dependent variable, and AGECAT and SEX as factors. The design is unbalanced, and no empty cells exist, so we will obtain Type III SS. The syntax in Figure 1 gives us the results we need. Alternatively, you can specify the preceding GLM command using the dialog boxes.

figure 2

The Between-Subjects Factors information table in Figure 2 is an example of GLM's output. This table displays any value labels defined for levels of the between-subjects factors, and is a useful reference when interpreting GLM output. In this table, we see that SEX = 1 and 2 correspond to males and females, respectively. (Other selected output produced by the preceding syntax is described below.)

figure 3

The ANOVA table in Figure 3 shows that the AGECAT by SEX interaction effect is significant at p = .010. In our discussion of the four types of sums of squares available in GLM, we said Type II SS are useful in balanced designs. To give an idea of how the types of sums of squares can differ using the same data and the same model, the following ANOVA table is based on the Type II SS method. Note that Type II SS were computed for comparative purposes only, and that the Type III SS displayed above should be used instead.

figure 4

Comparing the ANOVA table based on Type II SS (Figure 4) with the one based on Type III SS (Figure 3) shows that the sums of squares and other statistics differ for most effects. For the AGECAT and SEX effects, the Type II SS are larger than the Type III SS because the former are not adjusted for the AGECAT by SEX interaction effect, whereas the latter are. Note also that for the SEX effect, the significance level is lower, and the observed power is higher, for Type II than for Type III SS. With other data or models the final results may differ more drastically (e.g., effects found to be significant using Type II SS may be nonsignificant using Type III SS), and invalid conclusions might be reached by using an inappropriate SS method.

Estimated marginal means

GLM will compute estimated marginal means of the dependent variables, with covariates held at their mean value, for specified between- or within-subjects factors in the model. These means are predicted means, not observed, and are based on the specified linear model. Standard errors are also provided.
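The idea of a predicted mean with covariates held at their mean value can be sketched for a model with one two-level factor and one covariate. The data below are hypothetical and noise-free by construction, so the fit is exact; the variable names are invented for this illustration.

```python
import numpy as np

# Hypothetical data: two groups whose covariate values do not overlap.
group = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 1.0 + 2.0 * group + 0.5 * x          # constructed without noise

# Fit y = b0 + b1 * group + b2 * x by least squares.
X = np.column_stack([np.ones(len(y)), group, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated marginal mean for each group: the model prediction with the
# covariate fixed at its overall mean.
x_bar = x.mean()
emm = {g: float(np.array([1.0, g, x_bar]) @ beta) for g in (0, 1)}

# Observed group means, for comparison.
observed = {g: float(y[group == g].mean()) for g in (0, 1)}

for g in (0, 1):
    print(f"group {g}: estimated marginal mean = {emm[g]:.2f}, "
          f"observed mean = {observed[g]:.2f}")
```

Here the observed means differ by more than the estimated marginal means because the second group also has larger covariate values; the estimated marginal means remove that imbalance by evaluating both groups at the same covariate value.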

GLM will also perform pairwise comparisons of the estimated marginal means of the dependent variables. These comparisons are performed among levels of a specified between- or within-subjects factor, and may be performed separately within each level combination of other specified between- or within-subjects factors. Where applicable, omnibus univariate or multivariate tests (associated with the pairwise comparisons) are also provided. In addition, the estimated marginal means can be plotted (see 'Profile plots' below) for interpretation of additive and interactive effects among factors.

Example

figure 5

Continuing with the musical CD-of-the-month club example, we now examine the estimated marginal means output requested by the EMMEANS = TABLES (AGECAT * SEX) subcommand. The table in Figure 5 displays the estimated means for all AGECAT by SEX level combinations. This table reveals that for the younger age categories, males' preference ratings for big band music, as predicted by the model, were higher than those of females. However, for older age categories, females' ratings were higher than those of males.

Profile plots

GLM will produce line plots of the estimated means of a dependent variable across levels of one, two or three between- or within-subjects factors. Profile plots of two or three factors are typically referred to as interaction plots.

It is common to see interaction plots with observed means. However, when you plot observed means, the resulting picture shows both the effect being studied and the error. By plotting the estimated or predicted means, you get a picture of the effect without the error. (Recall that in the GLM, the dependent variable is equal to a linear combination of the parameters plus an error term. Plotting the observed means of the dependent variable across levels of a factor is the same as plotting the predicted values plus the errors.)

Example

figure 6

The profile plot of big band preference ratings by AGECAT by SEX is given in Figure 6. It was produced by the PLOT = PROFILE(AGECAT * SEX) syntax. Note that the plotted means are the same as those shown in the estimated marginal means table above. The interaction pattern is more apparent in this plot, which clearly displays a cross-over interaction between AGECAT and SEX. Overall, ratings also tended to increase with age, with a slight deviation from this pattern for males between the 30-39 and 40-49 age categories.

Custom hypothesis tests

GLM lets you perform custom hypothesis tests by defining your own contrasts. More specifically, you may compare specific level combinations of between-subjects effects and/or linear combinations of dependent variables. Custom hypothesis tests are denoted LBM=K, where L is a matrix of contrasts among the between-subjects effects, B is the matrix of parameter estimates, M is a matrix of contrasts among dependent variables and K is a matrix of hypothesized constants.
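For the univariate case, where M reduces to the single value 1, the test of LB = K takes the standard general linear hypothesis form sketched below. The symbols X (the design matrix), the superscript minus (a generalized inverse), q and the error variance estimate are notation assumed for this sketch and do not appear in the original text.

```latex
H_0:\; L B = K, \qquad
F \;=\; \frac{(L\hat{B} - K)^{\top}\,\bigl[\,L\,(X^{\top}X)^{-}\,L^{\top}\bigr]^{-1}\,(L\hat{B} - K)}
             {q\,\hat{\sigma}^{2}},
\qquad q = \operatorname{rank}(L),
\qquad \hat{\sigma}^{2} = \frac{\text{error SS}}{n - \operatorname{rank}(X)}
```

Under the null hypothesis and the usual normality assumptions, F follows an F distribution with q and n - rank(X) degrees of freedom.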

Custom hypothesis tests must be specified via syntax, but there are shortcut subcommands for some common tests. For example, in univariate models, the TEST subcommand allows you to specify the error term for any between-subjects effect in a model. This error term may be another between-subjects effect in the model, a linear combination of other between-subjects effects in the model, or a specific value. The results are displayed in an ANOVA table.

The CONTRAST subcommand creates an L matrix which corresponds to several commonly used contrasts, including deviation, simple, difference, Helmert, repeated and polynomial contrasts. Alternatively, you can take full advantage of the custom hypothesis testing functionality by specifying your own L, M or K matrices using the LMATRIX, MMATRIX or KMATRIX subcommands, respectively.

The MMATRIX subcommand provides much flexibility to define new dependent variables as linear combinations of the original dependent variables. This flexibility is useful in multivariate or repeated measures models when the conventional hypothesis tests do not directly address your research questions. With doubly multivariate repeated measures models, the MMATRIX subcommand even lets you define linear combinations of dependent variables across different measures.

For tests specified using the CONTRAST, LMATRIX, MMATRIX or KMATRIX subcommands, the results include the contrast estimate, the difference between the estimate and the hypothesized value of the contrast, the standard error, and a confidence interval for the difference. Also provided are omnibus univariate and multivariate tests.

Example

The profile plot shown in the previous section suggests several interesting comparisons. Namely, at each age category, are the ratings of males significantly different from those of females? The null hypothesis of interest can be expressed as:

H0: (Mean Rating for Males Aged 18-29) - (Mean Rating for Females Aged 18-29) = 0 and
(Mean Rating for Males Aged 30-39) - (Mean Rating for Females Aged 30-39) = 0 and
(Mean Rating for Males Aged 40-49) - (Mean Rating for Females Aged 40-49) = 0 and
(Mean Rating for Males Aged 50+) - (Mean Rating for Females Aged 50+) = 0

figure 7

We can test this hypothesis using GLM's custom hypothesis testing tools by defining the L matrix. (The full custom hypothesis test equation is LBM=K, but the M matrix in this example contains the single value 1 because the model is univariate, and the K matrix is a 4-vector of zeros. Both of these matrices are GLM defaults in this example.) The L matrix is defined using the LMATRIX subcommand (refer to the SPSS Advanced Statistics 7.0 Update manual for details) and is printed using the PRINT = TEST(LMATRIX) subcommand. The syntax is shown in Figure 7.

figure 8

The transpose of the L matrix corresponding to our comparisons is shown in Figure 8.
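The male vs. female comparisons at each age category can be sketched numerically as follows. This is only an illustration under stated assumptions: the data are randomly generated rather than taken from the General Social Survey, the cells are balanced, and the model is written in a cell-means parameterization (one parameter per AGECAT by SEX cell) instead of GLM's full parameterization, so the L matrix here has 8 columns and is not the matrix shown in Figure 8.

```python
import numpy as np

rng = np.random.default_rng(0)
n_age, n_sex, n_per_cell = 4, 2, 5            # synthetic layout, not the GSS data
n_cells = n_age * n_sex

# Cell index = 2 * age + sex, with sex 0 = male and 1 = female.
cell = np.repeat(np.arange(n_cells), n_per_cell)
y = rng.normal(loc=3.0, scale=1.0, size=cell.size)

# Cell-means design matrix: one indicator column per cell.
X = (cell[:, None] == np.arange(n_cells)).astype(float)
b = np.linalg.solve(X.T @ X, X.T @ y)         # estimates are the cell means

# L: one row per age category, male minus female; K: hypothesized zeros.
L = np.zeros((n_age, n_cells))
for age in range(n_age):
    L[age, 2 * age] = 1.0
    L[age, 2 * age + 1] = -1.0
K = np.zeros(n_age)

# Error variance estimate from the residuals.
resid = y - X @ b
df_error = len(y) - X.shape[1]
mse = float(resid @ resid) / df_error

# Overall test of L b = K, plus a standard error for each contrast.
diff = L @ b - K
cov_unscaled = L @ np.linalg.inv(X.T @ X) @ L.T
ss_hyp = float(diff @ np.linalg.solve(cov_unscaled, diff))
q = int(np.linalg.matrix_rank(L))
F = ss_hyp / q / mse
se = np.sqrt(mse * np.diag(cov_unscaled))

print("contrast estimates:", np.round(diff, 3))
print("standard errors:   ", np.round(se, 3))
print(f"overall F({q}, {df_error}) = {F:.3f}")
```

Each row of L yields one contrast estimate with its own standard error, which corresponds to the individual contrasts L1 to L4 discussed in the text, while the F statistic corresponds to the overall test of all four differences at once.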

As shown in the ANOVA table in Figure 9 below, the overall contrast was significant at p = .017. We reject the null hypothesis of equal ratings for males and females at each age category.

figure 9

The contrast results table in Figure 10 below allows us to further examine the individual contrasts in our hypothesis. Contrasts L1, L2, L3 and L4 compare males and females at each of the respective age categories.

figure 10

The 95 percent confidence intervals reveal that the male vs. female difference is significant only for the 40-49 age category. For individuals aged 40-49, the mean preference rating for big band music was significantly higher for females than for males.

Note that the custom hypothesis test discussed in this example is one of many which could be done. Depending on the nature of the research questions you ask, you would construct a different L matrix in each case.

On the basis of these results, however, we might consider directing our big band marketing efforts at individuals in the 40-49 and 50+ age categories, with the possible exception of males in the 40-49 age category. 



The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.