### SPSS Library MANOVA and GLM

This page contains three articles that originally appeared in SPSS Keywords. Although the versions of SPSS referenced in these articles are old, much of the information in them is still useful and timely.

FROM MANOVA TO GLM: BASICS OF PARAMETERIZATION

David P. Nichols
Senior Support Statistician
SPSS, Inc.
From SPSS Keywords, Number 64, 1997

In Release 7.0, SPSS introduced a new GLM (General Linear Models) procedure. In Release 7.5, dialog box support for the MANOVA procedure was removed (it remains available via command syntax). This article is the first of a planned series introducing GLM and aimed at easing the transition for users from MANOVA to GLM. Though perhaps of most immediate relevance to users of Release 7.0 and later, the statistical topics discussed are of relevance to users of any version of SPSS.

As regular readers of the Statistically Speaking section of SPSS Keywords are no doubt aware, I believe that understanding how a model is parameterized is crucial in interpreting the results of any statistical modeling procedure. Understanding the differences between GLM and MANOVA's approaches to parameterizing models is important both for understanding how to use the procedures and for what it can show us about aspects of modeling that apply to any approach.

The topics we will undertake are quite complicated, so we're going to start at the beginning and move to more complex situations in time. The very simplest linear model we might fit to a set of data is one in which each value is predicted to be the same; that is, the model contains only a constant term. The basis or design matrix for a constant-only model contains a single column, with the value 1 for every case. The least squares solution for such a model predicts that each case is equal to the mean of all cases. Obviously, this is not a model that we are likely to seriously entertain in the great majority of circumstances. However, it is quite useful as a baseline model, against which to compare models one step up in complexity.
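As a small illustration of the constant-only model, the following Python sketch works through the least squares arithmetic for a single column of ones. The data values and the function name `fit_constant_only` are invented for this example; the point is only that (X'X)^-1 X'y reduces to the sample mean.

```python
from statistics import mean

def fit_constant_only(y):
    """Least squares estimate for a design matrix that is a single column of 1s."""
    x = [1.0] * len(y)                           # constant column, one entry per case
    xtx = sum(xi * xi for xi in x)               # X'X reduces to n
    xty = sum(xi * yi for xi, yi in zip(x, y))   # X'y reduces to the sum of y
    return xty / xtx                             # (X'X)^-1 X'y

y = [4.0, 6.0, 5.0, 9.0, 6.0]   # hypothetical data
b0 = fit_constant_only(y)
print(b0, mean(y))              # the estimate and the sample mean agree
```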

Moving up one level, we have models in which the dependent variable is presumed to be a function of a single predictor variable. If the predictor is quantitative (sometimes loosely called "continuous"), parameterization is quite simple: we simply add a column containing the measured values on the quantitative predictor to the constant column. However, if the predictor is categorical, things become more difficult, and alternative approaches to parameterization are possible.

For example, if we have a single three level factor (A), there are a number of different ways to represent the model. A representation that could be referred to as canonical or basic would be to add a parameter for each level of factor A, in addition to that for the constant term. The design matrix, after deleting redundant rows (so that we have only one row per cell or unique predicted value), would look as shown in Figure 1. We refer to this design matrix as an overparameterized indicator design matrix.

Figure 1: Overparameterized Indicator Design Matrix

Level of A    C    A1   A2   A3
    1         1    1    0    0
    2         1    0    1    0
    3         1    0    0    1

Here we have a 3x4 matrix, with one row for each possible value of A. There is one parameter for each level of A, in addition to that for the constant.  The problem we confront here is that the final column, that for A3, can be expressed as a linear combination of the preceding columns. Specifically,

A3 = C - A1 - A2 .

Because A3 can be expressed as a linear combination of preceding columns, it is mathematically redundant. That is, estimation of the fourth parameter in the model cannot add any information to that provided by estimation of the previous three parameters.
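The redundancy can be checked mechanically. The sketch below (plain Python; the helper `matrix_rank` is written for this illustration and is not part of SPSS) rebuilds the A3 column from the other three and confirms that adding A3 does not raise the rank of the design matrix.

```python
from fractions import Fraction

# Overparameterized indicator design matrix from Figure 1 (columns: C, A1, A2, A3)
X = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Look for a usable pivot in this column, at or below the current rank
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank

a3_actual = [row[3] for row in X]
a3_rebuilt = [c - a1 - a2 for c, a1, a2, _ in X]    # A3 = C - A1 - A2

rank_without_a3 = matrix_rank([row[:3] for row in X])
rank_with_a3 = matrix_rank(X)
print(a3_rebuilt == a3_actual, rank_without_a3, rank_with_a3)
```

Four columns but a rank of three is what "overparameterized" means here: one parameter more than the data can identify.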

The GLM procedure handles computation of overparameterized models via a "sweep" operator that produces a generalized inverse of X'X (where X is the original basis or design matrix). The practical effect of this computational method is to alias redundant model parameters to 0. That is, no parameter estimate is produced for a redundant column. The non-aliased estimates produced for a one factor model are the same as those produced in a linear regression model by using indicator or dummy coding, with the last category of the factor representing the reference category.

Unlike MANOVA, GLM does not use specifications of contrast types from the user to determine what parameters are to be estimated (contrast results are available in GLM, but they are produced after the initial parameter estimation phase; in MANOVA, the design matrix is actually built from the contrasts specified by the user or by defaults). The use of a single set of basic or canonical parameters allows GLM to approach problem situations in a more systematic manner. It also allows for simpler application of the parameter estimates. Regardless of the model, when reproducing predicted values each parameter estimate for a factor is either used or not used (multiplied by 1 or 0, respectively). Recall from earlier articles that the basis or design matrices used in MANOVA often include a variety of other values, depending on the type of contrasts specified and the number of levels of the factor(s) involved.

In the next installment, we'll begin to illustrate the differences between the two approaches, and highlight the essential commonalities that are at the heart of what we wish to discover when we use quantitative models.
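To give a feel for the aliasing behavior of the sweep approach described above, here is a minimal Python sketch. It uses one common textbook form of the sweep operator applied to the augmented cross-product matrix; it is not SPSS's actual implementation, and the data and the name `sweep_all` are hypothetical. A zero pivot signals a redundant column, whose estimate is set to 0.

```python
from fractions import Fraction

def sweep_all(a, n_params):
    """Sweep the first n_params pivots of an augmented cross-product matrix in place.

    A zero pivot marks a column that is linearly dependent on columns already
    swept; its row and column are zeroed, so its estimate is aliased to 0.
    Returns the indices of the aliased columns.
    """
    size = len(a)
    aliased = []
    for k in range(n_params):
        d = a[k][k]
        if d == 0:   # exact test is safe with Fractions; floats would need a tolerance
            for j in range(size):
                a[k][j] = Fraction(0)
                a[j][k] = Fraction(0)
            aliased.append(k)
            continue
        a[k] = [v / d for v in a[k]]
        for i in range(size):
            if i != k:
                b = a[i][k]
                a[i] = [vi - b * vk for vi, vk in zip(a[i], a[k])]
                a[i][k] = -b / d
        a[k][k] = 1 / d
    return aliased

# Hypothetical data: two cases in each of three groups
groups = {1: [9, 11], 2: [13, 15], 3: [20, 22]}

# Each row of Z holds C, A1, A2, A3 and then the dependent variable y
Z = [[1] + [int(level == g) for level in (1, 2, 3)] + [y]
     for g, ys in groups.items() for y in ys]

# Augmented cross-product matrix Z'Z: X'X bordered by X'y and y'y
cross = [[Fraction(sum(row[i] * row[j] for row in Z)) for j in range(5)]
         for i in range(5)]

aliased = sweep_all(cross, n_params=4)
estimates = [cross[i][4] for i in range(4)]   # C, A1, A2, A3
sse = cross[4][4]                             # residual sum of squares
print(estimates, aliased, sse)
```

With group means of 10, 14 and 21, the constant comes out as the mean of the last group, A1 and A2 as differences from that group, and A3 as the aliased parameter, which matches dummy coding with the last category as reference.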

FROM MANOVA TO GLM: MORE BASICS OF PARAMETERIZATION

David P. Nichols
Senior Support Statistician
SPSS, Inc.
From SPSS Keywords, Number 65, 1997

In the last issue, I began discussing the differences between the ways in which the MANOVA and GLM procedures parameterize linear models with categorical or factor variables. I noted that GLM always uses one particular kind of parameterization, referred to as an overparameterized indicator design matrix, while MANOVA produces a design matrix based specifically on what kind of contrasts the user specifies, or else on defaults. The parameterization used in GLM was referred to as canonical or basic because it represents the model in a somewhat more fundamental way than do the MANOVA parameterizations.

An important feature of such a model illustrated by the GLM design matrix is the fact that there is no unique solution to the set of equations defined by this matrix. Another way to say this is that the model is overparameterized. As noted last time, GLM deals with this by using a generalized inverse produced via a symmetric sweep operator applied to the X'X matrix. The solution produced for a particular problem by MANOVA will depend on the contrast specifications. Figure 1 presents the design matrix used by GLM and the default design matrix used by MANOVA for the case of a single three level factor A.

Figure 1: Design Matrices for GLM and MANOVA defaults

             GLM Design Matrix      MANOVA Design Matrix
Level of A   C    A1   A2   A3      C    A1   A2
    1        1    1    0    0       1    1    0
    2        1    0    1    0       1    0    1
    3        1    0    0    1       1   -1   -1

Perhaps the most obvious difference between the two matrices is the number of columns in each. The GLM matrix reflects the overparameterized nature of the model by identifying four parameters. The MANOVA matrix has been reduced to the number of nonredundant parameters available for estimation (three: one for each group mean), via what is known as _reparameterization_. That is, the MANOVA design matrix can be constructed from the more general GLM design matrix by factoring the GLM matrix into the product of two matrices: the MANOVA design or basis matrix, and a contrast matrix whose rows show the interpretations of the parameter estimates produced by use of the MANOVA design matrix.
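The factoring described above can be verified numerically. In the following Python sketch, `K` is the MANOVA default basis from Figure 1 and `L` is a contrast matrix derived by hand for this illustration (it is not copied from SPSS output); multiplying them recovers the GLM design matrix.

```python
from fractions import Fraction as F

# GLM overparameterized design matrix (columns: C, A1, A2, A3)
X_glm = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

# MANOVA default basis matrix (columns: C, A1, A2)
K = [
    [1,  1,  0],
    [1,  0,  1],
    [1, -1, -1],
]

# Contrast matrix: each row writes one MANOVA parameter as a
# linear combination of the GLM parameters C, A1, A2, A3
L = [
    [F(1), F(1, 3),  F(1, 3),  F(1, 3)],    # C + (A1 + A2 + A3)/3
    [F(0), F(2, 3),  F(-1, 3), F(-1, 3)],   # A1 - (A1 + A2 + A3)/3
    [F(0), F(-1, 3), F(2, 3),  F(-1, 3)],   # A2 - (A1 + A2 + A3)/3
]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

product = matmul(K, L)
print(product == X_glm)   # True when K times L reproduces the GLM matrix
```

Reading the rows of `L` gives the interpretation of each MANOVA estimate: an average-based constant, followed by deviations of the first two levels from the average of the A parameters.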

Another notable difference is the fact that the GLM design matrix contains only ones and zeros, so that when computing the predicted value for a given case via the linear model equation

$$\hat{Y} = X\hat{B}$$

you would need only to identify which parameter estimates are used (have a 1 for that row), and sum these. The design matrices from MANOVA will have numbers other than 0 and 1, including negative numbers, and sometimes very complicated decimal values. This makes reproducing predictions much easier in GLM than in MANOVA.
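A short Python sketch of this difference follows. The parameter estimates are hypothetical values chosen to be consistent with group means of 10, 14 and 21; the function names are invented for the example.

```python
# Design matrices from Figure 1
X_glm = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
X_manova = [[1, 1, 0], [1, 0, 1], [1, -1, -1]]

# Hypothetical estimates consistent with group means of 10, 14 and 21
b_glm = [21, -11, -7, 0]    # C, A1, A2, A3 (A3 aliased to 0)
b_manova = [15, -5, -1]     # C, A1, A2

def predict(design, estimates):
    """General rule: each predicted value is a row of X times the estimates."""
    return [sum(x * b for x, b in zip(row, estimates)) for row in design]

def predict_by_selection(design, estimates):
    """Shortcut valid only for 0/1 designs: add up the estimates flagged by a 1."""
    return [sum(b for x, b in zip(row, estimates) if x == 1) for row in design]

pred_glm = predict(X_glm, b_glm)
pred_glm_shortcut = predict_by_selection(X_glm, b_glm)
pred_manova = predict(X_manova, b_manova)
print(pred_glm, pred_glm_shortcut, pred_manova)
```

The shortcut works for the GLM matrix because every entry is 0 or 1; the third row of the MANOVA matrix needs the full weighted sum because of its -1 entries.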

As noted above, with three groups, there are three means to estimate, and the basic model, represented by the GLM design matrix, is overparameterized. GLM handles such situations by aliasing redundant or linearly dependent parameters to 0. In this example, the parameter A3 is fixed at 0, since the A3 column is equal to C-A1-A2. This method is sometimes referred to as using set to 0 restrictions. MANOVA handles this by reparameterizing the model, using a method that is sometimes referred to as using sum to 0 restrictions (note that each A column in the design matrix sums to 0). MANOVA offers a variety of ways to define the design or basis matrix, each one producing a different set of parameter estimates and contrast coefficients. All of these methods produce K-1 contrasts for a K level factor, and all imply design matrix columns that sum to 0 for each of these K-1 columns.

Since the A3 parameter in GLM is fixed to 0, and we know that the predicted value for a case in group 3 is that group's mean for a one factor model, we know that the C parameter in GLM must be estimating the mean of the third group. Since the predicted values for cases in the other groups are given by C+A1 and C+A2, A1 and A2 must be estimating the differences between the other two group means and the final one. As noted last issue, these are the same interpretations resulting from a model using just two dummy variables representing each of the first two groups.

With MANOVA's sum to 0 approach, the estimate for the constant column always produces the unweighted average of the cell means in a single factor design. The particular contrast coding used here (the default), produces estimates for A1 and A2 that are the deviations of each of the first two groups from this unweighted grand mean. The deviation of the final group from the mean is the negative of the sum of the other deviations. While all of the contrast options in MANOVA produce the same type of constant or intercept term, the estimates for the A terms will differ depending on what type of contrasts are specified.
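The two sets of interpretations can be reproduced by solving each parameterization for a set of cell means. The Python sketch below assumes hypothetical group means of 10, 14 and 21 and uses a small hand-written solver (`solve` is not an SPSS routine). For the set to 0 approach the aliased A3 column is simply dropped.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square nonsingular system by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

cell_means = [10, 14, 21]   # hypothetical means for levels 1, 2, 3 of A

# Set to 0 restrictions: GLM matrix with the aliased A3 column removed
set_to_zero = [[1, 1, 0], [1, 0, 1], [1, 0, 0]]
# Sum to 0 restrictions: MANOVA default basis
sum_to_zero = [[1, 1, 0], [1, 0, 1], [1, -1, -1]]

c_glm, a1_glm, a2_glm = solve(set_to_zero, cell_means)
c_man, a1_man, a2_man = solve(sum_to_zero, cell_means)
a3_man = -(a1_man + a2_man)   # deviation of the last group, implied by the restriction

print(c_glm, a1_glm, a2_glm)          # last group's mean, then differences from it
print(c_man, a1_man, a2_man, a3_man)  # unweighted mean, then deviations from it
```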

The important point to note here is that while the parameters estimated by the two procedures (or the different choices in MANOVA) are different, the underlying model, the predicted values for a given case, and the summary measures of prediction error (such as the sum of squared residuals) are always the same. With regard to parameters, while there is no unique value resulting for the constant or intercept, nor for any of the A terms, any contrast (linear combination whose coefficients sum to 0) among the A terms will have a unique value, regardless of how the model is parameterized. The invariance of contrasts among the A parameters is of such importance that it occupies a central place in theoretical treatments of statistical estimation. That central concept will be the focus of the next installment.
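A brief Python sketch of this invariance, using hypothetical estimates consistent with group means of 10, 14 and 21 (the dictionary layout and the `contrast` helper are invented for the example):

```python
# Hypothetical estimates under the two parameterizations
glm = {"C": 21, "A1": -11, "A2": -7, "A3": 0}    # set to 0 restrictions
manova = {"C": 15, "A1": -5, "A2": -1}           # sum to 0 restrictions
manova["A3"] = -(manova["A1"] + manova["A2"])    # implied by the restriction

def contrast(params, coefs):
    """Linear combination of the A terms; the coefficients must sum to 0."""
    assert sum(coefs) == 0
    return sum(c * params[name] for c, name in zip(coefs, ("A1", "A2", "A3")))

contrasts = [(1, -1, 0), (1, 0, -1), (1, 1, -2)]
glm_values = [contrast(glm, c) for c in contrasts]
manova_values = [contrast(manova, c) for c in contrasts]

print(glm["C"], manova["C"])        # the constants differ
print(glm_values, manova_values)    # the contrasts agree
```

The individual A estimates differ between the two dictionaries, yet every contrast evaluates to the same number, which is also the corresponding contrast among the group means.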

MY TESTS DON'T AGREE!

David P. Nichols
Principal Support Statistician and
Manager of Statistical Support
SPSS, Inc.
From SPSS Keywords, Number 66, 1998

A common question asked of SPSS Statistical Support is how to interpret a set of tests that are testing the same or logically related null hypotheses, yet produce different conclusions. The prime example of this would be a situation where an omnibus F-test in an analysis of variance (ANOVA) produces a significance level (p-value) less than a critical alpha (such as .05), but follow up tests comparing levels of the factor do not produce any p-values less than alpha, or conversely, where the omnibus F-test is not significant at the given alpha level, while one or more pairwise comparisons are significant.

For an example, consider a one way ANOVA model of a very simple form: three groups, equal sample sizes, with standard independence, normality, and homogeneity of variance assumptions met. Further, assume that our numbers have been measured without error. The null hypothesis tested by the omnibus F-test is that all three population means are equal:

mu(1) = mu(2) = mu(3).

The null hypothesis tested by a pairwise comparison of groups i and j is that these two population means are equal:

mu(i) = mu(j).

The hypotheses tested by the omnibus test and pairwise comparisons are thus logically related: the omnibus null hypothesis is the intersection of all pairwise null hypotheses. That is, all population means can be equal only if every pair chosen from the set is equal. However, as many readers may know, the results of significance tests using sample data frequently produce logical contradictions. How can this be?

The reason that such contradictory results can occur is that when we are making inferences about population parameters (such as population means) using sample data, our estimates are subject to sampling error. Were we dealing with the entireties of finite populations, we could simply compute the mean or other parameter(s) of interest in each population, and compare the results. There would be no sampling error, and hence no need for measures of precision of estimation, such as standard errors. Our decisions with regard to the above stated null hypotheses would then be logically consistent: the numbers would all be equal, or else some would differ from others.

Since we generally do not have the luxury of addressing problems where we can identify entire finite populations and measure all values, we are forced to work with samples and to make inferences about the unknown population values. The mean or other parameter values that we compute are estimates of the true unknown values, and these estimates are subject to sampling error. Thus, the means computed from several random samples from populations with the same mean will not generally be equal. We are not able to specify what the value of a sample mean will be even if we know the population value. What we can specify is the _distribution_ of sample means and various related statistics under such circumstances.

Thus, the logic behind the standard F-test in an ANOVA is that if all of the assumptions are met, the distribution of F-values in repeated samples will follow the theoretical central F distribution with appropriate degrees of  freedom if the null hypothesis is true. The logic behind the pairwise comparison tests is identical: if the model assumptions are met and the two population means of interest are equal, the t or F statistics produced by repeated sampling will follow the appropriate theoretical central t or F distributions. The important point is that the methodology of statistical inference does not allow us to state what will happen in a particular case, only the distributions of results in repeated random samples. It thus does not preclude the possibility of logically contradictory results. This state of affairs, while disconcerting to many, is simply part of the price we pay when we seek to make inferences based on samples.
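To see that such a disagreement is possible in a single sample, consider the following Python sketch. The data are constructed for the purpose, the pairwise tests are unadjusted t-tests using the pooled error term (one of several possible follow up methods), and the critical values are taken from standard tables rather than computed.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

# Hypothetical data: one within-group pattern shifted to three group centers
pattern = [-4, -3, -2, -1, 0, 0, 1, 2, 3, 4]
groups = {name: [center + e for e in pattern]
          for name, center in (("g1", 8.7), ("g2", 10.0), ("g3", 11.3))}

k = len(groups)
n = len(pattern)
df_error = k * (n - 1)   # 27

grand_mean = mean(v for ys in groups.values() for v in ys)
ss_between = sum(n * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
ss_within = sum((v - mean(ys)) ** 2 for ys in groups.values() for v in ys)
ms_error = ss_within / df_error
f_stat = (ss_between / (k - 1)) / ms_error

# Critical values for alpha = .05, assumed from standard tables
F_CRIT = 3.354   # F with 2 and 27 degrees of freedom
T_CRIT = 2.052   # two-sided t with 27 degrees of freedom

omnibus_significant = f_stat > F_CRIT

pairwise = {}
for (a, ya), (b, yb) in combinations(groups.items(), 2):
    t = (mean(ya) - mean(yb)) / sqrt(ms_error * (2 / n))
    pairwise[(a, b)] = abs(t) > T_CRIT

print(round(f_stat, 3), omnibus_significant, pairwise)
```

In this constructed sample the omnibus F falls short of its critical value while the comparison of the two extreme groups exceeds its own, so the two decisions disagree even though the null hypotheses are logically related.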

In the case of a significant omnibus F-statistic and nonsignificant pairwise comparisons, some people have proposed the explanation that while no two means are different, some more complicated contrast among the means is nonzero, leading to the significant omnibus F. Such an explanation mistakes the mechanics of the methodology of the F-statistic for the hypothesis being tested. That is, while the F-statistic can be constructed as a function of the maximal single degree of freedom contrast computable from the sample data, the hypothesis tested is still that the population means are all equal, and the contrast value can only be nonzero in the population if at least one population mean is different from the others.

To broaden the discussion a bit and reinforce the point, consider a simple two way crosstabulation or contingency table. The two most popular test statistics for testing the null hypothesis of no population association between rows and columns are the Pearson and the Likelihood Ratio (LR) chi-squared tests. These statistics are testing the same null hypothesis and follow the same theoretical distribution under that null hypothesis, but they will sometimes yield different conclusions for a set of sample data. Again, the reason is that sampling variability means that we can only know about the distributions of the test statistics, not what they will be in particular cases.
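A small worked case in Python shows the two statistics landing on opposite sides of the same critical value. The 2x2 counts are hypothetical, and the critical value is taken from standard tables; note that two expected counts here are only 3, so the chi-squared approximation itself is rough for a table this small.

```python
from math import log

table = [[9, 1],
         [5, 5]]   # hypothetical counts

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

pearson = 0.0
lr = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        pearson += (observed - expected) ** 2 / expected
        if observed > 0:   # a zero cell contributes nothing to the LR statistic
            lr += 2 * observed * log(observed / expected)

CHI2_CRIT = 3.841   # chi-squared, 1 degree of freedom, alpha = .05 (standard tables)

print(round(pearson, 3), pearson > CHI2_CRIT)
print(round(lr, 3), lr > CHI2_CRIT)
```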

What can we do about this? We cannot generally avoid the problem, as we are not usually in a position to identify finite populations of interest and measure all members of these populations. The best we can do is to understand the true nature of the problem and accept its implications. One is that the problem will always be with us in standard situations. The other is that we can minimize it by using larger samples, which provide us with greater levels of precision, and reduce the probability of seeing such results. As sample sizes increase to infinity, sampling errors converge to 0. Though we cannot achieve infinite sample sizes, the larger our samples, all other things being equal, the firmer our results.
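The effect of sample size on precision can be illustrated with the standard error of a sample mean, which equals the population standard deviation divided by the square root of the sample size. In this Python sketch the standard deviation of 10 is an arbitrary assumed value.

```python
from math import sqrt

sigma = 10.0   # hypothetical population standard deviation

# Standard error of the mean for increasing sample sizes
standard_errors = {n: sigma / sqrt(n) for n in (10, 100, 1000, 10000)}

for n, se in standard_errors.items():
    print(n, round(se, 3))
```

Each hundredfold increase in sample size cuts the standard error by a factor of ten, so precision improves steadily but never reaches zero error with a finite sample.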