This page was adapted from a page on the SPSS web site. We thank SPSS for their permission to adapt and distribute this page via our web site.
This page contains three articles that originally appeared in SPSS Keywords. Although the versions of SPSS referenced in these articles are old, much of the information in them is still useful and timely.
FROM MANOVA TO GLM: BASICS OF PARAMETERIZATION
David P. Nichols
Senior Support Statistician
SPSS, Inc.
From SPSS Keywords, Number 64, 1997
In Release 7.0, SPSS introduced a new GLM (General Linear Models) procedure. In Release
7.5, dialog box support for the MANOVA procedure was removed (it remains available via
command syntax). This article is the first of a planned series introducing GLM and aimed
at easing the transition for users from MANOVA to GLM. Though perhaps of most immediate
relevance to users of Release 7.0 and above, the statistical topics discussed are
relevant to users of any version of SPSS.
As regular readers of the Statistically Speaking section of SPSS Keywords are no doubt
aware, I believe that understanding how a model is parameterized is crucial in
interpreting the results of any statistical modeling procedure. Understanding the
differences between GLM and MANOVA's approaches to parameterizing models is important both
for understanding how to use the procedures and for what it can show us about aspects of
modeling that apply to any approach.
The topics we will undertake are quite complicated, so we're going to start at the
beginning and move to more complex situations in time. The very simplest linear model we
might fit to a set of data is one in which each value is predicted to be the same; that
is, the model contains only a constant term. The basis or design matrix
for a constant only model contains a single column, with the value 1 for every case. The
least squares solution for such a model predicts that each case is equal to the mean of
all cases. Obviously, this is not a model that we are likely to seriously entertain in the
great majority of circumstances. However, it is quite useful as a baseline model, against
which to compare models one step up in complexity.
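The claim about the constant-only model can be checked with a small sketch. The data values below are invented purely for illustration; the code solves the one-parameter normal equations and confirms that the least squares constant equals the mean, and that moving away from it increases the sum of squared residuals.

```python
from fractions import Fraction

# Hypothetical data, invented for illustration only.
y = [Fraction(v) for v in (3, 5, 4, 6, 8, 7, 9, 11, 10)]

# Design matrix for a constant-only model: a single column of 1s.
X = [[Fraction(1)] for _ in y]

# The normal equations (X'X) b = X'y reduce to n * b = sum(y).
xtx = sum(row[0] * row[0] for row in X)            # equals n
xty = sum(row[0] * yi for row, yi in zip(X, y))    # equals sum(y)
b_constant = xty / xtx

mean_y = sum(y) / len(y)

def sse(b):
    """Sum of squared residuals when every case is predicted to be b."""
    return sum((yi - b) ** 2 for yi in y)

print(b_constant, mean_y)                      # the two values coincide
print(sse(b_constant), sse(b_constant + 1))    # the second is larger
```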
Moving up one level, we have models in which the dependent variable is presumed to be a
function of a single predictor variable. If the predictor is quantitative (sometimes
loosely called "continuous"), parameterization is quite simple: we simply add a
column containing the measured values on the quantitative predictor to the constant
column. However, if the predictor is categorical, things become more difficult, and
alternative approaches to parameterization are possible.
For example, if we have a single three level factor (A), there are a number of different
ways to represent the model. A representation that could be referred to as canonical or
basic would be to add a parameter for each level of factor A, in addition to that for the
constant term. The design matrix, after deleting redundant rows (so that we have only one
row per cell or unique predicted value), would look as shown in Figure 1. We refer to this
design matrix as an overparameterized indicator design matrix.
Figure 1: Overparameterized Indicator Design Matrix
Level of A    C   A1   A2   A3
    1         1    1    0    0
    2         1    0    1    0
    3         1    0    0    1
Here we have a 3x4 matrix, with one row for each possible value of A. There is one
parameter for each level of A, in addition to that for the constant. The problem we
confront here is that the final column, that for A3, can be expressed as a linear
combination of the preceding columns. Specifically,
A3 = C - A1 - A2 .
Because A3 can be expressed as a linear combination of preceding columns, it is
mathematically redundant. That is, estimation of the fourth parameter in the model cannot
add any information to that provided by estimation of the previous three parameters.
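The redundancy can be verified directly. The sketch below checks the stated dependency row by row and computes the rank of the Figure 1 matrix with a small exact-arithmetic elimination routine; the `rank` helper is written here for illustration and is not part of any SPSS procedure.

```python
from fractions import Fraction

# Overparameterized indicator design matrix (columns C, A1, A2, A3).
X = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]

# Check the dependency A3 = C - A1 - A2 in every row.
dependency_holds = all(a3 == c - a1 - a2 for c, a1, a2, a3 in X)

def rank(matrix):
    """Rank via Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot available in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

print(dependency_holds)   # True
print(rank(X))            # 3: four columns, but only three are independent
```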
The GLM procedure handles computation of overparameterized models via a "sweep"
operator that produces a generalized inverse of X'X (where X is the original basis or
design matrix). The practical effect of this computational method is to alias redundant
model parameters to 0. That is, no parameter estimate is produced for a redundant column.
The non-aliased estimates produced for a one factor model are the same as those produced
in a linear regression model by using indicator or dummy coding, with the last category of
the factor representing the reference category. Unlike MANOVA, GLM does not use
specifications of contrast types from the user to determine what parameters are to be
estimated (contrast results are available in GLM, but they are produced after the initial
parameter estimation phase; in MANOVA, the design matrix is actually built from the
contrasts specified by the user or by defaults). The use of a single set of basic or
canonical parameters allows GLM to approach problem situations in a more systematic
manner. It also allows for simpler application of the parameter estimates. Regardless of
the model, when reproducing predicted values each parameter estimate for a factor is
either used or not used (multiplied by 1 or 0, respectively). Recall from earlier articles
that the basis or design matrices used in MANOVA often include a variety of other
values, depending on the type of contrasts specified and the number of levels of the
factor(s) involved.
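To make the aliasing idea concrete, here is a minimal sketch of a textbook-style sweep applied to the augmented cross-products matrix. It is not the SPSS implementation: the function names, the pivot convention, and the data are assumptions made for illustration, and exact fractions are used so that a redundant column shows up as a pivot of exactly 0 (production code would use a tolerance instead).

```python
from fractions import Fraction

def sweep(m, k):
    """Sweep the square matrix m in place on pivot k (Gauss-Jordan style)."""
    d = m[k][k]
    m[k] = [v / d for v in m[k]]
    for i in range(len(m)):
        if i != k:
            b = m[i][k]
            m[i] = [v - b * w for v, w in zip(m[i], m[k])]
            m[i][k] = -b / d
    m[k][k] = 1 / d

def fit_with_aliasing(X, y):
    """Least squares by sweeping [X'X X'y; y'X y'y]; redundant columns get 0."""
    p = len(X[0])
    cols = [[Fraction(row[j]) for row in X] for j in range(p)]
    cols.append([Fraction(v) for v in y])
    m = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    aliased = []
    for k in range(p):
        if m[k][k] == 0:        # column k is a combination of swept columns
            aliased.append(k)
            continue
        sweep(m, k)
    estimates = [Fraction(0) if k in aliased else m[k][p] for k in range(p)]
    return estimates, aliased, m[p][p]   # m[p][p] is the residual sum of squares

# Hypothetical data: three cases in each level of A.
groups = {1: [3, 5, 4], 2: [6, 8, 7], 3: [9, 11, 10]}
X, y = [], []
for level, values in groups.items():
    for v in values:
        X.append([1] + [1 if level == a else 0 for a in (1, 2, 3)])
        y.append(v)

estimates, aliased, sse = fit_with_aliasing(X, y)
print([str(e) for e in estimates])   # ['10', '-6', '-3', '0'] for C, A1, A2, A3
print(aliased, sse)                  # [3] 6
```

With these made-up data the group means are 4, 7 and 10, so the constant equals the mean of the last group and A1 and A2 are differences from it, which matches the dummy coding interpretation described above.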
In the next installment, we'll begin to illustrate the differences between the two
approaches, and highlight the essential commonalities that are at the heart of what we
wish to discover when we use quantitative models.
FROM MANOVA TO GLM: MORE BASICS OF PARAMETERIZATION
David P. Nichols
Senior Support Statistician
SPSS, Inc.
From SPSS Keywords, Number 65, 1997
In the last issue, I began discussing the differences between the ways in which the MANOVA
and GLM procedures parameterize linear models with categorical or factor variables. I
noted that GLM always uses one particular kind of parameterization, referred to as an
overparameterized indicator design matrix, while MANOVA produces a design matrix based
specifically on what kind of contrasts the user specifies, or else on defaults. The
parameterization used in GLM was referred to as canonical or basic because it represents
the model in a somewhat more fundamental way than do the MANOVA parameterizations.
An important feature of such a model illustrated by the GLM design matrix is the fact that
there is no unique solution to the set of equations defined by this matrix. Another way to
say this is that the model is overparameterized. As noted last time, GLM
deals with this by using a generalized inverse produced via a symmetric sweep operator
applied to the X'X matrix. The solution produced for a particular problem by MANOVA will
depend on the contrast specifications. Figure 1 presents the design matrix used by GLM and
the default design matrix used by MANOVA for the case of a single three level factor A.
Figure 1: Design Matrices for GLM and MANOVA defaults
              GLM Design Matrix    MANOVA Design Matrix
Level of A    C   A1   A2   A3     C   A1   A2
    1         1    1    0    0     1    1    0
    2         1    0    1    0     1    0    1
    3         1    0    0    1     1   -1   -1
Perhaps the most obvious difference between the two matrices is the number of columns
in each. The GLM matrix reflects the overparameterized nature of the model by identifying
four parameters. The MANOVA matrix has been reduced to the number of nonredundant
parameters available for estimation (three: one for each group mean), via what is known as
_reparameterization_. That is, the MANOVA design matrix can be constructed from the more
general GLM design matrix by factoring the GLM matrix into the product of two matrices:
the MANOVA design or basis matrix, and a contrast matrix whose rows show the
interpretations of the parameter estimates produced by use of the MANOVA design matrix.
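The factorization can be checked numerically. In the sketch below the contrast matrix `L` is worked out here for the default (deviation) coding and is an assumption of this illustration rather than something printed in the article: its first row is the constant plus the average of the A parameters, and the other rows are deviations of A1 and A2 from that average.

```python
from fractions import Fraction as F

# GLM design matrix (columns C, A1, A2, A3) and MANOVA default basis matrix.
X_glm = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
B_manova = [
    [1,  1,  0],
    [1,  0,  1],
    [1, -1, -1],
]

# Assumed contrast matrix: each row expresses one MANOVA parameter
# as a linear combination of the GLM parameters (C, A1, A2, A3).
L = [
    [F(1), F(1, 3),  F(1, 3),  F(1, 3)],    # constant: C + mean of the A terms
    [F(0), F(2, 3),  F(-1, 3), F(-1, 3)],   # A1 minus the mean of the A terms
    [F(0), F(-1, 3), F(2, 3),  F(-1, 3)],   # A2 minus the mean of the A terms
]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

product = matmul(B_manova, L)
print(product == X_glm)   # True: basis matrix times contrast matrix gives X_glm
```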
Another notable difference is the fact that the GLM design matrix contains only ones and
zeros, so that when computing the predicted value for a given case via the linear model
equation
$$\hat{Y} = X\hat{B}$$
you would need only to identify which parameter estimates are used (have a 1 for that
row), and sum these. The design matrices from MANOVA will have numbers other than 0 and 1,
including negative numbers, and sometimes very complicated decimal values. This makes
reproducing predictions much easier in GLM than in MANOVA.
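A short sketch of the difference, using hypothetical parameter estimates that correspond to made-up group means of 4, 7 and 10: with the GLM matrix a prediction is simply the sum of the estimates whose design value is 1, while the MANOVA matrix requires the full weighted sum.

```python
# Design matrices for a three level factor.
X_glm = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
X_manova = [
    [1,  1,  0],
    [1,  0,  1],
    [1, -1, -1],
]

# Hypothetical estimates for group means 4, 7 and 10.
b_glm = [10, -6, -3, 0]    # C, A1, A2, A3 (A3 aliased to 0)
b_manova = [7, -3, 0]      # C, A1, A2 under the default coding

def predict(X, b):
    """General linear model prediction: each row of X times b."""
    return [sum(x * bj for x, bj in zip(row, b)) for row in X]

def predict_by_selection(X, b):
    """Valid only for 0/1 matrices: add up the estimates that are switched on."""
    return [sum(bj for x, bj in zip(row, b) if x == 1) for row in X]

print(predict_by_selection(X_glm, b_glm))   # [4, 7, 10]
print(predict(X_manova, b_manova))          # [4, 7, 10]
```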
As noted above, with three groups, there are three means to estimate, and the basic model,
represented by the GLM design matrix, is overparameterized. GLM handles such situations by
aliasing redundant or linearly dependent parameters to 0. In this example, the parameter
A3 is fixed at 0, since the A3 column is equal to C-A1-A2. This method is sometimes
referred to as using set to 0 restrictions. MANOVA handles this by reparameterizing the
model, using a method that is sometimes referred to as using sum to 0 restrictions (note
that each A column in the design matrix sums to 0). MANOVA offers a variety of ways to
define the design or basis matrix, each one producing a different set of parameter
estimates and contrast coefficients. All of these methods produce K-1 contrasts for a K
level factor, and all imply design matrix columns that sum to 0 for each of these K-1
columns.
Since the A3 parameter in GLM is fixed to 0, and we know that the predicted value for a
case in group 3 is that group's mean for a one factor model, we know that the C parameter
in GLM must be estimating the mean of the third group. Since the predicted values for
cases in the other groups are given by C+A1 and C+A2, A1 and A2 must be estimating the
differences between the other two group means and the final one. As noted last issue,
these are the same interpretations resulting from a model using just two dummy variables
representing each of the first two groups.
With MANOVA's sum to 0 approach, the estimate for the constant column always produces the
unweighted average of the cell means in a single factor design. The particular contrast
coding used here (the default) produces estimates for A1 and A2 that are the deviations
of each of the first two groups from this unweighted grand mean. The deviation of the
final group from the mean is the negative of the sum of the other deviations. While all of
the contrast options in MANOVA produce the same type of constant or intercept term, the
estimates for the A terms will differ depending on what type of contrasts are specified.
The important point to note here is that while the parameters estimated by the two
procedures (or the different choices in MANOVA) are different, the underlying model, the
predicted values for a given case, and the summary measures of prediction error (such as
the sum of squared residuals) are always the same. With regard to parameters, while there
is no unique value resulting for the constant or intercept, nor for any of the A terms,
any contrast (linear combination whose coefficients sum to 0) among the A terms will have
a unique value, regardless of how the model is parameterized. The invariance of contrasts
among the A parameters is of such importance that it occupies a central place in
theoretical treatments of statistical estimation. That central concept will be the focus
of the next installment.
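The invariance can be illustrated with a small sketch. The group means are hypothetical, and the two parameter sets are computed from them using the interpretations given above (set to 0 for GLM, deviations from the unweighted mean for the MANOVA default). The constants and the individual A terms differ, but any contrast among the A terms agrees.

```python
from fractions import Fraction

means = [Fraction(4), Fraction(7), Fraction(10)]   # hypothetical group means

# GLM (set to 0): constant is the last group's mean, A terms are differences from it.
glm_constant = means[-1]
glm_a = [m - means[-1] for m in means]             # A1, A2, A3 with A3 = 0

# MANOVA default (sum to 0): constant is the unweighted mean, A terms are deviations.
grand = sum(means) / len(means)
manova_constant = grand
manova_a = [m - grand for m in means]              # A1, A2 and the implied A3

def contrast(coefs, effects):
    """Linear combination of the A terms; the coefficients must sum to 0."""
    assert sum(coefs) == 0
    return sum(c * e for c, e in zip(coefs, effects))

pairwise = [1, -1, 0]                                   # level 1 versus level 2
first_vs_rest = [1, Fraction(-1, 2), Fraction(-1, 2)]   # level 1 versus the others

print(glm_constant, manova_constant)                            # 10 and 7: not unique
print(contrast(pairwise, glm_a), contrast(pairwise, manova_a))  # -3 and -3
print(contrast(first_vs_rest, glm_a), contrast(first_vs_rest, manova_a))
```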
MY TESTS DON'T AGREE!
David P. Nichols
Principal Support Statistician and
Manager of Statistical Support
SPSS, Inc.
From SPSS Keywords, Number 66, 1998
A common question asked of SPSS Statistical Support is how to interpret a set of tests
that are testing the same or logically related null hypotheses, yet produce different
conclusions. The prime example of this would be a situation where an omnibus F-test in an
analysis of variance (ANOVA) produces a significance level (p-value) less than a critical
alpha (such as .05), but follow up tests comparing levels of the factor do not produce any
p-values less than alpha, or conversely, where the omnibus F-test is not significant at
the given alpha level, while one or more pairwise comparisons are significant.
For an example, consider a one way ANOVA model of a very simple form: three groups, equal
sample sizes, with standard independence, normality, and homogeneity of variance
assumptions met. Further, assume that our numbers have been measured without error. The
null hypothesis tested by the omnibus F-test is that all three population means are equal:
mu(1) = mu(2) = mu(3).
The null hypothesis tested by a pairwise comparison of groups i and j is that these two
population means are equal:
mu(i) = mu(j).
The hypotheses tested by the omnibus test and pairwise comparisons are thus logically
related: the omnibus null hypothesis is the intersection of all pairwise null hypotheses. That
is, all population means can be equal only if every pair chosen from the set is equal.
However, as many readers may know, the results of significance tests using sample data
frequently produce logical contradictions. How can this be?
The reason that such contradictory results can occur is that when we are making inferences
about population parameters (such as population means) using sample data, our estimates
are subject to sampling error. Were we dealing with the entireties of finite populations,
we could simply compute the mean or other parameter(s) of interest in each population, and
compare the results. There would be no sampling error, and hence no need for measures of
precision of estimation, such as standard errors. Our decisions with regard to the above
stated null hypotheses would then be logically consistent: the numbers would all be equal,
or else some would differ from others.
Since we generally do not have the luxury of addressing problems where we can identify
entire finite populations and measure all values, we are forced to work with samples and
to make inferences about the unknown population values. The mean or other parameter values
that we compute are estimates of the true unknown values, and these estimates are subject
to sampling error. Thus, the means computed from several random samples from populations
with the same mean will not generally be equal. We are not able to specify what the value
of a sample mean will be even if we know the population value. What we can specify is the
_distribution_ of sample means and various related statistics under such circumstances.
Thus, the logic behind the standard F-test in an ANOVA is that if all of the assumptions
are met, the distribution of F-values in repeated samples will follow the theoretical
central F distribution with appropriate degrees of freedom if the null hypothesis is
true. The logic behind the pairwise comparison tests is identical: if the model
assumptions are met and the two population means of interest are equal, the t or F
statistics produced by repeated sampling will follow the appropriate theoretical central t
or F distributions. The important point is that the methodology of statistical inference
does not allow us to state what will happen in a particular case, only the distributions
of results in repeated random samples. It thus does not preclude the possibility of
logically contradictory results. This state of affairs, while disconcerting to many, is
simply part of the price we pay when we seek to make inferences based on samples.
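As a rough illustration of one such contradiction, the sketch below works from invented summary statistics (three groups of 10, sample means 10, 11 and 12, pooled within-group variance 4). The critical values are approximate figures from standard tables and are assumptions of this example, as is the use of an unadjusted pairwise t-test with the pooled error term.

```python
import math

n = 10                        # cases per group (hypothetical)
means = [10.0, 11.0, 12.0]    # hypothetical sample means
mse = 4.0                     # hypothetical pooled within-group variance

k = len(means)
df_error = k * (n - 1)        # 27 error degrees of freedom

# Omnibus F: between-groups mean square over the error mean square.
grand = sum(means) / k
ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
f_stat = ms_between / mse

# Unadjusted pairwise t for groups 1 and 3, using the pooled error term.
t_13 = (means[2] - means[0]) / math.sqrt(2 * mse / n)

F_CRIT = 3.35   # approximate upper 5% point of F(2, 27), from standard tables
T_CRIT = 2.05   # approximate two-sided 5% point of t(27), from standard tables

omnibus_significant = f_stat > F_CRIT
pair_significant = abs(t_13) > T_CRIT

print(round(f_stat, 3), omnibus_significant)   # 2.5 False
print(round(t_13, 3), pair_significant)        # 2.236 True
```

Here the omnibus test does not reject at the .05 level while the comparison of the two extreme groups does, even though both tests are computed from the same three means and the same error term.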
In the case of a significant omnibus F-statistic and nonsignificant pairwise comparisons,
some people have proposed the explanation that while no two means are different, some more
complicated contrast among the means is nonzero, leading to the significant omnibus F.
Such an explanation mistakes the mechanics of the methodology of the F-statistic for the
hypothesis being tested. That is, while the F-statistic can be constructed as a function
of the maximal single degree of freedom contrast computable from the sample data, the
hypothesis tested is still that the population means are all equal, and the contrast value
can only be nonzero in the population if at least one population mean is different from
the others.
To broaden the discussion a bit and reinforce the point, consider a simple two way
crosstabulation or contingency table. The two most popular test statistics for testing the
null hypothesis of no population association between rows and columns are the Pearson and
the Likelihood Ratio (LR) chi-squared tests. These statistics are testing the same null
hypothesis and follow the same theoretical distribution under that null hypothesis, but
they will sometimes yield different conclusions for a set of sample data. Again, the
reason is that sampling variability means that we can only know about the distributions of
the test statistics, not what they will be in particular cases.
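A small sketch shows how the two statistics can land on opposite sides of a critical value. The 2x2 table is invented for illustration, and its expected counts are small enough that the chi-squared approximation is rough, so it should be read only as a demonstration that the two formulas give different numbers for the same data.

```python
import math

table = [[5, 0],
         [5, 5]]              # hypothetical 2x2 table of counts

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

pearson = 0.0
lr = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        pearson += (observed - expected) ** 2 / expected
        if observed > 0:      # a zero cell contributes 0 to the LR sum
            lr += 2 * observed * math.log(observed / expected)

CHI2_CRIT = 3.841             # upper 5% point of chi-squared with 1 df

print(round(pearson, 3), pearson > CHI2_CRIT)   # 3.75 False
print(round(lr, 3), lr > CHI2_CRIT)             # 5.232 True
```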
What can we do about this? We cannot generally avoid the problem, as we are not usually in
a position to identify finite populations of interest and measure all members of these
populations. The best we can do is to
understand the true nature of the problem and accept its implications. One is that the
problem will always be with us in standard situations. The other is that we can minimize
it by using larger samples, which provide us with greater levels of precision, and reduce
the probability of seeing such results. As sample sizes increase to infinity, sampling
errors converge to 0. Though we cannot achieve infinite sample sizes, the larger our
samples, all other things being equal, the firmer our results.
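A worked formula may make the last point concrete. Assuming independent observations with a common standard deviation sigma (an assumption added here for illustration, along with the value sigma = 10), the standard error of a sample mean based on n cases is:

```latex
\mathrm{SE}(\bar{Y}) = \frac{\sigma}{\sqrt{n}},
\qquad
\sigma = 10:\quad
n = 25 \Rightarrow 2, \quad
n = 100 \Rightarrow 1, \quad
n = 400 \Rightarrow 0.5 .
```

Quadrupling the sample size halves the sampling error, and the error approaches 0 only as n grows without bound.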
These pages are Copyrighted (c) by UCLA Academic Technology Services