
Canonical Correlation Analysis

**Version info**: Code for this page was tested in IBM SPSS 20.

**Please Note:** The purpose of this page is to show how to use various data analysis commands.
It does not cover all aspects of the research process which researchers are expected to do. In
particular, it does not cover data cleaning and checking, verification of assumptions, model
diagnostics and potential follow-up analyses.

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

Example 2. A researcher is interested in exploring associations among factors from two multidimensional personality tests, the MMPI and the NEO. She is interested in what dimensions are common between the tests and how much variance is shared between them. She is specifically interested in finding whether the neuroticism dimension from the NEO can account for a substantial amount of shared variance between the two tests.

The data file used on this page is mmreg.sav.
The dataset has 600 observations on eight variables.
The psychological variables are **locus of control**, **self-concept** and
**motivation**. The academic variables are standardized tests in
**reading**, **writing**, **math** and **science**. Additionally,
the variable **female** is a zero-one indicator variable
with the one indicating a female student.

Let's look at the data.

    get file='d:\data\mmreg.sav'.

    descriptives
      variables=locus_of_control self_concept motivation read write math science female
      /statistics=mean stddev min max.

    frequencies variables=female.

Here are the correlations among the variables in the analysis.

correlations /variables=locus_of_control self_concept motivation read write math science female.

Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.

- Canonical correlation analysis, the focus of this page.
- Separate OLS regressions - You could analyze these data using a separate OLS regression for each variable in one set. The OLS regressions will not produce multivariate results and do not report information concerning dimensionality.
- Multivariate multiple regression is a reasonable option if you have no interest in dimensionality.

SPSS performs canonical correlation using the **manova** command. Don't look for
**manova** in the point-and-click analysis menu; it's not there. The **manova** command
is one of SPSS's hidden gems that is often overlooked. Used with the **discrim** option,
**manova** will compute the canonical correlation analysis.

Due to the length of the output, we will be making comments in several places along the way.

    manova locus_of_control self_concept motivation WITH read write math science female
      / discrim all alpha(1)
      / print=sig(eigen dim).

The number of possible canonical variates, also known as canonical dimensions, is equal to the number of variables in the smaller set (the variables to the left of "WITH" in this example, called "DEPENDENT variables" in SPSS output). In our example, the first set has three variables and the second set has five (called "COVARIATES" in SPSS output). This leads to three possible canonical variates for each set, which corresponds to the three columns for each set and three canonical correlation coefficients in the output. Canonical dimensions are latent variables that are analogous to factors obtained in factor analysis, except that canonical variates also maximize the correlation between the two sets of variables. In general, not all the canonical dimensions will be statistically significant. A significant dimension corresponds to a significant canonical correlation and vice versa.

The output below begins with an overall multivariate test of the entire model using four different multivariate criteria. This is followed by the three canonical correlations and the multivariate tests of each of the dimensions. These results show that the first two of the three canonical correlations are statistically significant at the .05 level.

    The default error term in MANOVA has been changed from WITHIN CELLS to
    WITHIN+RESIDUAL. Note that these are the same for all full factorial designs.

    * * * * * * A n a l y s i s   o f   V a r i a n c e * * * * * *

       600 cases accepted.
         0 cases rejected because of out-of-range factor values.
         0 cases rejected because of missing data.
         1 non-empty cell.

         1 design will be processed.

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    * * * * * * A n a l y s i s   o f   V a r i a n c e -- Design 1 * * * * * *

    EFFECT .. WITHIN CELLS Regression
    Multivariate Tests of Significance (S = 3, M = 1/2, N = 295)

    Test Name        Value   Approx. F   Hypoth. DF   Error DF   Sig. of F
    Pillais         .25425    11.00057        15.00    1782.00        .000
    Hotellings      .31430    12.37633        15.00    1772.00        .000
    Wilks           .75436    11.71573        15.00    1634.65        .000
    Roys            .21538

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Eigenvalues and Canonical Correlations

    Root No.   Eigenvalue       Pct.   Cum. Pct.   Canon Cor.   Sq. Cor
           1       .27450   87.33628    87.33628       .46409    .21538
           2       .02887    9.18537    96.52164       .16751    .02806
           3       .01093    3.47836   100.00000       .10399    .01081

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Dimension Reduction Analysis

    Roots     Wilks L.          F   Hypoth. DF   Error DF   Sig. of F
    1 TO 3      .75436   11.71573        15.00    1634.65        .000
    2 TO 3      .96143    2.94446         8.00    1186.00        .003
    3 TO 3      .98919    2.16461         3.00     594.00        .091

Here we have the overall multivariate tests, the canonical correlations and the tests for dimensionality. How much variance of the dependent variables is explained by each dimension is reported later in the output.

For this particular model there are three canonical dimensions, of which only the first two are statistically significant. The first test of dimensions tests whether all three dimensions combined are significant (they are, F = 11.72, p < .001). The next test tests whether dimensions 2 and 3 combined are significant (they are, F = 2.94, p = .003). Finally, the last test tests whether dimension 3, by itself, is significant (it is not, F = 2.16, p = .091). Read in sequence, these tests indicate that dimensions 1 and 2 are each significant while dimension 3 is not.
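The test statistics in this part of the output are all functions of the squared canonical correlations, so they can be cross-checked by hand. The sketch below is in Python rather than SPSS and is only an arithmetic check. It assumes the standard definitions: Wilks' lambda for a group of roots is the product of (1 - r²) over those roots, each eigenvalue is r² / (1 - r²), Pillai's trace is the sum of the r², and Hotelling's trace is the sum of the eigenvalues. The only inputs are the three values copied from the "Sq. Cor" column.

```python
# Squared canonical correlations copied from the "Sq. Cor" column above.
sq_cor = [0.21538, 0.02806, 0.01081]


def wilks_lambda(sq_cors):
    """Product of (1 - r^2) over the given squared canonical correlations."""
    result = 1.0
    for r2 in sq_cors:
        result *= 1.0 - r2
    return result


# Wilks' lambda for roots "1 TO 3", "2 TO 3" and "3 TO 3".
lambdas = [wilks_lambda(sq_cor[k:]) for k in range(len(sq_cor))]

# Eigenvalue of each root: r^2 / (1 - r^2).
eigenvalues = [r2 / (1.0 - r2) for r2 in sq_cor]

pillai = sum(sq_cor)          # Pillai's trace
hotelling = sum(eigenvalues)  # Hotelling's trace

for k, lam in enumerate(lambdas, start=1):
    print(f"roots {k} TO 3: Wilks lambda = {lam:.5f}")
print("eigenvalues:", [round(e, 5) for e in eigenvalues])
print(f"Pillai = {pillai:.5f}, Hotelling = {hotelling:.5f}")
```

Up to rounding in the fifth decimal place, the results should agree with the "Wilks L." column of the Dimension Reduction Analysis, the "Eigenvalue" column, and the Pillais and Hotellings rows of the multivariate tests.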

    EFFECT .. WITHIN CELLS Regression (Cont.)
    Univariate F-tests with (5,594) D. F.

    Variable   Sq. Mul. R   Adj. R-sq.   Hypoth. MS   Error MS          F   Sig. of F
    locus_of       .18062       .17372      9.72160     .37123   26.18789        .000
    self_con       .01957       .01131      1.16669     .49212    2.37076        .038
    motivati       .07874       .07098      1.10799     .10913   10.15338        .000

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Raw canonical coefficients for DEPENDENT variables
               Function No.
    Variable          1          2          3
    locus_of    1.25383    -.62148     .66169
    self_con    -.35135   -1.18769    -.82672
    motivati    1.26242    2.02726   -2.00023

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Standardized canonical coefficients for DEPENDENT variables
               Function No.
    Variable          1          2          3
    locus_of     .84042    -.41656     .44352
    self_con    -.24788    -.83793    -.58326
    motivati     .43267     .69480    -.68554

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Correlations between DEPENDENT and canonical variables
               Function No.
    Variable          1          2          3
    locus_of     .90405    -.38969     .17562
    self_con     .02084    -.70874    -.70516
    motivati     .56715     .35089    -.74513

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Variance in dependent variables explained by canonical variables

    CAN. VAR.   Pct Var DEP   Cum Pct DEP   Pct Var COV   Cum Pct COV
            1      37.97982      37.97982       8.17994       8.17994
            2      25.90966      63.88948        .72701       8.90694
            3      36.11052     100.00000        .39050       9.29745

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Raw canonical coefficients for COVARIATES
                Function No.
    COVARIATE          1          2          3
    read          .04462    -.00491    -.02138
    write         .03588     .04207    -.09131
    math          .02342     .00423    -.00940
    science       .00503    -.08516     .10984
    female        .63212    1.08464    1.79465

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Standardized canonical coefficients for COVARIATES
                CAN. VAR.
    COVARIATE          1          2          3
    read          .45080    -.04961    -.21601
    write         .34896     .40921    -.88810
    math          .22047     .03982    -.08848
    science       .04878    -.82660    1.06608
    female        .31504     .54057     .89443

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Correlations between COVARIATES and canonical variables
                CAN. VAR.
    Covariate          1          2          3
    read          .84045    -.35883    -.13536
    write         .87654     .06484    -.25456
    math          .76395    -.29795    -.14776
    science       .65841    -.67680     .23036
    female        .36411     .75493     .54340

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

    Variance in covariates explained by canonical variables

    CAN. VAR.   Pct Var DEP   Cum Pct DEP   Pct Var COV   Cum Pct COV
            1      11.30458      11.30458      52.48769      52.48769
            2        .70132      12.00590      24.99409      77.48177
            3        .09804      12.10394       9.06617      86.54795
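The "Variance in dependent variables explained by canonical variables" table can be reproduced from two other parts of the output, which helps clarify what its columns mean. The Python sketch below is an arithmetic check, not SPSS syntax. It assumes the usual definitions: "Pct Var DEP" is the mean of the squared correlations between the dependent variables and their own canonical variate, and "Pct Var COV" (the redundancy) is that value multiplied by the squared canonical correlation. The inputs are copied from the output above.

```python
# "Correlations between DEPENDENT and canonical variables", copied from the
# output above: one row per variable, one column per canonical function.
loadings = {
    "locus_of": [0.90405, -0.38969, 0.17562],
    "self_con": [0.02084, -0.70874, -0.70516],
    "motivati": [0.56715, 0.35089, -0.74513],
}
# "Sq. Cor" column of the canonical correlation table.
sq_cor = [0.21538, 0.02806, 0.01081]

n_vars = len(loadings)
pct_own = []    # variance explained by the set's own variate ("Pct Var DEP")
pct_other = []  # variance explained by the other set's variate ("Pct Var COV")

for j in range(len(sq_cor)):
    # Mean squared structure correlation in column j, as a percentage.
    own = 100.0 * sum(row[j] ** 2 for row in loadings.values()) / n_vars
    pct_own.append(own)
    # Redundancy: scale by the squared canonical correlation.
    pct_other.append(own * sq_cor[j])

for j, (own, other) in enumerate(zip(pct_own, pct_other), start=1):
    print(f"variate {j}: Pct Var DEP = {own:.4f}, Pct Var COV = {other:.4f}")
```

The printed values should match the table to about three decimal places; the small discrepancies come from working with inputs rounded to five decimals. The three "Pct Var DEP" values sum to 100 because there are as many canonical variates as dependent variables.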

The raw canonical coefficients above are used to generate the canonical variates,
represented by the columns (1 2 3) in the coefficient tables,
for each set. They are interpreted in a manner analogous to interpreting
regression coefficients. For example, for the variable **read**, a one-unit increase in reading leads to a
.0446 increase in the first canonical variate of the COVARIATE set when all of the
other variables are held constant. Similarly, being female leads
to a .6321 increase in the first canonical variate of the COVARIATE set with the other predictors held constant.
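To make this interpretation concrete, the following Python sketch (an illustration, not SPSS syntax) treats the first canonical variate of the COVARIATE set as a weighted sum of the five variables, with the raw coefficients as weights. The three students are hypothetical and their scores are made up. The scores are not centered here, so only differences between students are meaningful, not the absolute values.

```python
# Raw coefficients for the first canonical variate of the COVARIATE set,
# copied from the output above.
raw_cov_1 = {
    "read": 0.04462,
    "write": 0.03588,
    "math": 0.02342,
    "science": 0.00503,
    "female": 0.63212,
}


def variate_score(values, coefs):
    """Weighted sum of the variables using the raw canonical coefficients."""
    return sum(coefs[name] * values[name] for name in coefs)


# Hypothetical students (made-up values, not taken from mmreg.sav).
student_a = {"read": 50.0, "write": 50.0, "math": 50.0, "science": 50.0, "female": 0}
student_b = dict(student_a, read=51.0)  # one more point in reading
student_c = dict(student_a, female=1)   # same test scores, but female

base = variate_score(student_a, raw_cov_1)
diff_read = variate_score(student_b, raw_cov_1) - base
diff_female = variate_score(student_c, raw_cov_1) - base

print(f"one more reading point: {diff_read:+.4f}")
print(f"female vs. male:        {diff_female:+.4f}")
```

Because the variate is linear in the variables, changing one variable while holding the others fixed shifts the score by exactly that variable's coefficient.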

When the variables in the model have very different standard deviations,
the standardized coefficients allow for easier comparisons among the variables.

The raw canonical coefficients are followed by the standardized canonical coefficients, which are interpreted in a manner analogous to
interpreting standardized regression coefficients. For example, for the
variable **read**, a one
standard deviation increase in reading leads to a 0.45 standard deviation increase in the
score on the first canonical variate for the COVARIATE set when the other variables in the model are
held constant.
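The link between the two coefficient tables can be checked numerically. The Python sketch below is an arithmetic illustration, not SPSS syntax. It assumes the usual convention that a standardized coefficient equals the raw coefficient multiplied by the variable's sample standard deviation, so dividing one by the other recovers the standard deviation implied by the output. The coefficients are those of the first canonical variate of the COVARIATE set.

```python
# First-variate coefficients for the COVARIATE set, copied from the output.
raw = {
    "read": 0.04462,
    "write": 0.03588,
    "math": 0.02342,
    "science": 0.00503,
    "female": 0.63212,
}
standardized = {
    "read": 0.45080,
    "write": 0.34896,
    "math": 0.22047,
    "science": 0.04878,
    "female": 0.31504,
}

# Assumption: standardized = raw * SD, hence SD = standardized / raw.
implied_sd = {name: standardized[name] / raw[name] for name in raw}

for name, sd in implied_sd.items():
    print(f"{name:8s} implied SD = {sd:.3f}")
```

The test scores come out with implied standard deviations of roughly 10, while **female** comes out near 0.5, as expected for a zero-one indicator. This difference in scale is why the raw coefficient for **female** looks so much larger than the others although its standardized coefficient does not.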

- As with multivariate regression, MANOVA and similar methods, valid inference in canonical correlation analysis requires the multivariate normality and homogeneity of variance assumptions.
- Canonical correlation analysis assumes a linear relationship between the canonical variates and each set of variables.
- Similar to multivariate regression, canonical correlation analysis requires a large sample size.

- SPSS Syntax Guide: **manova**

