SPSS Data Analysis Examples
Canonical Correlation Analysis

Version info: Code for this page was tested in IBM SPSS 20.

Canonical correlation analysis is used to identify and measure the associations between two sets of variables. Canonical correlation is appropriate in the same situations where multiple regression would be, but where there are multiple intercorrelated outcome variables. Canonical correlation analysis determines a set of canonical variates: orthogonal linear combinations of the variables within each set that best explain the variability both within and between sets.

Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.

Examples of canonical correlation analysis

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

Example 2. A researcher is interested in exploring associations among factors from two multidimensional personality tests, the MMPI and the NEO. She is interested in what dimensions are common between the tests and how much variance is shared between them. She is specifically interested in finding whether the neuroticism dimension from the NEO can account for a substantial amount of shared variance between the two tests.

Description of the data

Let's pursue Example 1 from above.

We have included the data file, which can be obtained by clicking on mmreg.sav. The dataset has 600 observations on eight variables. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

Let's look at the data.

Here are the correlations among the variables in the analysis.

Analysis methods you might consider

Below is a list of some analysis methods you may have encountered. Some of the methods listed are quite reasonable while others have either fallen out of favor or have limitations.

Canonical correlation analysis

SPSS performs canonical correlation using the manova command. Don't look for manova in the point-and-click analysis menu; it's not there. The manova command is one of SPSS's hidden gems that is often overlooked. Used with the discrim option, manova will compute the canonical correlation analysis.

Due to the length of the output, we will be making comments in several places along the way.

The number of possible canonical variates, also known as canonical dimensions,  is equal to the number of variables in the smaller set (the variables to the left of "WITH" in this example, called "DEPENDENT variables" in SPSS output). In our example, the first set  has three variables and the second set has five (called "COVARIATES" in SPSS output).  This leads to three possible canonical variates for each set, which corresponds to the three columns for each set and three canonical correlation coefficients in the output.  Canonical dimensions are latent variables that are analogous to factors obtained in factor analysis, except that canonical variates also maximize the correlation between the two sets of variables. In general, not all the canonical dimensions will be statistically significant. A significant dimension corresponds to a significant canonical correlation and vice versa.
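The count of canonical dimensions can be illustrated outside SPSS. The following is a minimal Python sketch, assuming NumPy is available and using synthetic stand-in data (not mmreg.sav). The function name canonical_correlations is hypothetical. It computes canonical correlations with a QR-plus-SVD approach, which is not necessarily how manova computes them internally, and returns as many correlations as there are variables in the smaller set.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between the columns of x and the columns of y."""
    # Center each set, then orthonormalize its columns with a reduced QR.
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    # The singular values of Qx'Qy are the canonical correlations,
    # returned in descending order.
    return np.linalg.svd(qx.T @ qy, compute_uv=False)


# Synthetic stand-in data (an assumption, NOT the mmreg.sav values):
# 600 cases, a set of 3 variables and a set of 5 variables.
rng = np.random.default_rng(0)
set1 = rng.normal(size=(600, 3))
set2 = rng.normal(size=(600, 5))
set2[:, 0] += set1[:, 0]  # build in some association between the sets

rho = canonical_correlations(set1, set2)
print(len(rho))  # number of canonical dimensions = size of the smaller set
print(rho)
```

With sets of three and five variables, the sketch yields three canonical correlations, matching the three roots reported in the SPSS output.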

The output below begins with an overall multivariate test of the entire model using four different multivariate criteria. This is followed by the three canonical correlations and the multivariate tests of each of the dimensions. These results show that the first two of the three canonical correlations are statistically significant at the .05 level.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The default error term in MANOVA has been changed from WITHIN CELLS to
WITHIN+RESIDUAL.  Note that these are the same for all full factorial designs.



* * * * * * * * * * * * * * * * * A n a l y s i s   o f   V a r i a n c e * * * * * * * * * * * * * * * * *


       600 cases accepted.
         0 cases rejected because of out-of-range factor values.
         0 cases rejected because of missing data.
         1 non-empty cell.

         1 design will be processed.

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -



* * * * * * * * * * * * * * * * * A n a l y s i s   o f   V a r i a n c e -- Design   1 * * * * * * * * * * * * * * * * *

 EFFECT .. WITHIN CELLS Regression
 Multivariate Tests of Significance (S = 3, M = 1/2, N = 295 )

 Test Name             Value        Approx. F       Hypoth. DF         Error DF        Sig. of F

 Pillais                .25425         11.00057            15.00          1782.00             .000
 Hotellings             .31430         12.37633            15.00          1772.00             .000
 Wilks                  .75436         11.71573            15.00          1634.65             .000
 Roys                   .21538

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Eigenvalues and Canonical Correlations

 Root No.       Eigenvalue           Pct.      Cum. Pct.     Canon Cor.        Sq. Cor

        1           .27450       87.33628       87.33628         .46409         .21538
        2           .02887        9.18537       96.52164         .16751         .02806
        3           .01093        3.47836      100.00000         .10399         .01081

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dimension Reduction Analysis

 Roots              Wilks L.                F       Hypoth. DF         Error DF        Sig. of F

 1 TO 3               .75436         11.71573            15.00          1634.65             .000
 2 TO 3               .96143          2.94446             8.00          1186.00             .003
 3 TO 3               .98919          2.16461             3.00           594.00             .091

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 

Here we have the overall multivariate tests for dimensionality.

We also have the canonical correlations, as well as how much variance of the dependent variables is explained by the dimensions. For this particular model there are three canonical dimensions, of which only the first two are statistically significant. The first test of dimensions tests whether all three dimensions combined are significant (they are); the next test tests whether dimensions 2 and 3 combined are significant (they are). Finally, the last test tests whether dimension 3, by itself, is significant (it is not). We therefore conclude that dimensions 1 and 2 are each significant.
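The quantities in the three tables above are tied together by simple identities, which can be checked by hand. The Python sketch below copies the squared canonical correlations from the "Sq. Cor" column and recomputes the eigenvalues, the multivariate criteria, and the Wilks' lambda values used in the dimension reduction analysis. The variable names are illustrative, and results should agree with the SPSS output only up to rounding of the printed inputs.

```python
import math

# Squared canonical correlations copied from the "Sq. Cor" column above.
sq_cor = [0.21538, 0.02806, 0.01081]

# Each eigenvalue is r^2 / (1 - r^2).
eigenvalues = [r2 / (1 - r2) for r2 in sq_cor]

# "Pct." is each eigenvalue as a percentage of the eigenvalue total.
pct = [100 * e / sum(eigenvalues) for e in eigenvalues]

# Multivariate criteria expressed through the same quantities.
pillai = sum(sq_cor)          # Pillai's trace: sum of squared correlations
hotelling = sum(eigenvalues)  # Hotelling's trace: sum of eigenvalues
roy = max(sq_cor)             # Roy's value as printed: largest squared correlation

# Wilks' lambda for roots k..3 is the product of (1 - r^2) over those roots.
wilks = [math.prod(1 - r2 for r2 in sq_cor[k:]) for k in range(len(sq_cor))]

print([round(e, 5) for e in eigenvalues])
print([round(p, 2) for p in pct])
print(round(pillai, 5), round(hotelling, 5), roy)
print([round(w, 5) for w in wilks])  # rows "1 TO 3", "2 TO 3", "3 TO 3"
```

Because each Wilks' lambda in the dimension reduction table drops one more root from the product, each successive row tests only the remaining, smaller canonical correlations.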

 EFFECT .. WITHIN CELLS Regression (Cont.)
 Univariate F-tests with (5,594) D. F.

 Variable       Sq. Mul. R     Adj. R-sq.     Hypoth. MS       Error MS              F      Sig. of F

 locus_of           .18062         .17372        9.72160         .37123       26.18789           .000
 self_con           .01957         .01131        1.16669         .49212        2.37076           .038
 motivati           .07874         .07098        1.10799         .10913       10.15338           .000

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Raw canonical coefficients for DEPENDENT variables
           Function No.

 Variable                  1                2                3

 locus_of            1.25383          -.62148           .66169
 self_con            -.35135         -1.18769          -.82672
 motivati            1.26242          2.02726         -2.00023

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Standardized canonical coefficients for DEPENDENT variables
           Function No.

 Variable                  1                2                3

 locus_of             .84042          -.41656           .44352
 self_con            -.24788          -.83793          -.58326
 motivati             .43267           .69480          -.68554

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Correlations between DEPENDENT and canonical variables
           Function No.

 Variable                  1                2                3

 locus_of             .90405          -.38969           .17562
 self_con             .02084          -.70874          -.70516
 motivati             .56715           .35089          -.74513

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Variance in dependent variables explained by canonical variables

 CAN. VAR.       Pct Var DEP      Cum Pct DEP      Pct Var COV      Cum Pct COV

        1           37.97982         37.97982          8.17994          8.17994
        2           25.90966         63.88948           .72701          8.90694
        3           36.11052        100.00000           .39050          9.29745

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Raw canonical coefficients for COVARIATES
           Function No.

 COVARIATE                 1                2                3

 read                 .04462          -.00491          -.02138
 write                .03588           .04207          -.09131
 math                 .02342           .00423          -.00940
 science              .00503          -.08516           .10984
 female               .63212          1.08464          1.79465

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Standardized canonical coefficients for COVARIATES
           CAN. VAR.

 COVARIATE                 1                2                3

 read                 .45080          -.04961          -.21601
 write                .34896           .40921          -.88810
 math                 .22047           .03982          -.08848
 science              .04878          -.82660          1.06608
 female               .31504           .54057           .89443

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Correlations between COVARIATES and canonical variables
           CAN. VAR.

 Covariate                 1                2                3

 read                 .84045          -.35883          -.13536
 write                .87654           .06484          -.25456
 math                 .76395          -.29795          -.14776
 science              .65841          -.67680           .23036
 female               .36411           .75493           .54340

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 Variance in covariates explained by canonical variables

 CAN. VAR.       Pct Var DEP      Cum Pct DEP      Pct Var COV      Cum Pct COV

        1           11.30458         11.30458         52.48769         52.48769
        2             .70132         12.00590         24.99409         77.48177
        3             .09804         12.10394          9.06617         86.54795

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
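The two "variance explained" tables in the output can be reproduced from the correlation tables and the squared canonical correlations. The Python sketch below makes two assumptions, both of which match the printed figures up to rounding. First, the percentage of a set's variance explained by its own canonical variate is the mean squared correlation between the variables and that variate. Second, the percentage explained by the opposite set's variate is that value multiplied by the squared canonical correlation. Names such as pct_variance are illustrative.

```python
# Correlations between variables and canonical variates, copied from the
# output (rows = variables, columns = functions 1 to 3).
dep_loadings = [
    [0.90405, -0.38969, 0.17562],   # locus_of
    [0.02084, -0.70874, -0.70516],  # self_con
    [0.56715, 0.35089, -0.74513],   # motivati
]
cov_loadings = [
    [0.84045, -0.35883, -0.13536],  # read
    [0.87654, 0.06484, -0.25456],   # write
    [0.76395, -0.29795, -0.14776],  # math
    [0.65841, -0.67680, 0.23036],   # science
    [0.36411, 0.75493, 0.54340],    # female
]
sq_cor = [0.21538, 0.02806, 0.01081]  # "Sq. Cor" column


def pct_variance(loadings):
    """Mean squared loading per function, as a percentage."""
    n_vars = len(loadings)
    n_funcs = len(loadings[0])
    return [100 * sum(row[j] ** 2 for row in loadings) / n_vars
            for j in range(n_funcs)]


dep_own = pct_variance(dep_loadings)   # "Pct Var DEP", first table
cov_own = pct_variance(cov_loadings)   # "Pct Var COV", second table
# Variance explained by the OTHER set's variate: scale by r^2.
dep_by_cov = [p * r2 for p, r2 in zip(dep_own, sq_cor)]  # "Pct Var COV", first table
cov_by_dep = [p * r2 for p, r2 in zip(cov_own, sq_cor)]  # "Pct Var DEP", second table

print([round(v, 2) for v in dep_own], [round(v, 2) for v in dep_by_cov])
print([round(v, 2) for v in cov_own], [round(v, 2) for v in cov_by_dep])
```

The cross-set percentages are much smaller than the within-set percentages because they are scaled by the squared canonical correlations, which are at most .215 here.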

The raw canonical coefficients above are used to generate the canonical variates, represented by the columns (1 2 3) in the coefficient tables, for each set. They are interpreted in a manner analogous to interpreting regression coefficients. For example, for the variable read, a one unit increase in reading leads to a .0446 increase in the first canonical variate of the COVARIATE set when all of the other variables are held constant. Here is another example: being female leads to a .6321 increase in dimension 1 for the COVARIATE set with the other predictors held constant. When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables.
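To make the interpretation of raw coefficients concrete, the Python sketch below forms the first COVARIATE canonical variate as a weighted sum using the raw coefficients from the output. The student values are made up for illustration, and the function name variate_score is hypothetical. Any centering constant SPSS may apply is ignored, since it cancels when comparing two cases.

```python
# Raw canonical coefficients for the COVARIATE set, function 1 (from the output).
raw_cov_1 = {
    "read": 0.04462,
    "write": 0.03588,
    "math": 0.02342,
    "science": 0.00503,
    "female": 0.63212,
}


def variate_score(values, coefs):
    """Weighted sum of the variables (up to an additive constant)."""
    return sum(coefs[name] * values[name] for name in coefs)


# Hypothetical students (made-up scores) who differ in a single variable.
student_a = {"read": 50, "write": 52, "math": 51, "science": 49, "female": 0}
student_b = dict(student_a, read=51)    # one more point in reading
student_c = dict(student_a, female=1)   # female instead of male

diff_read = variate_score(student_b, raw_cov_1) - variate_score(student_a, raw_cov_1)
diff_female = variate_score(student_c, raw_cov_1) - variate_score(student_a, raw_cov_1)
print(round(diff_read, 5), round(diff_female, 5))
```

The two differences equal the raw coefficients for read and female, which is what "with the other predictors held constant" means in practice.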

The raw canonical coefficients are followed by the standardized canonical coefficients. The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, for the variable read, a one standard deviation increase in reading leads to a 0.45 standard deviation increase in the score on the first canonical variate for the COVARIATE set when the other variables in the model are held constant.
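The raw and standardized coefficients can be related as follows. This Python sketch assumes the usual relationship, standardized coefficient = raw coefficient × the variable's sample standard deviation, and divides one table by the other to recover the implied standard deviations. All coefficients are copied from the COVARIATE tables in the output; the standard deviations are not reported on this page, so they are inferred rather than taken from the data.

```python
# Raw and standardized canonical coefficients for the COVARIATE set
# (functions 1 to 3), copied from the output above.
raw = {
    "read": [0.04462, -0.00491, -0.02138],
    "write": [0.03588, 0.04207, -0.09131],
    "math": [0.02342, 0.00423, -0.00940],
    "science": [0.00503, -0.08516, 0.10984],
    "female": [0.63212, 1.08464, 1.79465],
}
standardized = {
    "read": [0.45080, -0.04961, -0.21601],
    "write": [0.34896, 0.40921, -0.88810],
    "math": [0.22047, 0.03982, -0.08848],
    "science": [0.04878, -0.82660, 1.06608],
    "female": [0.31504, 0.54057, 0.89443],
}

# If standardized = raw * SD, each ratio estimates the variable's SD, and the
# three ratios for a given variable should agree up to rounding.
implied_sd = {
    name: [s / r for s, r in zip(standardized[name], raw[name])]
    for name in raw
}

for name, sds in implied_sd.items():
    print(name, [round(sd, 2) for sd in sds])
```

The implied standard deviation is roughly 10 for the test scores and roughly 0.5 for female, which is why the raw coefficient for female looks large next to those of the test scores while its standardized coefficient is of comparable size.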

Things to consider

See also

References

 
