UCLA Academic Technology Services

SAS Data Analysis Examples
Canonical Correlation Analysis

Examples of Canonical Correlation Analysis

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

Description of the Data

Let's pursue Example 1 from above.

We have included the data file, which can be obtained by clicking on mmreg.sas7bdat. The dataset has 600 observations on eight variables. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

Let's look at the data.
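A minimal sketch of how the data might be summarized is given here. It assumes that mmreg.sas7bdat has been saved in a folder assigned to a libname, and that the variables are named locus_of_control, self_concept, motivation, read, write, math, science and female. The folder path, the libname ucla and the variable names are assumptions for illustration and should be adjusted to match the actual file.

```sas
/* Hypothetical folder holding mmreg.sas7bdat (replace with your own path) */
libname ucla "/path/to/data";

/* Copy the data into WORK so later steps can refer to it as mmreg */
data mmreg;
  set ucla.mmreg;
run;

/* Descriptive statistics for the continuous variables (names assumed) */
proc means data=mmreg n mean std min max;
  var locus_of_control self_concept motivation read write math science;
run;

/* Frequency table for the zero-one gender indicator */
proc freq data=mmreg;
  tables female;
run;
```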

We did not include correlations among the variables at this point because we will get them later as part of the canonical correlation analysis.

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a canonical correlation analysis, let's consider some other methods that you might use.

SAS Canonical Correlation Analysis

Due to the length of the output, we will be making comments in several places along the way.

The corr option on the proc cancorr statement requests the correlations within and between the two sets of variables.
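A call along the following lines would request this kind of output. It is a sketch only: it assumes the data are available as a data set named mmreg and that the variables carry the names used here, neither of which is confirmed by this page. The psychological variables are placed on the var statement and the academic variables plus female on the with statement.

```sas
/* Canonical correlation between the psychological set (VAR) and the
   academic-plus-gender set (WITH); data set and variable names are assumed */
proc cancorr data=mmreg corr;
  var locus_of_control self_concept motivation;
  with read write math science female;
run;
```

Which set goes on which statement does not change the canonical correlations; it only determines how the two sets are labeled in the output.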

The output below gives the three canonical correlations and the multivariate tests of the dimensions. These results show that the first two of the three canonical correlations are statistically significant. The output also includes the four multivariate criteria and the F approximations.

In general, the number of canonical dimensions is equal to the number of variables in the smaller set; however, the number of significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables that are analogous to factors obtained in factor analysis. For this particular model there are three canonical dimensions, of which only the first two are statistically significant. The first test evaluates whether dimensions 1 through 3 combined are significant (F = 11.72); the second evaluates whether dimensions 2 and 3 combined are significant (F = 2.94); the last evaluates whether dimension 3, by itself, is significant (F = 2.16). The first two tests are significant and the last is not, so dimensions 1 and 2 are each significant while the third dimension is not.
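The sequential logic of these tests can be written compactly in terms of the likelihood ratio (Wilks' lambda) that proc cancorr reports on each row of the dimension tests. In the sketch below, r_i denotes the i-th canonical correlation and Lambda_k the likelihood ratio on row k; these symbols are notation introduced here for illustration and do not appear in the output.

```latex
% Row k tests H0: r_k = r_{k+1} = ... = r_3 = 0
\Lambda_k \;=\; \prod_{i=k}^{3} \bigl(1 - r_i^{2}\bigr), \qquad k = 1, 2, 3
```

Each F value quoted above is an approximation based on the corresponding Lambda_k. Because Lambda_k pools the current dimension with all that follow, a significant row says only that at least one of those dimensions is nonzero, which is why the rows are read from top to bottom.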

Next, the raw canonical coefficients are shown below. When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables.

The raw canonical coefficients are interpreted in a manner analogous to interpreting regression coefficients. For example, for the variable read, a one unit increase in reading leads to a .0446 increase in the first canonical variate of set 2 when all of the other variables are held constant. Here is another example: being female leads to a .6321 increase in dimension 1 for set 2 with the other predictors held constant.

The raw coefficients are followed by the standardized canonical coefficients shown below. The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, for the variable read, a one standard deviation increase in reading leads to a 0.45 standard deviation increase in the score on the first canonical variate for set 2 when the other variables in the model are held constant.
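The link between the two sets of coefficients can be stated as a short formula. This is a sketch in notation introduced here: b_j is the raw coefficient of variable j on a given canonical variate, s_j is the sample standard deviation of variable j, and the canonical variates are assumed to be scaled to unit variance.

```latex
% Standardized coefficient = raw coefficient times the variable's standard deviation
b_j^{\mathrm{std}} \;=\; s_j \, b_j^{\mathrm{raw}}
```

As a rough check using the rounded values quoted for read, .45 / .0446 is approximately 10.1, which would be the implied standard deviation of read under this relation.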

Below are the correlations between the observed variables and the canonical variables. These are known as canonical loadings; SAS labels them the canonical structure.
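One way to see what the loadings are is to save the canonical variable scores and correlate them with the observed variables directly. The sketch below assumes a data set named mmreg and the variable names shown, and the output data set name canscores is hypothetical. It relies on the default naming of the canonical variables in the out= data set, V1-V3 for the var set and W1-W3 for the with set.

```sas
/* Save the canonical variable scores; by default they are named
   V1-V3 (VAR set) and W1-W3 (WITH set) in the OUT= data set */
proc cancorr data=mmreg out=canscores noprint;
  var locus_of_control self_concept motivation;
  with read write math science female;
run;

/* Correlate each observed variable (rows) with the canonical
   variables of both sets (columns) */
proc corr data=canscores nosimple;
  var V1-V3 W1-W3;
  with locus_of_control self_concept motivation
       read write math science female;
run;
```

The resulting correlations should correspond to the canonical structure tables, including the correlations of each set's variables with the other set's canonical variables.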

Sample Write-Up of the Analysis

There is a lot of variation in the write-ups of canonical correlation analyses. The write-up below is fairly minimal, including only the tests of dimensionality and the standardized coefficients.

Tests of dimensionality for the canonical correlation analysis, as shown in Table 1, indicate that two of the three canonical dimensions are statistically significant at the .05 level. Dimension 1 had a canonical correlation of 0.46 between the sets of variables, while for dimension 2 the canonical correlation was much lower at 0.17.

Table 2 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the psychological variables, the first canonical dimension is most strongly influenced by locus of control (.84), and the second dimension by self-concept (-.84) and motivation (.69). For the academic variables plus gender, the first dimension was composed mainly of reading (.45), writing (.35) and gender (.32). For the second dimension, writing (.41), science (-.83) and gender (.54) were the dominating variables.

Cautions, Flies in the Ointment

  • Multivariate normal distribution assumptions are required for both sets of variables.
  • Canonical correlation analysis is not recommended for small samples.

    These pages are Copyrighted (c) by UCLA Academic Technology Services


    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California