Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.
We have included the data file, mmreg.sas7bdat. The dataset has 600 observations on eight variables plus an ID variable. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable, with one indicating a female student.
Let's look at the data.
options nocenter;
proc means data="c:\data\mmreg";
run;
The MEANS Procedure
Variable Label N Mean Std Dev Minimum Maximum
ID 600 300.5000000 173.3493582 1.0000000 600.0000000
locus_of_control locus of control 600 0.0965333 0.6702799 -2.2300000 1.3600000
self_concept self-concept 600 0.0049167 0.7055125 -2.6199999 1.1900001
motivation motivation 600 0.6608333 0.3427294 0 1.0000000
read reading score 600 51.9018334 10.1029830 28.2999992 76.0000000
write writing score 600 52.3848333 9.7264550 25.5000000 67.0999985
math math score 600 51.8490000 9.4147363 31.7999992 75.5000000
science science score 600 51.7633332 9.7061789 26.0000000 74.1999969
female 600 0.5450000 0.4983864 0 1.0000000
proc freq data="c:\data\mmreg";
table female;
run;
The FREQ Procedure
FEMALE  Frequency  Percent  Cumulative Frequency  Cumulative Percent
------------------------------------------------------------
0 273 45.50 273 45.50
1 327 54.50 600 100.00
We did not include correlations among the variables at this point because we will get them later as part of the canonical correlation analysis.
Because the output is long, we will comment on it at several points along the way.
proc cancorr corr data="c:\data\mmreg";
var locus_of_control self_concept motivation;
with read write math science female;
run;
The corr option on the proc cancorr statement produces the correlations within and between the two sets of variables, which are given below.
The CANCORR Procedure
Correlations Among the Original Variables
Correlations Among the VAR Variables
LOCUS_OF_CONTROL SELF_CONCEPT MOTIVATION
LOCUS_OF_CONTROL 1.0000 0.1712 0.2451
SELF_CONCEPT 0.1712 1.0000 0.2886
MOTIVATION 0.2451 0.2886 1.0000
Correlations Among the WITH Variables
READ WRITE MATH SCIENCE FEMALE
READ 1.0000 0.6286 0.6793 0.6907 -0.0417
WRITE 0.6286 1.0000 0.6327 0.5691 0.2443
MATH 0.6793 0.6327 1.0000 0.6495 -0.0482
SCIENCE 0.6907 0.5691 0.6495 1.0000 -0.1382
FEMALE -0.0417 0.2443 -0.0482 -0.1382 1.0000
Correlations Between the VAR Variables and the WITH Variables
READ WRITE MATH SCIENCE FEMALE
LOCUS_OF_CONTROL 0.3736 0.3589 0.3373 0.3246 0.1134
SELF_CONCEPT 0.0607 0.0194 0.0536 0.0698 -0.1260
MOTIVATION 0.2106 0.2542 0.1950 0.1157 0.0981
The output below gives the three canonical correlations and the multivariate tests of the dimensions. These results show that the first two of the three canonical correlations are statistically significant. The output also includes the four multivariate criteria and the F approximations.
Canonical Correlation Analysis
Canonical Correlation  Adjusted Canonical Correlation  Approximate Standard Error  Squared Canonical Correlation
1 0.464086 0.455474 0.032059 0.215376
2 0.167509 . 0.039712 0.028059
3 0.103991 . 0.040417 0.010814
Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq) (first four columns); Test of H0: The canonical correlations in the current row and all that follow are zero (remaining columns)
Eigenvalue  Difference  Proportion  Cumulative  Likelihood Ratio  Approximate F Value  Num DF  Den DF  Pr > F
1 0.2745 0.2456 0.8734 0.8734 0.75436113 11.72 15 1634.7 <.0001
2 0.0289 0.0179 0.0919 0.9652 0.96142996 2.94 8 1186 0.0029
3 0.0109 . 0.0348 1.0000 0.98918584 2.16 3 594 0.0911
Multivariate Statistics and F Approximations
S=3 M=0.5 N=295
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.75436113 11.72 15 1634.7 <.0001
Pillai's Trace 0.25424936 11.00 15 1782 <.0001
Hotelling-Lawley Trace 0.31429738 12.38 15 1113 <.0001
Roy's Greatest Root 0.27449563 32.61 5 594 <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
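All of the quantities above are functions of the three canonical correlations. As a check on the output, the short Python sketch below (Python rather than SAS, purely for illustration; the variable names are our own) recomputes the eigenvalues CanRsq/(1-CanRsq), their proportions, and the four multivariate criteria from the canonical correlations printed above. It should reproduce the SAS values up to rounding.

```python
# Canonical correlations copied from the PROC CANCORR output above
canon_r = [0.464086, 0.167509, 0.103991]

# Squared canonical correlations (CanRsq) and eigenvalues of Inv(E)*H
r_sq = [r ** 2 for r in canon_r]
eigen = [rsq / (1 - rsq) for rsq in r_sq]

# Proportion and cumulative proportion of the sum of the eigenvalues
total = sum(eigen)
proportion = [e / total for e in eigen]
cumulative = [sum(proportion[: i + 1]) for i in range(len(proportion))]

# The four multivariate criteria, each computed from the canonical correlations
wilks = 1.0
for rsq in r_sq:
    wilks *= 1 - rsq          # Wilks' lambda = product of (1 - CanRsq)
pillai = sum(r_sq)            # Pillai's trace = sum of CanRsq
hotelling_lawley = total      # Hotelling-Lawley trace = sum of eigenvalues
roy = max(eigen)              # Roy's greatest root = largest eigenvalue

for i, (e, p, c) in enumerate(zip(eigen, proportion, cumulative), start=1):
    print(f"dim {i}: eigenvalue={e:.4f} proportion={p:.4f} cumulative={c:.4f}")
print(f"Wilks={wilks:.4f} Pillai={pillai:.4f} H-L={hotelling_lawley:.4f} Roy={roy:.4f}")
```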
In general, the number of canonical dimensions equals the number of variables in the smaller set; however, the number of statistically significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables analogous to the factors obtained in factor analysis. For this model there are three canonical dimensions, of which only the first two are statistically significant. The tests are sequential: the first test checks whether all three canonical correlations are zero (F = 11.72, p < .0001); the second checks whether the correlations for dimensions 2 and 3 combined are zero (F = 2.94, p = 0.0029); and the last checks whether dimension 3, by itself, is zero (F = 2.16, p = 0.0911). Therefore dimensions 1 and 2 are each significant, while the third dimension is not.
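The sequential tests can be reproduced from the canonical correlations as well. The Python sketch below assumes that the F values for the likelihood ratio come from Rao's F approximation to Wilks' lambda; with n = 600 observations, p = 3 VAR variables and q = 5 WITH variables, that formula gives the likelihood ratios, F values and degrees of freedom shown in the table (for example, 8 and 1186 for the second test). The function name `rao_f` is our own, and p-values are omitted because they need an F-distribution function outside the standard library.

```python
import math

n, p, q = 600, 3, 5  # observations, size of the VAR set, size of the WITH set
canon_r = [0.464086, 0.167509, 0.103991]

def rao_f(k):
    """Rao's F approximation for H0: canonical correlations k, k+1, ... are all zero (k is 1-based)."""
    # Likelihood ratio (Wilks' lambda) for the current row and all rows that follow
    lam = math.prod(1 - r ** 2 for r in canon_r[k - 1:])
    pk, qk = p - k + 1, q - k + 1       # effective set sizes once k-1 dimensions are removed
    df1 = pk * qk
    w = n - 1 - (p + q + 1) / 2
    denom = pk ** 2 + qk ** 2 - 5
    t = math.sqrt((pk ** 2 * qk ** 2 - 4) / denom) if denom > 0 else 1.0
    df2 = w * t - df1 / 2 + 1
    root = lam ** (1 / t)
    f = (1 - root) / root * df2 / df1
    return lam, f, df1, df2

for k in range(1, len(canon_r) + 1):
    lam, f, df1, df2 = rao_f(k)
    print(f"test {k}: LR={lam:.6f} F={f:.2f} df=({df1}, {df2:.1f})")
```

Because each test multiplies (1 - CanRsq) only over the current and later rows, a small likelihood ratio in the first test can be driven almost entirely by the first canonical correlation, which is why the later tests are needed to judge dimensions 2 and 3.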
Next, the raw canonical coefficients are shown below. When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables.
The raw canonical coefficients are interpreted in a manner analogous to interpreting regression coefficients. For example, for the variable read, a one-unit increase in reading leads to a 0.0446 increase in the first canonical variate of set 2 (W1) when all of the other variables are held constant. Similarly, being female leads to a 0.6321 increase in dimension 1 for set 2, with the other predictors held constant.
Raw Canonical Coefficients for the VAR Variables
V1 V2 V3
LOCUS_OF_CONTROL locus of control 1.2538339076 0.6214775237 -0.661689607
SELF_CONCEPT self-concept -0.35134993 1.1876866562 0.8267209411
MOTIVATION motivation 1.2624203286 -2.027264053 2.0002284379
Raw Canonical Coefficients for the WITH Variables
W1 W2 W3
READ reading score 0.0446205959 0.0049100176 0.0213805581
WRITE writing score 0.0358771125 -0.042071471 0.0913073288
MATH math score 0.0234171847 -0.004229472 0.0093982096
SCIENCE science score 0.0050251567 0.0851621751 -0.109835018
FEMALE 0.6321192387 -1.084642482 -1.794646917
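Because each canonical variate is a linear combination of the variables in its set, the interpretation of the raw coefficients can be checked directly. The Python sketch below scores hypothetical students on W1 using the raw WITH coefficients (the test scores are made up for illustration). It omits any centering constant, which cancels when two scores are differenced.

```python
# Raw canonical coefficients for the first WITH canonical variate (W1), from the output above
w1_raw = {
    "read": 0.0446205959,
    "write": 0.0358771125,
    "math": 0.0234171847,
    "science": 0.0050251567,
    "female": 0.6321192387,
}

def w1_score(student):
    """Linear combination of the raw WITH variables (no centering; a constant offset would cancel in differences)."""
    return sum(coef * student[name] for name, coef in w1_raw.items())

# Hypothetical students who differ only by one point in reading
a = {"read": 50, "write": 52, "math": 51, "science": 50, "female": 0}
b = dict(a, read=51)
print(round(w1_score(b) - w1_score(a), 4))  # 0.0446

# Same scores, but the student is female
c = dict(a, female=1)
print(round(w1_score(c) - w1_score(a), 4))  # 0.6321
```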
The raw coefficients are followed by the standardized canonical coefficients shown below. The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, for the variable read, a one standard deviation increase in reading leads to a 0.45 standard deviation increase in the score on the first canonical variate for set 2 (W1) when the other variables in the model are held constant.
Standardized Canonical Coefficients for the VAR Variables
V1 V2 V3
LOCUS_OF_CONTROL locus of control 0.8404 0.4166 -0.4435
SELF_CONCEPT self-concept -0.2479 0.8379 0.5833
MOTIVATION motivation 0.4327 -0.6948 0.6855
Standardized Canonical Coefficients for the WITH Variables
W1 W2 W3
READ reading score 0.4508 0.0496 0.2160
WRITE writing score 0.3490 -0.4092 0.8881
MATH math score 0.2205 -0.0398 0.0885
SCIENCE science score 0.0488 0.8266 -1.0661
FEMALE 0.3150 -0.5406 -0.8944
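The standardized coefficients are the raw coefficients rescaled by each variable's standard deviation: because the canonical variates have unit variance, a one standard deviation change in a variable moves the variate by (raw coefficient × standard deviation) standard deviations. The Python sketch below applies this to W1, taking the standard deviations from the PROC MEANS output at the top of the page; it should match the W1 column of the standardized table up to rounding.

```python
# Standard deviations from the PROC MEANS output
sd = {"read": 10.1029830, "write": 9.7264550, "math": 9.4147363,
      "science": 9.7061789, "female": 0.4983864}

# Raw canonical coefficients for W1, from the raw coefficient table
w1_raw = {"read": 0.0446205959, "write": 0.0358771125, "math": 0.0234171847,
          "science": 0.0050251567, "female": 0.6321192387}

# Standardized coefficient = raw coefficient * standard deviation of the variable
w1_std = {name: w1_raw[name] * sd[name] for name in w1_raw}
for name, coef in w1_std.items():
    print(f"{name:8s} {coef:.4f}")
```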
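Below are the correlations between the observed variables and the canonical variables, known as canonical loadings; SAS labels them the canonical structure. The last two tables give the cross-set correlations, that is, each variable's correlations with the canonical variables of the other set.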
Canonical Structure
Correlations Between the VAR Variables and Their Canonical Variables
V1 V2 V3
LOCUS_OF_CONTROL locus of control 0.9040 0.3897 -0.1756
SELF_CONCEPT self-concept 0.0208 0.7087 0.7052
MOTIVATION motivation 0.5672 -0.3509 0.7451
Correlations Between the WITH Variables and Their Canonical Variables
W1 W2 W3
READ reading score 0.8404 0.3588 0.1354
WRITE writing score 0.8765 -0.0648 0.2546
MATH math score 0.7639 0.2979 0.1478
SCIENCE science score 0.6584 0.6768 -0.2304
FEMALE 0.3641 -0.7549 -0.5434
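The loadings are not a separate estimate: within each set, the loadings on a canonical variable equal the within-set correlation matrix multiplied by that variable's standardized coefficients. The Python sketch below checks this for the psychological (VAR) set using the correlations and standardized coefficients printed earlier. Because those inputs are rounded to four decimals, agreement with the table is only to about three decimals. The helper name `structure` is our own.

```python
# Within-set correlations of the VAR (psychological) variables, from the corr output
r_xx = [
    [1.0000, 0.1712, 0.2451],   # locus_of_control
    [0.1712, 1.0000, 0.2886],   # self_concept
    [0.2451, 0.2886, 1.0000],   # motivation
]
# Standardized canonical coefficients: one row per variable, columns V1, V2, V3
std_coef = [
    [0.8404, 0.4166, -0.4435],
    [-0.2479, 0.8379, 0.5833],
    [0.4327, -0.6948, 0.6855],
]

def structure(r, coef):
    """Loadings = within-set correlation matrix times the standardized coefficient matrix."""
    rows, cols, inner = len(r), len(coef[0]), len(coef)
    return [[sum(r[i][m] * coef[m][j] for m in range(inner)) for j in range(cols)]
            for i in range(rows)]

names = ["locus_of_control", "self_concept", "motivation"]
for name, row in zip(names, structure(r_xx, std_coef)):
    print(name, [round(v, 4) for v in row])
```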
Correlations Between the VAR Variables and the Canonical Variables of the WITH Variables
W1 W2 W3
LOCUS_OF_CONTROL locus of control 0.4196 0.0653 -0.0183
SELF_CONCEPT self-concept 0.0097 0.1187 0.0733
MOTIVATION motivation 0.2632 -0.0588 0.0775
Correlations Between the WITH Variables and the Canonical Variables of the VAR Variables
V1 V2 V3
READ reading score 0.3900 0.0601 0.0141
WRITE writing score 0.4068 -0.0109 0.0265
MATH math score 0.3545 0.0499 0.0154
SCIENCE science score 0.3056 0.1134 -0.0240
FEMALE 0.1690 -0.1265 -0.0565
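The two cross-set tables follow from the within-set loadings: the correlation of a variable with the other set's k-th canonical variable equals its loading on its own k-th canonical variable times the k-th canonical correlation. The Python sketch below applies this to the psychological variables using the rounded loadings from the first structure table, so it matches the cross-set table only up to rounding.

```python
canon_r = [0.464086, 0.167509, 0.103991]

# Loadings of the VAR variables on their own canonical variables V1, V2, V3
var_loadings = {
    "locus_of_control": [0.9040, 0.3897, -0.1756],
    "self_concept": [0.0208, 0.7087, 0.7052],
    "motivation": [0.5672, -0.3509, 0.7451],
}

# Cross-loading on W_k = loading on V_k times the k-th canonical correlation
cross = {name: [l * r for l, r in zip(row, canon_r)] for name, row in var_loadings.items()}
for name, row in cross.items():
    print(name, [round(v, 4) for v in row])
```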
Table 1: Tests of Canonical Dimensions
Dimension  Canonical Corr.  Mult. F  df1  df2  p
1 0.46 11.72 15 1634.7 <.0001
2 0.17 2.94 8 1186 0.0029
3 0.10 2.16 3 594 0.0911
Table 2: Standardized Canonical Coefficients
Variable  Dimension 1  Dimension 2
Psychological Variables
locus of control 0.84 0.42
self-concept -0.25 0.84
motivation 0.43 -0.69
Academic Variables plus Gender
reading 0.45 0.05
writing 0.35 -0.41
math 0.22 -0.04
science 0.05 0.83
gender (female=1) 0.32 -0.54
Tests of dimensionality for the canonical correlation analysis, as shown in Table 1, indicate that two of the three canonical dimensions are statistically significant at the .05 level. Dimension 1 had a canonical correlation of 0.46 between the sets of variables, while for dimension 2 the canonical correlation was much lower at 0.17.
Table 2 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the psychological variables, the first canonical dimension is most strongly influenced by locus of control (.84), and the second dimension by self-concept (.84) and motivation (-.69). For the academic variables plus gender, the first dimension is composed mainly of reading (.45), writing (.35) and gender (.32). For the second dimension, writing (-.41), science (.83) and gender (-.54) are the dominating variables.
These pages are Copyrighted (c) by UCLA Academic Technology Services