Note: This page was developed with R 2.5.0.
Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables (set 1) relates to the academic variables and gender (set 2). In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.
We have a data file, mmreg.csv, with 600 observations on eight variables. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.
Note: In analyzing these data using R you may have to install the following packages: CCA, fda, zoo, fields and catspec. All of these analyses were done using R 2.5.0.
Let's look at the data.
mm<-read.table("~/data/Rdata/mmreg.csv", sep=",", header = TRUE)
attach(mm)
library(fields)
t(stats(mm))
N mean Std.Dev. min Q1 median Q3 max missing values
locus_of_control 600 0.096533333 0.6702799 -2.23 -0.3725 0.21 0.510 1.36 0
self_concept 600 0.004916667 0.7055125 -2.62 -0.3000 0.03 0.440 1.19 0
motivation 600 0.660833333 0.3427294 0.00 0.3300 0.67 1.000 1.00 0
read 600 51.901833333 10.1029829 28.30 44.2000 52.10 60.100 76.00 0
write 600 52.384833333 9.7264550 25.50 44.3000 54.10 59.900 67.10 0
math 600 51.849000000 9.4147364 31.80 44.5000 51.30 58.375 75.50 0
science 600 51.763333333 9.7061787 26.00 44.4000 52.60 58.650 74.20 0
female 600 0.545000000 0.4983864 0.00 0.0000 1.00 1.000 1.00 0
library(catspec)
ctab(table(female), addmargins=TRUE)
Count Total %
female
0 273.0 45.5
1 327.0 54.5
Sum 600.0 100.0
Next, we'll look at the correlations within and between the two sets of variables using the matcor function from the CCA package.
# define the two sets of variables
psych<-mm[,1:3]
acad<-mm[,4:8]
# correlations
library(CCA)
matcor(psych,acad)
$Xcor
locus_of_control self_concept motivation
locus_of_control 1.0000000 0.1711878 0.2451323
self_concept 0.1711878 1.0000000 0.2885707
motivation 0.2451323 0.2885707 1.0000000
$Ycor
read write math science female
read 1.00000000 0.6285909 0.6792757 0.6906929 -0.04174278
write 0.62859089 1.0000000 0.6326664 0.5691498 0.24433183
math 0.67927568 0.6326664 1.0000000 0.6495261 -0.04821830
science 0.69069291 0.5691498 0.6495261 1.0000000 -0.13818587
female -0.04174278 0.2443318 -0.0482183 -0.1381859 1.00000000
$XYcor
locus_of_control self_concept motivation read write math science female
locus_of_control 1.0000000 0.17118778 0.24513227 0.37356505 0.35887684 0.3372690 0.32462694 0.11341075
self_concept 0.1711878 1.00000000 0.28857075 0.06065584 0.01944856 0.0535977 0.06982633 -0.12595132
motivation 0.2451323 0.28857075 1.00000000 0.21060992 0.25424818 0.1950135 0.11566948 0.09810277
read 0.3735650 0.06065584 0.21060992 1.00000000 0.62859089 0.6792757 0.69069291 -0.04174278
write 0.3588768 0.01944856 0.25424818 0.62859089 1.00000000 0.6326664 0.56914983 0.24433183
math 0.3372690 0.05359770 0.19501347 0.67927568 0.63266640 1.0000000 0.64952612 -0.04821830
science 0.3246269 0.06982633 0.11566948 0.69069291 0.56914983 0.6495261 1.00000000 -0.13818587
female 0.1134108 -0.12595132 0.09810277 -0.04174278 0.24433183 -0.0482183 -0.13818587 1.00000000
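The three blocks printed by matcor can be reproduced with base R's cor(). The sketch below uses small simulated data frames (x and y are hypothetical stand-ins, not the mmreg.csv variables) and assumes that matcor simply reports the two within-set correlation matrices plus the correlation matrix of all variables combined, which is what the 8 x 8 $XYcor block above suggests.

```r
# Simulated stand-in data (hypothetical; not the mmreg.csv values)
set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50))
y <- data.frame(c = rnorm(50), d = rnorm(50), e = rnorm(50))

xcor  <- cor(x)            # within-set correlations for set 1 (like $Xcor)
ycor  <- cor(y)            # within-set correlations for set 2 (like $Ycor)
xycor <- cor(cbind(x, y))  # all variables together (like $XYcor)

# the between-set correlations are the off-diagonal block of xycor
cross <- xycor[names(x), names(y)]
cross
```

The cross block is the part that canonical correlation analysis summarizes; the within-set blocks show how much the variables in each set overlap with one another.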
Due to the length of the output, we will be making comments in several places along the way.
cc1 <- cc(psych,acad)
# display the canonical correlations
cc1[1]
$cor
[1] 0.4640861 0.1675092 0.1039911
# raw canonical coefficients
cc1[3:4]
$xcoef
[,1] [,2] [,3]
locus_of_control -1.2538339 -0.6214776 -0.6616896
self_concept 0.3513499 -1.1876866 0.8267210
motivation -1.2624204 2.0272641 2.0002283
$ycoef
[,1] [,2] [,3]
read -0.044620600 -0.004910024 0.021380576
write -0.035877112 0.042071478 0.091307329
math -0.023417185 0.004229478 0.009398182
science -0.005025152 -0.085162184 -0.109835014
female -0.632119234 1.084642326 -1.794647036
The raw canonical coefficients are interpreted in a manner analogous to interpreting regression coefficients; i.e., for the variable read, a one-unit increase in reading is associated with a 0.0446 decrease in the first canonical variate of set 2 when all of the other variables are held constant. Here is another example: being female is associated with a 0.6321 decrease in dimension 1 for the academic set with the other predictors held constant.
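To make the regression analogy concrete, here is a minimal sketch of how raw coefficients turn data into canonical variates. It uses simulated matrices and base R's cancor() rather than cc(); cancor() may scale its coefficients differently from cc(), so the numbers are not comparable with the output above. The object names (x, y, u, v, shift) are made up for the illustration.

```r
# Simulated data (hypothetical; not the mmreg.csv values)
set.seed(2)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- matrix(rnorm(n * 4), n, 4)
y[, 1] <- y[, 1] + x[, 1]            # induce some association between the sets

fit <- cancor(x, y)                  # base R (stats package)

# canonical variates: centered data times raw coefficients
x_c <- scale(x, center = fit$xcenter, scale = FALSE)
y_c <- scale(y, center = fit$ycenter, scale = FALSE)
u <- x_c %*% fit$xcoef
v <- y_c %*% fit$ycoef

# each pair (u[, i], v[, i]) is correlated at the i-th canonical correlation
pair_cor <- sapply(1:3, function(i) cor(u[, i], v[, i]))

# raising the first x variable by one unit, others fixed, moves each
# x variate by the matching entry in the first row of the coefficients
x_up <- x_c
x_up[, 1] <- x_up[, 1] + 1
shift <- (x_up %*% fit$xcoef - u)[1, ]
```

Here pair_cor reproduces fit$cor, and shift reproduces the first row of fit$xcoef, which is the "one-unit increase, other variables held constant" reading of a raw coefficient.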
Next, we'll use comput to compute the loadings of the variables on the canonical dimensions (variates). These loadings are correlations between variables and the canonical variates.
# compute canonical loadings
cc2<-comput(psych, acad, cc1)
# display canonical loadings
cc2[3:6]
$corr.X.xscores
[,1] [,2] [,3]
locus_of_control -0.90404631 -0.3896883 -0.1756227
self_concept -0.02084327 -0.7087386 0.7051632
motivation -0.56715106 0.3508882 0.7451289
$corr.Y.xscores
[,1] [,2] [,3]
read -0.3900402 -0.06010654 0.01407661
write -0.4067914 0.01086075 0.02647207
math -0.3545378 -0.04990916 0.01536585
science -0.3055607 -0.11336980 -0.02395489
female -0.1689796 0.12645737 -0.05650916
$corr.X.yscores
[,1] [,2] [,3]
locus_of_control -0.41955531 -0.06527635 -0.01826320
self_concept -0.00967307 -0.11872021 0.07333073
motivation -0.26320691 0.05877699 0.07748681
$corr.Y.yscores
[,1] [,2] [,3]
read -0.8404480 -0.35882541 0.1353635
write -0.8765429 0.06483674 0.2545608
math -0.7639483 -0.29794884 0.1477611
science -0.6584139 -0.67679761 -0.2303551
female -0.3641127 0.75492811 -0.5434036
The correlations above are between the observed variables and the canonical variates; they are known as canonical loadings. The canonical variates are themselves a type of latent variable.
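The four loading matrices can be sketched directly as correlations between the data and the variate scores. The example below uses simulated data and base R's cancor() (an assumption made so that the code runs without the CCA package); the names load_x_u, load_x_v and so on are hypothetical labels for the analogues of $corr.X.xscores, $corr.X.yscores, etc.

```r
# Simulated data (hypothetical; not the mmreg.csv values)
set.seed(3)
n <- 200
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, c("y1", "y2", "y3", "y4")))
y[, 1] <- y[, 1] + x[, 1]

fit <- cancor(x, y)
k <- length(fit$cor)                 # number of canonical dimensions
u <- scale(x, center = fit$xcenter, scale = FALSE) %*% fit$xcoef[, 1:k]
v <- scale(y, center = fit$ycenter, scale = FALSE) %*% fit$ycoef[, 1:k]

load_x_u <- cor(x, u)   # set 1 variables with set 1 variates
load_y_u <- cor(y, u)   # set 2 variables with set 1 variates
load_x_v <- cor(x, v)   # set 1 variables with set 2 variates
load_y_v <- cor(y, v)   # set 2 variables with set 2 variates

# cross-loadings equal own-set loadings times the canonical correlation
scaled <- sweep(load_x_u, 2, fit$cor, "*")
```

The last line reflects a pattern that is also visible in the output above: for locus_of_control on dimension 1, -0.9040 x 0.4641 is about -0.4196, the value shown under $corr.X.yscores.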
In general, the number of canonical dimensions is equal to the number of variables in the smaller set; however, the number of significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables that are analogous to factors obtained in factor analysis. For this particular model there are three canonical dimensions of which only the first two are statistically significant. (Note: I was not able to find a way to have R automatically compute the tests of the canonical dimensions in any of the packages so I have included some R code below.)
# tests of canonical dimensions
ev<-cc1$cor^2 # squared canonical correlations
ev2<-1-ev
n<-dim(psych)[1] # number of observations
p<-length(psych) # number of variables in set 1
q<-length(acad) # number of variables in set 2
m<-n -3/2 - (p+q)/2
w<-cbind(NULL) # initialize wilks lambda
for (i in 1:3){
# lambda for dimensions i through 3 combined
w<-cbind(w,prod(ev2[i:3]))
}
d1<-cbind(NULL) # initialize numerator df
d2<-cbind(NULL) # initialize denominator df
f<-cbind(NULL) # initialize f
for (i in 1:3){
s<-sqrt((p^2*q^2-4)/(p^2+q^2-5))
si<-1/s
df1<-p*q
d1<-cbind(d1,df1)
df2<-m*s-p*q/2+1
d2<-cbind(d2,df2)
r<-(1-w[i]^si)/w[i]^si
f<-cbind(f,r*df2/df1)
# one fewer variable per set once a dimension is removed
p<-p-1
q<-q-1
}
pv<-pf(f,d1,d2,lower.tail=FALSE) # upper-tail p-values
dmat<-cbind(t(w),t(f),t(d1),t(d2),t(pv))
colnames(dmat)<-c("WilksL","F","df1","df2","p")
rownames(dmat)<-seq_along(w)
dmat
WilksL F df1 df2 p
1 0.7543611 11.715733 15 1634.653 7.497594e-28
2 0.9614300 2.944459 8 1186.000 2.905057e-03
3 0.9891858 2.164612 3 594.000 9.109218e-02
As shown in the table above, the first test evaluates whether dimensions 1 through 3 combined are significant (they are, F = 11.72), the next test evaluates whether dimensions 2 and 3 combined are significant (they are, F = 2.94), and the last test evaluates whether dimension 3, by itself, is significant (it is not, F = 2.16, p = 0.091). Therefore dimensions 1 and 2 are each taken to be significant while dimension 3 is not.
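For reference, the quantities computed by the R code above can be written out as follows. This is a transcription of that code, not an independent derivation; the subscripted symbols p_i, q_i and s_i are notation introduced here, with r_j the j-th canonical correlation, n the sample size, and p and q the sizes of the two sets. The F statistic has the form commonly described as Rao's approximation for Wilks' lambda.

```latex
\begin{align*}
\Lambda_i &= \prod_{j=i}^{3} \left(1 - r_j^2\right), \qquad i = 1, 2, 3 \\
m &= n - \tfrac{3}{2} - \frac{p + q}{2} \\
p_i &= p - i + 1, \qquad q_i = q - i + 1 \\
s_i &= \sqrt{\frac{p_i^2 q_i^2 - 4}{p_i^2 + q_i^2 - 5}} \\
df_{1,i} &= p_i q_i, \qquad df_{2,i} = m s_i - \frac{p_i q_i}{2} + 1 \\
F_i &= \frac{1 - \Lambda_i^{1/s_i}}{\Lambda_i^{1/s_i}} \cdot \frac{df_{2,i}}{df_{1,i}}
\end{align*}
```

As a check against the table: with n = 600, p = 3 and q = 5, m = 594.5. For the second test p_2 = 2 and q_2 = 4, so s_2 = sqrt(60/15) = 2, df1 = 8 and df2 = 594.5 x 2 - 4 + 1 = 1186, as printed.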
When the variables in the model have very different standard deviations, the standardized coefficients allow for easier comparisons among the variables. Next, we'll compute the standardized canonical coefficients.
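# standardized psych canonical coefficients
# (sapply is used because sd() on a whole data frame is an error in later versions of R)
psych.sd<-sapply(psych, sd)
s1<-diag(unname(psych.sd)) # diagonal matrix of psych sd's
s1 %*% cc1$xcoef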
[,1] [,2] [,3]
[1,] -0.8404196 -0.4165639 -0.4435172
[2,] 0.2478818 -0.8379278 0.5832620
[3,] -0.4326685 0.6948029 0.6855370
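# standardized acad canonical coefficients
# (sapply is used because sd() on a whole data frame is an error in later versions of R)
acad.sd<-sapply(acad, sd)
s2<-diag(unname(acad.sd)) # diagonal matrix of acad sd's
s2 %*% cc1$ycoef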
[,1] [,2] [,3]
[1,] -0.45080116 -0.04960589 0.21600760
[2,] -0.34895712 0.40920634 0.88809662
[3,] -0.22046662 0.03981942 0.08848141
[4,] -0.04877502 -0.82659938 -1.06607828
[5,] -0.31503962 0.54057096 -0.89442764
The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients. For example, for the variable read, a one standard deviation increase in reading is associated with a 0.45 standard deviation decrease in the score on the first canonical variate for set 2 when the other variables in the model are held constant.
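The standardization itself is only a rescaling: each raw coefficient is multiplied by the standard deviation of its variable. As a small arithmetic sketch, the values below are copied from the stats() output and the first column of $ycoef shown earlier on this page; the vector names acad_sd, raw_dim1 and std_dim1 are made up for the illustration.

```r
# standard deviations of the academic variables plus female (from stats())
acad_sd  <- c(read = 10.1029829, write = 9.7264550, math = 9.4147364,
              science = 9.7061787, female = 0.4983864)

# raw canonical coefficients, dimension 1 (first column of $ycoef)
raw_dim1 <- c(read = -0.044620600, write = -0.035877112, math = -0.023417185,
              science = -0.005025152, female = -0.632119234)

# elementwise product; same as the first column of diag(sd) %*% ycoef
std_dim1 <- acad_sd * raw_dim1
round(std_dim1, 2)
```

The products agree with the first column of the standardized acad coefficients printed above (for example, 10.1029829 x -0.0446206 is about -0.4508 for read).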
Table 1: Tests of Canonical Dimensions
            Canonical   Mult.
Dimension   Corr.       F       df1   df2      p
1           0.46        11.72   15    1634.7   0.0000
2           0.17         2.94    8    1186     0.0029
3           0.10         2.16    3     594     0.0911
Table 2: Standardized Canonical Coefficients
                                   Dimension
                                   1        2
Psychological Variables
  locus of control                -0.84    -0.42
  self-concept                     0.25    -0.84
  motivation                      -0.43     0.69
Academic Variables plus Gender
  reading                         -0.45    -0.05
  writing                         -0.35     0.41
  math                            -0.22     0.04
  science                         -0.05    -0.83
  gender (female=1)               -0.32     0.54
Tests of dimensionality for the canonical correlation analysis, as shown in Table 1, indicate that two of the three canonical dimensions are statistically significant at the .05 level. Dimension 1 had a canonical correlation of 0.46 between the sets of variables, while for dimension 2 the canonical correlation was much lower at 0.17.
Table 2 presents the standardized canonical coefficients for the first two dimensions across both sets of variables. For the psychological variables, the first canonical dimension is most strongly influenced by locus of control (-.84), and the second dimension by self-concept (-.84) and motivation (.69). For the academic variables plus gender, the first dimension was comprised of reading (-.45), writing (-.35) and gender (-.32). For the second dimension writing (.41), science (-.83) and gender (.54) were the dominating variables.
These pages are Copyrighted (c) by UCLA Academic Technology Services