Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.
Example 2. There is Fisher's (1936) classic example of discriminant analysis involving three varieties of iris and four predictor variables (petal width, petal length, sepal width, and sepal length). Fisher not only wanted to determine if the varieties differed significantly on the four continuous variables, but he was also interested in predicting variety classification for unknown individual plants.
We have included the data file, which can be obtained by clicking on discrim.sav. The dataset has 244 observations on four variables. The psychological variables are outdoor interests, social and conservative. The categorical variable is job type with three levels: 1) customer service, 2) mechanic, and 3) dispatcher.
Let's look at the data.
get file='d:\data\discrim.sav' .

descriptives variables=outdoor social conservative
  /statistics=mean stddev min max .

Descriptive Statistics

                      N   Minimum  Maximum     Mean  Std. Deviation
outdoor             244       .00    28.00  15.6393         4.83993
social              244      7.00    35.00  20.6762         5.47926
conservative        244       .00    20.00  10.5902         3.72679
Valid N (listwise)  244

means tables=outdoor social conservative by job
  /cells mean count stddev .

Report

job                                    outdoor   social  conservative
1.00 customer service  Mean            12.5176  24.2235        9.0235
                       N                    85       85            85
                       Std. Deviation  4.64863  4.33528       3.14331
2.00 mechanic          Mean            18.5376  21.1398       10.1398
                       N                    93       93            93
                       Std. Deviation  3.56480  4.55066       3.24235
3.00 dispatch          Mean            15.5758  15.4545       13.2424
                       N                    66       66            66
                       Std. Deviation  4.11025  3.76699       3.69224
Total                  Mean            15.6393  20.6762       10.5902
                       N                   244      244           244
                       Std. Deviation  4.83993  5.47926       3.72679

correlations variables=outdoor social conservative .

Correlations

                                   outdoor  social  conservative
outdoor       Pearson Correlation        1   -.071          .079
              Sig. (2-tailed)                 .267          .217
              N                        244     244           244
social        Pearson Correlation    -.071       1         -.236
              Sig. (2-tailed)         .267                  .000
              N                        244     244           244
conservative  Pearson Correlation     .079   -.236             1
              Sig. (2-tailed)         .217    .000
              N                        244     244           244

frequencies variables=job .

job

                              Frequency  Percent  Valid Percent  Cumulative Percent
Valid  1.00 customer service         85     34.8           34.8                34.8
       2.00 mechanic                 93     38.1           38.1                73.0
       3.00 dispatch                 66     27.0           27.0               100.0
       Total                        244    100.0          100.0
We will run the discriminant analysis using the discriminant procedure in SPSS. We could also have run the discrim lad to get the same analysis with slightly different output. There is a lot of output so we will comment at various places along the way.
discriminant
  /groups=job(1 3)
  /variables=outdoor social conservative
  /analysis all
  /priors equal
  /statistics=boxm table
  /plot=combined map
  /classify=pooled .

Group Statistics

                                     Valid N (listwise)
job                                  Unweighted  Weighted
1.00 customer service  outdoor               85    85.000
                       social                85    85.000
                       conservative          85    85.000
2.00 mechanic          outdoor               93    93.000
                       social                93    93.000
                       conservative          93    93.000
3.00 dispatch          outdoor               66    66.000
                       social                66    66.000
                       conservative          66    66.000
Total                  outdoor              244   244.000
                       social               244   244.000
                       conservative         244   244.000

Summary of Canonical Discriminant Functions

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1           1.081(a)           77.1          77.1                   .721
2            .321(a)           22.9         100.0                   .493

a First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1 through 2                   .364     242.552   6  .000
2                             .757      66.723   2  .000
There are two discriminant dimensions, both of which are statistically significant. The canonical correlations for dimensions one and two are 0.72 and 0.49, respectively.
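The eigenvalue, canonical correlation and Wilks' lambda columns in the output are linked by standard identities, which can be checked by hand. The following is a minimal Python sketch, assuming the rounded eigenvalues printed in the Eigenvalues table and Bartlett's chi-square approximation; the variable and function names are illustrative and are not SPSS names. Because the eigenvalues are rounded to three decimals, the chi-square values it prints differ slightly from the 242.552 and 66.723 shown in the output.

```python
import math

# Rounded eigenvalues from the Eigenvalues table in the output
eigenvalues = [1.081, 0.321]
n_cases, n_vars, n_groups = 244, 3, 3

# Canonical correlation for each function: sqrt(eigenvalue / (1 + eigenvalue))
canonical_corr = [math.sqrt(ev / (1 + ev)) for ev in eigenvalues]

# Percent of between-group variance carried by each function
pct_variance = [100 * ev / sum(eigenvalues) for ev in eigenvalues]


def wilks_lambda(evs):
    """Wilks' lambda for a set of functions: product of 1 / (1 + eigenvalue)."""
    result = 1.0
    for ev in evs:
        result *= 1 / (1 + ev)
    return result


def bartlett_chi_square(evs, k):
    """Wilks' lambda, approximate chi-square and df for functions k+1 onward."""
    lam = wilks_lambda(evs[k:])
    chi2 = -(n_cases - 1 - (n_vars + n_groups) / 2) * math.log(lam)
    df = (n_vars - k) * (n_groups - k - 1)
    return lam, chi2, df


for k in range(len(eigenvalues)):
    lam, chi2, df = bartlett_chi_square(eigenvalues, k)
    print(f"functions {k + 1} through 2: lambda={lam:.3f} chi2={chi2:.1f} df={df}")
```

The degrees of freedom (6 and 2) match the Wilks' Lambda table, and the canonical correlations round to the .721 and .493 reported there.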
Standardized Canonical Discriminant Function Coefficients

                Function
                  1      2
outdoor         .379   .926
social         -.831   .213
conservative    .517  -.291

Structure Matrix

                  Function
                   1         2
social          -.765(*)   .266
conservative     .468(*)  -.259
outdoor          .323      .937(*)

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
* Largest absolute correlation between each variable and any discriminant function
The standardized discriminant coefficients function in a manner analogous to standardized regression coefficients in OLS regression. For example, a one standard deviation increase on the outdoor variable, holding the other variables constant, will result in a .38 standard deviation increase in the predicted values on discriminant function 1. The canonical structure, also known as canonical loadings or discriminant loadings, represents correlations between the observed variables and the unobserved discriminant functions (dimensions). The discriminant functions are a kind of latent variable, and the correlations are loadings analogous to factor loadings.
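To make the regression analogy concrete, here is a small Python sketch that treats each discriminant function as a weighted sum of standardized predictor values, using the standardized coefficients from the output. It assumes the predictors have already been converted to standard deviation units; the `function_scores` helper and the example values are hypothetical and only illustrate the arithmetic.

```python
# Standardized canonical discriminant function coefficients from the output:
# (function 1, function 2) for each predictor
COEFFICIENTS = {
    "outdoor": (0.379, 0.926),
    "social": (-0.831, 0.213),
    "conservative": (0.517, -0.291),
}


def function_scores(z):
    """Weighted sums of standardized predictor values for functions 1 and 2."""
    f1 = sum(COEFFICIENTS[name][0] * z[name] for name in COEFFICIENTS)
    f2 = sum(COEFFICIENTS[name][1] * z[name] for name in COEFFICIENTS)
    return f1, f2


# Hypothetical case at the mean on every predictor (all z-values zero)
baseline = {"outdoor": 0.0, "social": 0.0, "conservative": 0.0}
# Same case with outdoor raised by one standard deviation, others unchanged
shifted = dict(baseline, outdoor=1.0)

change = [s - b for s, b in zip(function_scores(shifted), function_scores(baseline))]
print(change)  # change in the function 1 and function 2 scores
```

The change on each function equals the outdoor coefficient for that function, which is the sense in which the coefficients behave like standardized regression weights.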
Functions at Group Centroids

                          Function
job                       1       2
1.00 customer service  -1.219   -.389
2.00 mechanic            .107    .715
3.00 dispatch           1.420   -.506

Unstandardized canonical discriminant functions evaluated at group means

Classification Results(a)

                                        Predicted Group Membership
                 job                    1.00 customer service  2.00 mechanic  3.00 dispatch  Total
Original  Count  1.00 customer service                     70             11              4     85
                 2.00 mechanic                             16             62             15     93
                 3.00 dispatch                              3             12             51     66
          %      1.00 customer service                   82.4           12.9            4.7  100.0
                 2.00 mechanic                           17.2           66.7           16.1  100.0
                 3.00 dispatch                            4.5           18.2           77.3  100.0

a 75.0% of original grouped cases correctly classified.
The output includes the means on the discriminant functions for each of the three groups and a classification table. Values in the diagonal of the classification table reflect the correct classification of individuals into groups based on their scores on the discriminant dimensions.
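The percentages in the classification table follow directly from the counts. As a quick check, the sketch below recomputes the per-group and overall hit rates in Python; the counts are copied from the output, while the names `COUNTS` and `hit_rates` are illustrative.

```python
GROUPS = ["customer service", "mechanic", "dispatch"]

# Rows are actual job groups, columns are predicted groups (counts from the output)
COUNTS = [
    [70, 11, 4],
    [16, 62, 15],
    [3, 12, 51],
]


def hit_rates(counts):
    """Return per-group percent correct and overall percent correct."""
    per_group = [100 * row[i] / sum(row) for i, row in enumerate(counts)]
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return per_group, 100 * correct / total


per_group, overall = hit_rates(COUNTS)
for name, pct in zip(GROUPS, per_group):
    print(f"{name}: {pct:.1f}% correct")
print(f"overall: {overall:.1f}% correct")
```

The diagonal counts sum to 183 of 244 cases, which gives the 75.0% reported in the table footnote.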
Next, we will plot a graph of individuals on the discriminant dimensions. Due to the large number of subjects, we will shorten the labels for the job groups to make the graph more legible. As long as we don't save the dataset, these new labels will not be made permanent.
As you can see, the customer service employees tend to be at the more social (negative) end of dimension 1 and the dispatchers at the opposite end, with the mechanics in the middle. On dimension 2 the results are not as clear; however, the mechanics tend to be higher on the outdoor dimension and the customer service employees and dispatchers lower.
SPSS also produces an ASCII territorial map plot which shows the relative location of the boundaries of the different categories. The territorial map is shown below.
Territorial Map
Canonical Discriminant
Function 2
-6.0 -4.0 -2.0 .0 2.0 4.0 6.0
.............................................................
6.0 . 122 .
. 112 2.
. 12 223.
. 122 233 .
. 112 223 .
. 122 233 .
4.0 . 112 . . . . 223 .
. 12 233 .
. 122 223 .
. 112 2233 .
. 12 233 .
. 122 223 .
2.0 . . 112 . . 233 . .
. 122 223 .
. 112 233 .
. 12 223 .
. 122 * 233 .
. 112 223 .
.0 . . . 122. 233 . . .
. * 112 223 .
. 1233 * .
. 13 .
. 13 .
. 13 .
-2.0 . . . 13 . . .
. 13 .
. 13 .
. 13 .
. 13 .
. 13 .
-4.0 . . . 13 . . .
. 13 .
. 13 .
. 13 .
. 13 .
. 13 .
-6.0 . 13 .
.............................................................
-6.0 -4.0 -2.0 .0 2.0 4.0 6.0
Canonical Discriminant Function 1
Symbols used in territorial map
Symbol  Group  Label
------  -----  --------------------
   1      1    customer service
   2      2    mechanic
   3      3    dispatch
   *           Indicates a group centroid
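One way to read the territorial map: with equal priors and the pooled covariance matrix, as requested in the discriminant command, and with both functions retained, a case should fall in the territory of the group whose centroid is nearest to its discriminant scores. The Python sketch below illustrates that rule under those assumptions, using the centroids from the output; it is not SPSS's own classification routine, and `nearest_group` is a hypothetical helper.

```python
import math

# Group centroids on (function 1, function 2), from Functions at Group Centroids
CENTROIDS = {
    "customer service": (-1.219, -0.389),
    "mechanic": (0.107, 0.715),
    "dispatch": (1.420, -0.506),
}


def nearest_group(f1, f2):
    """Return the group whose centroid is closest to the point (f1, f2)."""
    return min(CENTROIDS, key=lambda g: math.dist((f1, f2), CENTROIDS[g]))


# Hypothetical score pairs taken from three different regions of the map
print(nearest_group(0.0, 3.0))    # upper middle of the map
print(nearest_group(-3.0, -2.0))  # lower left
print(nearest_group(3.0, -2.0))   # lower right
```

These three points land in the mechanic, customer service and dispatch territories respectively, in line with the regions marked 2, 1 and 3 on the map.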
Table 1: Tests of Discriminant Dimensions

            Canonical    Chi-
Dimension     Corr.     square   df     p
    1          0.72     242.55    6   0.000
    2          0.49      66.72    2   0.000

Table 2: Standardized Discriminant Coefficients

                 Dimension
                  1       2
outdoor          0.38    0.93
social          -0.83    0.21
conservative     0.52   -0.29
Tests of dimensionality for the discriminant analysis, as shown in Table 1, indicate that both of the dimensions are statistically significant. The tests reported in Table 1 are the chi-square tests based on Wilks' lambda from the SPSS output. Dimension 1 had a canonical correlation of 0.72 between the psychological variables and the job classification, while for dimension 2 the canonical correlation was lower at 0.49.
Table 2 presents the standardized canonical coefficients for both dimensions. The first discriminant dimension is positively weighted by outdoor (0.38) and conservative (0.52) and strongly negatively weighted by social (-0.83). The second discriminant dimension is dominated by the outdoor variable (0.93). These results are interpreted to indicate that the first dimension reflects a bipolar social/non-social dimension while the second is an outdoor/non-outdoor dimension.
These pages are Copyrighted (c) by UCLA Academic Technology Services