UCLA Academic Technology Services

Stata Data Analysis Examples
Discriminant Function Analysis

Note: This data analysis example requires Stata 10 or later.

Examples of Discriminant Function Analysis

Example 1. A large international air carrier has collected data on employees in three different job classifications: 1) customer service personnel, 2) mechanics and 3) dispatchers. The director of Human Resources wants to know if these three job classifications appeal to different personality types. Each employee is administered a battery of psychological tests that includes measures of interest in outdoor activity, sociability and conservativeness.

Example 2. There is Fisher's (1936) classic example of discriminant analysis involving three varieties of iris and four predictor variables (petal width, petal length, sepal width, and sepal length). Fisher not only wanted to determine if the varieties differed significantly on the four continuous variables but he was also interested in predicting variety classification for unknown individual plants.

Description of the Data

We have a data file, discrim.dta, with 244 observations on four variables. The three psychological variables are outdoor (interest in outdoor activity), social and conservative. The categorical variable is job type, with three levels: 1) customer service, 2) mechanic and 3) dispatcher.

Let's look at the data.
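A minimal sketch of how one might inspect the data. It assumes that discrim.dta sits in the current working directory and that the variables are named outdoor, social, conservative and job. The predictor names are taken from the text; the name job for the grouping variable is an assumption.

```stata
* Load the example data (assumes discrim.dta is in the working directory)
use discrim, clear

* Overall summary statistics for the three psychological variables
summarize outdoor social conservative

* Group sizes for the job classification (the variable name job is assumed)
tabulate job

* Sample size, mean and standard deviation of each predictor within each job group
tabstat outdoor social conservative, by(job) statistics(n mean sd)

* Correlations among the three predictors
correlate outdoor social conservative
```

The grouped summary is worth a look before modeling, since discriminant analysis depends on the groups differing in their mean profiles.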

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a discriminant function analysis, let's consider some other methods that you might use.

Stata Discriminant Function Analysis

We will run the discriminant analysis using the candisc command, which is new in Stata 10. We could also have run discrim lda to get the same analysis with slightly different output. There is a lot of output, so we will comment at various places along the way.
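The two commands mentioned above might be invoked as follows. This is a sketch that assumes the data are already loaded and that the grouping variable is named job (an assumed name); the predictor names come from the text.

```stata
* Canonical linear discriminant analysis of job type on the three predictors
* (job is an assumed variable name for the job classification)
candisc outdoor social conservative, group(job)

* Alternative: linear discriminant analysis, which fits the same model
* but displays somewhat different output by default
discrim lda outdoor social conservative, group(job)
```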

There are two discriminant dimensions, both of which are statistically significant. The canonical correlations for dimensions one and two are 0.72 and 0.49, respectively.
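To make the canonical correlations more concrete, the short calculation below squares each one, which gives the share of variance in that discriminant function associated with group membership, and converts it to the implied eigenvalue using the standard relation eigenvalue = r^2 / (1 - r^2). The inputs are the rounded values quoted above, so the results are approximate. The last line assumes candisc was the most recent estimation command.

```stata
* Squared canonical correlations, using the rounded values from the text
display 0.72^2
display 0.49^2

* Eigenvalue implied by each canonical correlation: r^2 / (1 - r^2)
display 0.72^2 / (1 - 0.72^2)
display 0.49^2 / (1 - 0.49^2)

* Redisplay the canonical correlations, eigenvalues and tests of dimensionality
estat canontest
```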

The standardized discriminant coefficients function in a manner analogous to standardized regression coefficients in OLS regression. For example, a one standard deviation increase on the outdoor variable will result in a 0.38 standard deviation increase in the predicted values on discriminant function 1. The canonical structure, also known as the canonical loadings or discriminant loadings, represents the correlations between the observed variables and the unobserved discriminant functions (dimensions). The discriminant functions are a kind of latent variable, and the correlations are loadings analogous to factor loadings.
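The coefficients and loadings described above can be redisplayed individually with postestimation commands. This sketch assumes candisc (or discrim lda) was the most recent estimation command.

```stata
* Standardized canonical discriminant function coefficients
estat loadings, standardized

* Unstandardized (raw) coefficients, for comparison
estat loadings, unstandardized

* Canonical structure: correlations between each observed variable
* and the discriminant functions
estat structure
```

Comparing the standardized coefficients with the structure matrix is a useful check, because a variable can carry a small coefficient yet still correlate noticeably with a dimension when the predictors are correlated with one another.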

The output includes the means on the discriminant functions for each of the three groups and a classification table. Values in the diagonal of the classification table reflect the correct classification of individuals into groups based on their scores on the discriminant dimensions.
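The group means and the classification table can also be requested on their own after estimation. In this sketch the new variable name predjob is hypothetical and job is the assumed name of the grouping variable.

```stata
* Group means on the canonical discriminant functions
estat grmeans, canonical

* Classification table; the diagonal cells count correctly classified cases
estat classtable

* Save each employee's predicted group (predjob is a hypothetical name)
* and cross-tabulate it against the observed job classification
predict predjob, classification
tabulate job predjob
```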

Next, we will plot a graph of individuals on the discriminant dimensions. Due to the large number of subjects we will shorten the labels for the job groups to make the graph more legible. As long as we don't save the dataset these new labels will not be made permanent.
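One way to shorten the labels and draw the plot is sketched below. It assumes job is coded 1, 2 and 3 as described earlier and that candisc was the most recent estimation command. The value label name jobshort is hypothetical.

```stata
* Define one-letter value labels: c = customer service, m = mechanic,
* d = dispatcher (the label name jobshort is hypothetical)
label define jobshort 1 "c" 2 "m" 3 "d"
label values job jobshort

* Plot individuals on the two discriminant dimensions; msymbol(i) makes
* the marker invisible so that only the group label is drawn
scoreplot, msymbol(i)
```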

As you can see, the customer service employees tend to be at the more social (negative) end of dimension 1 and the dispatchers at the opposite end, with the mechanics in the middle. On dimension 2 the results are not as clear; however, the mechanics tend to be higher on the outdoor dimension and the customer service employees and dispatchers lower.

We can also plot the discriminant loadings for the variables on the discriminant dimensions.
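A minimal sketch of the plotting command, assuming candisc (or discrim lda) was the most recent estimation command:

```stata
* Plot the loadings of the three variables on the first two
* discriminant dimensions
loadingplot
```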

There is no surprise here: the variable social is strong on the social dimension (dimension 1), where it has a high negative loading, and the variable outdoor is high on the outdoor dimension (dimension 2).

Sample Write-Up of the Analysis

There is a lot of variation in the write-ups of discriminant function analyses. The write-up below is fairly minimal, including only the tests of dimensionality and the standardized coefficients. Typically, one does not include raw coefficients with standard errors and Wald tests of significance.

Tests of dimensionality for the discriminant analysis, as shown in Table 1, indicate that both of the dimensions are statistically significant. The F-tests associated with each dimension are exact. Dimension 1 had a canonical correlation of 0.72 between the response variables and the job classification, while for dimension 2 the canonical correlation was lower at 0.49.

Table 2 presents the standardized canonical coefficients for both dimensions. The first discriminant dimension is positively weighted by outdoor (0.38) and conservative (0.52) and strongly negatively weighted by social (-0.83). The second discriminant dimension is dominated by the outdoor variable (0.93). These results are interpreted to indicate that the first dimension reflects a bipolar social/non-social dimension while the second is an outdoor/non-outdoor dimension.

Cautions, Flies in the Ointment

  • Discriminant function analysis assumes that the response variables follow a multivariate normal distribution.
  • Discriminant function analysis is not recommended for small samples.
    These pages are Copyrighted (c) by UCLA Academic Technology Services


    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.