UCLA Academic Technology Services

Stata Textbook Examples
Computer-Aided Multivariate Analysis by Afifi and Clark
Chapter 11: Discriminant Analysis

Table 11.1, page 245. 
use http://www.ats.ucla.edu/stat/stata/examples/cama3/depress, clear

sort cases
by cases: tabstat sex age educat income health beddays acuteill chronill, statistics(mean sd)

_______________________________________________________________________________
-> cases = normal

   stats |       sex       age    educat    income    health   beddays  acuteill  chronill
---------+--------------------------------------------------------------------------------
    mean |  1.586066   45.2418  3.545082  21.67623  1.713115  .1721311  .2786885  .4836066
      sd |  .4935494  18.14649  1.331023  15.97547   .795869  .3782703  .4492755  .5007584
------------------------------------------------------------------------------------------

_______________________________________________________________________________
-> cases = depressed

   stats |       sex       age    educat    income    health   beddays  acuteill  chronill
---------+--------------------------------------------------------------------------------
    mean |       1.8     40.38      3.16      15.2      2.06       .42       .38       .62
      sd |   .404061  17.40032   1.16689  9.837454   .977502  .4985694  .4903144  .4903144
------------------------------------------------------------------------------------------
Figure 11.2, page 248.
NOTE:  We were unable to reproduce this graph.
Table 11.2, page 249.

NOTE: You will need to download and install the user-written discrim ado-file. Type findit discrim in the command line to locate it (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
discrim cases income, predict 

                   Dichotomous Discriminant Analysis
                                                 
Observations    = 294                            Obs Group 0 =       244
Indep variables = 1                              Obs Group 1 =        50
                                                  
Centroid 0  =   -0.0728                          R-square    =    0.0254
Centroid 1  =    0.3555                          Mahalanobis =    0.1834
Grand Cntd  =    0.2826
                                                  
Eigenvalue   =    0.0261                         Wilk's Lambda =  0.9746
Canon. Corr. =    0.1594                         Chi-square    =  7.5021
Eta Squared  =    0.0254                         Sign Chi2     =  0.0062


                         Discrim Function    Unstandardized
          Variable         Coefficients        Coefficients
          -------------------------------------------------
          income              0.0283                -0.0661
          constant           -0.5223                 1.3607

                                                     
                        ----- Predicted -----
            Actual   |  Group 0         Group 1 |   Total    Pr(G
            ---------+--------------------------+--------
            Group 0  |      121           123   |     244      0.83
            Group 1  |       19            31   |      50      0.17
            ---------+--------------------------+--------
            Total    |      140           154   |     294
            ---------+--------------------------+--------
                                                  
                    Correctly predicted =  51.70 %
                    Model sensitivity   =  49.59 %
                    Model specificity   =  62.00 %
                    False positive      =  38.00 %
                    False negative      =  50.41 %
                    -------------------------------
                    Positive pred value =  86.43 %
                    Negative pred value =  20.13 %
                    -------------------------------
                    Kendall's tau-b     = -71.10 %
                    Cohen's kappa       =   6.34 %
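
NOTE: The summary percentages under the classification table can be recomputed from the four cell counts. The sketch below does this with display; the scalar names (n00, n01, n10, n11, ntot, po, pe) are illustrative and are not produced by discrim, and they are assumed not to clash with any variable name in the data set. Judging from the figures, the program appears to treat Group 0 as the "positive" group, so "sensitivity" is the share of Group 0 classified correctly.

```stata
* Cell counts copied from the classification table above
* (first digit = actual group, second digit = predicted group)
scalar n00 = 121
scalar n01 = 123
scalar n10 = 19
scalar n11 = 31
scalar ntot = n00 + n01 + n10 + n11

display "Correctly predicted = " %6.2f (100*(n00+n11)/ntot)
display "Model sensitivity   = " %6.2f (100*n00/(n00+n01))
display "Model specificity   = " %6.2f (100*n11/(n10+n11))
display "Positive pred value = " %6.2f (100*n00/(n00+n10))
display "Negative pred value = " %6.2f (100*n11/(n01+n11))

* Cohen kappa: observed agreement corrected for chance agreement
scalar po = (n00+n11)/ntot
scalar pe = ((n00+n01)*(n00+n10) + (n10+n11)*(n01+n11))/ntot^2
display "Cohen kappa         = " %6.2f (100*(po-pe)/(1-pe))
```

The same lines can be reused for Table 11.3 by substituting its cell counts.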
Figure 11.5, page 252.
graph twoway scatter income age if cases==0, sym(T) || scatter income age if cases==1, sym(o) || ///
	function y = 45.089 -.622*x, range(18 70) xscale(range(15 90)) yscale(range(0 65)) ///
	xtitle(age) ytitle(income) legend(order(1 2 3) label(1 "nondepress") label(2 "depress"))
Table 11.3, page 253.
discrim cases income age, predict

                   Dichotomous Discriminant Analysis
                                                 
Observations    = 294                            Obs Group 0 =       244
Indep variables = 2                              Obs Group 1 =        50
                                                  
Centroid 0  =   -0.0961                          R-square    =    0.0434
Centroid 1  =    0.4690                          Mahalanobis =    0.3194
Grand Cntd  =    0.3729
                                                  
Eigenvalue   =    0.0454                         Wilk's Lambda =  0.9566
Canon. Corr. =    0.2084                         Chi-square    = 12.9179
Eta Squared  =    0.0434                         Sign Chi2     =  0.0016


                         Discrim Function    Unstandardized
          Variable         Coefficients        Coefficients
          -------------------------------------------------
          income              0.0336                -0.0595
          age                 0.0209                -0.0370
          constant           -1.5157                 2.8684

                                                     
                        ----- Predicted -----
            Actual   |  Group 0         Group 1 |   Total    Pr(G
            ---------+--------------------------+--------
            Group 0  |      154            90   |     244      0.83
            Group 1  |       20            30   |      50      0.17
            ---------+--------------------------+--------
            Total    |      174           120   |     294
            ---------+--------------------------+--------
                                                  
                    Correctly predicted =  62.59 %
                    Model sensitivity   =  63.11 %
                    Model specificity   =  60.00 %
                    False positive      =  40.00 %
                    False negative      =  36.89 %
                    -------------------------------
                    Positive pred value =  88.51 %
                    Negative pred value =  25.00 %
                    -------------------------------
                    Kendall's tau-b     = -32.54 %
                    Cohen's kappa       =  14.85 %
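
NOTE: Two parts of this output can be cross-checked by hand. The line drawn in Figure 11.5 has slope -.622, which agrees with minus the ratio of the age and income coefficients, and the canonical correlation and Wilks' lambda both follow from the eigenvalue. The sketch below uses the rounded values printed above, so agreement is only to about three decimals. The scalar name ev is illustrative. The intercept of the line (45.089) depends on the cutoff point used in the text and is not derived here.

```stata
* Slope of the dividing line in the (age, income) plane:
* setting b_income*income + b_age*age + constant equal to a fixed
* cutoff and solving for income gives slope = -b_age/b_income.
* With b_age = -0.0370 and b_income = -0.0595 this is:
display "slope        = " %7.4f (-0.0370/0.0595)

* Canonical correlation and Wilks lambda from the eigenvalue
scalar ev = 0.0454
display "Canon. Corr. = " %7.4f (sqrt(ev/(1+ev)))
display "Wilks lambda = " %7.4f (1/(1+ev))
```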
Table 11.4, page 257.

NOTE: The discriminant function is given in the output above; we were unable to reproduce the classification functions.
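
NOTE: Stata 10 and later include an official discrim command whose syntax differs from the user-written program used on this page. One possible route to classification functions is to fit the linear discriminant model with the official command and request the functions afterwards. This is a sketch only: it has not been checked against Table 11.4, it assumes the official discrim is the one found on the ado-path and that the depress data are in memory, and it uses the command's default equal prior probabilities, which may not match the text.

```stata
* Show which discrim program Stata will actually run
which discrim

* Linear discriminant analysis with the official command (Stata 10+)
discrim lda income age, group(cases)

* Classification functions, then the classification table
estat classfunctions
estat classtable
```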

Covariances in the middle of page 258.
corr age income, cov

(obs=294)

             |      age   income
-------------+------------------
         age |  327.083
      income | -53.0073  233.788
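
NOTE: These are covariances for the full sample of 294 observations. As a quick check, the correlation implied by the matrix is the covariance divided by the square root of the product of the two variances. The sketch below computes it from the printed values and then asks Stata for the correlation directly; the two should agree up to rounding.

```stata
* Correlation implied by the covariance matrix above
display "corr(age, income) = " %7.4f (-53.0073/sqrt(327.083*233.788))

* The same quantity computed directly from the data
correlate age income
```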
Table 11.5, page 268.

To start this problem, you will need to add a new variable to the data set, which we called cases3, and recode it using the variable cesd. You will also need to download and install the user-written daoneway program. Type findit daoneway in the command line to locate it (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

NOTE: We were unable to reproduce the classification functions in the middle of the table.
gen cases3 = 2
replace cases3=1 if cesd == 0
replace cases3=3 if cesd > 15
daoneway sex age educat income health beddays, by(cases3)

                    One-way Disciminant Function Analysis

Observations = 294
Variables    = 6
Groups       = 3

                 Pct of   Cum  Canonical  After  Wilks'
 Fcn Eigenvalue Variance  Pct     Corr      Fcn  Lambda  Chi-square  df  P-value
                                         |   0  0.83984    50.357    12   0.0000
   1    0.1656   88.50  88.50    0.3769  |   1  0.97893     6.145     5   0.2924
   2    0.0215   11.50 100.00    0.1452  |

Unstandardized canonical discriminant function coefficients

           func1    func2
    sex   0.7310   0.0198
    age  -0.0317  -0.0253
 educat   0.0062   0.6548
 income  -0.0276   0.0007
 health   0.5752   0.6882
beddays   1.1364  -1.1312
  _cons  -0.4963  -2.1777

Standardized canonical discriminant function coefficients

           func1    func2
    sex   0.3505   0.0095
    age  -0.5681  -0.4539
 educat   0.0080   0.8538
 income  -0.4176   0.0102
 health   0.4756   0.5690
beddays   0.4551  -0.4530

Canonical discriminant structure matrix

           func1    func2
    sex   0.4433  -0.1003
    age  -0.3491  -0.3912
 educat  -0.1695   0.7684
 income  -0.3827   0.3352
 health   0.4523   0.1063
beddays   0.5992  -0.2282

Group means on canonical discriminant functions

            func1    func2
cases3-1  -0.6504  -0.3286
cases3-2  -0.0859   0.0870
cases3-3   0.8032  -0.1418
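
NOTE: The first table in the daoneway output is internally linked: the percentage of variance, the canonical correlations, and the Wilks' lambda values can all be recovered from the two eigenvalues. The sketch below recomputes them from the rounded eigenvalues printed above, so agreement is to about three decimals. The scalar names ev1 and ev2 are illustrative. The last line assumes the chi-square is Bartlett's approximation with N = 294, p = 6 variables, and g = 3 groups; the printed value is consistent with that formula, but the program's source was not consulted.

```stata
scalar ev1 = 0.1656
scalar ev2 = 0.0215

* Percentage of variance attributed to each function
display "Pct variance, fcn 1 = " %6.2f (100*ev1/(ev1+ev2))
display "Pct variance, fcn 2 = " %6.2f (100*ev2/(ev1+ev2))

* Canonical correlation: sqrt(eigenvalue/(1 + eigenvalue))
display "Canon. corr., fcn 1 = " %6.4f (sqrt(ev1/(1+ev1)))
display "Canon. corr., fcn 2 = " %6.4f (sqrt(ev2/(1+ev2)))

* Wilks lambda: product of 1/(1 + eigenvalue) over the functions
* that remain after removing the first k
display "Lambda after 0 fcns = " %7.5f (1/((1+ev1)*(1+ev2)))
display "Lambda after 1 fcn  = " %7.5f (1/(1+ev2))

* Chi-square, assuming -(N - 1 - (p + g)/2)*ln(lambda)
display "Chi-square, 0 fcns  = " %7.3f (-(294-1-(6+3)/2)*ln(0.83984))
```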
Figure 11.6, page 271. 

NOTE: We were unable to reproduce this graph.


These pages are Copyrighted (c) by UCLA Academic Technology Services

