Stata added or improved a number of multivariate procedures in Stata 9. Stata 10 continues to improve its multivariate techniques by adding full k-group discriminant function analysis, adding multiple correspondence analysis (mca), and improving the multidimensional scaling (mds) program.
1: Discriminant analysis
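Stata has added several discriminant analysis procedures, including candisc (canonical linear discriminant analysis), discrim qda (quadratic), discrim logistic (logistic), and discrim knn (kth-nearest-neighbor). We will demonstrate these commands using Fisher's iris data.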
use http://www.ats.ucla.edu/stat/stata/seminars/stata10/iris, clear
candisc sl sw pl pw, group(type)
Canonical linear discriminant analysis
      |                                    | Like-
      | Canon.   Eigen-       Variance     | lihood
  Fcn | Corr.    value     Prop.   Cumul.  | Ratio       F     df1    df2  Prob>F
  ----+------------------------------------+------------------------------------
    1 | 0.9848  32.1919   0.9912   0.9912  | 0.0234   199.15     8    288  0.0000 e
    2 | 0.4712  .285391   0.0088   1.0000  | 0.7780   13.794     3    145  0.0000 e
  ------------------------------------------------------------------------------
  Ho: this and smaller canon. corr. are zero;  e = exact F
Standardized canonical discriminant function coefficients
| function1 function2
-------------+----------------------
sl | -.4269549 -.0124077
sw | -.5212416 -.7352612
pl | .9472573 .4010379
pw | .5751607 -.5810398
Canonical structure
| function1 function2
-------------+----------------------
sl | .2225959 -.3108118
sw | -.1190115 -.8636809
pl | .7060654 -.1677014
pw | .6331779 -.7372421
Group means on canonical variables
type | function1 function2
-------------+----------------------
setosa | -7.6076 -.215133
versicolor | 1.825049 .7278996
virginica | 5.78255 -.5127666
Resubstitution classification summary
+---------+
| Key |
|---------|
| Number |
| Percent |
+---------+
| Classified
True type | setosa versicolor virginica | Total
-------------+------------------------------------+-----------
setosa | 50 0 0 | 50
| 100.00 0.00 0.00 | 100.00
| |
versicolor | 0 48 2 | 50
| 0.00 96.00 4.00 | 100.00
| |
virginica | 0 1 49 | 50
| 0.00 2.00 98.00 | 100.00
-------------+------------------------------------+-----------
Total | 50 49 51 | 150
| 33.33 32.67 34.00 | 100.00
| |
Priors | 0.3333 0.3333 0.3333 |
estat manova
Number of obs = 150
W = Wilks' lambda L = Lawley-Hotelling trace
P = Pillai's trace R = Roy's largest root
Source | Statistic df F(df1, df2) = F Prob>F
-----------+--------------------------------------------------
type | W 0.0234 2 8.0 288.0 199.15 0.0000 e
| P 1.1919 8.0 290.0 53.47 0.0000 a
| L 32.4773 8.0 286.0 580.53 0.0000 a
| R 32.1919 4.0 145.0 1166.96 0.0000 u
|--------------------------------------------------
Residual | 147
-----------+--------------------------------------------------
Total | 149
--------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F
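The four statistics in the estat manova table are functions of the two eigenvalues reported by candisc. The lines below are a minimal sketch of that arithmetic. They assume the rounded eigenvalues 32.1919 and .285391 printed in the candisc output, so the results should agree with the W, P, L, and R rows only up to rounding.

```stata
/* Wilks' lambda: product of 1/(1 + eigenvalue) */
display 1/((1 + 32.1919)*(1 + .285391))
/* Pillai's trace: sum of eigenvalue/(1 + eigenvalue) */
display 32.1919/(1 + 32.1919) + .285391/(1 + .285391)
/* Lawley-Hotelling trace: sum of the eigenvalues */
display 32.1919 + .285391
/* Roy's largest root as reported here: the largest eigenvalue */
display 32.1919
```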
/* group summarize */
estat grsummarize
Estimation sample candisc
Summarized by type
| type
Mean | setosa versicolor virginica | Total
-------------+------------------------------------+-----------
sl | 5.006 5.936 6.588 | 5.843333
sw | 3.428 2.77 2.974 | 3.057333
pl | 1.462 4.26 5.552 | 3.758
pw | .246 1.326 2.026 | 1.199333
-------------+------------------------------------+-----------
N | 50 50 50 | 150
/* Mahalanobis squared distances between the group means */
estat grdistances
Mahalanobis squared distances between groups
| type
type | setosa versicolor virginica
-------------+-----------------------------------
setosa | 0
versicolor | 89.864186 0
virginica | 179.38471 17.201066 0
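Because both canonical functions are retained in this example, the Mahalanobis squared distance between two groups should equal the squared Euclidean distance between their means on the canonical variables. The sketch below checks this by hand. It assumes the rounded group means printed in the "Group means on canonical variables" table, so it should match the entries for setosa vs. versicolor (about 89.86) and versicolor vs. virginica (about 17.20) only to a few decimals.

```stata
/* setosa vs. versicolor */
display (-7.6076 - 1.825049)^2 + (-.215133 - .7278996)^2
/* versicolor vs. virginica */
display (1.825049 - 5.78255)^2 + (.7278996 - (-.5127666))^2
```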
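/* shorten the group labels to single letters; this assumes tl is the value label attached to type */
label define tl 1 "S" 2 "C" 3 "V", modify
/* plot the discriminant scores; msymbol(i) makes the marker symbols invisible */
scoreplot, msymbol(i)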
discrim qda sl sw pl pw, group(type)
Quadratic discriminant analysis
Resubstitution classification summary
+---------+
| Key |
|---------|
| Number |
| Percent |
+---------+
| Classified
True type | setosa versicolor virginica | Total
-------------+------------------------------------+-----------
setosa | 50 0 0 | 50
| 100.00 0.00 0.00 | 100.00
| |
versicolor | 0 48 2 | 50
| 0.00 96.00 4.00 | 100.00
| |
virginica | 0 1 49 | 50
| 0.00 2.00 98.00 | 100.00
-------------+------------------------------------+-----------
Total | 50 49 51 | 150
| 33.33 32.67 34.00 | 100.00
| |
Priors | 0.3333 0.3333 0.3333 |
discrim logistic sl sw pl pw, group(type)
Iteration 0: log likelihood = -164.79184
Iteration 1: log likelihood = -67.780459
(omitted)
Iteration 22: log likelihood = -5.9492736
Iteration 23: log likelihood = -5.9492736
Logistic discriminant analysis
Resubstitution classification summary
+---------+
| Key |
|---------|
| Number |
| Percent |
+---------+
| Classified
True type | setosa versicolor virginica | Total
-------------+------------------------------------+-----------
setosa | 50 0 0 | 50
| 100.00 0.00 0.00 | 100.00
| |
versicolor | 0 49 1 | 50
| 0.00 98.00 2.00 | 100.00
| |
virginica | 0 1 49 | 50
| 0.00 2.00 98.00 | 100.00
-------------+------------------------------------+-----------
Total | 50 50 50 | 150
| 33.33 33.33 33.33 | 100.00
| |
Priors | 0.3333 0.3333 0.3333 |
discrim knn sl sw pl pw, group(type) k(3)
Kth-nearest-neighbor discriminant analysis
Resubstitution classification summary
+---------+
| Key |
|---------|
| Number |
| Percent |
+---------+
| Classified
True type | setosa versicolor virginica | Total
-------------+------------------------------------+-----------
setosa | 50 0 0 | 50
| 100.00 0.00 0.00 | 100.00
| |
versicolor | 0 47 3 | 50
| 0.00 94.00 6.00 | 100.00
| |
virginica | 0 3 47 | 50
| 0.00 6.00 94.00 | 100.00
-------------+------------------------------------+-----------
Total | 50 50 50 | 150
| 33.33 33.33 33.33 | 100.00
| |
Priors | 0.3333 0.3333 0.3333 |
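The tables above are resubstitution summaries: each observation is classified by a rule that was fit using that same observation, which tends to make the error rates look optimistic. The following sketch shows one way to get a leave-one-out summary instead. It uses discrim lda, which is not demonstrated above, and the variable name cls is a hypothetical choice for the stored predictions.

```stata
use http://www.ats.ucla.edu/stat/stata/seminars/stata10/iris, clear
/* linear discriminant analysis; lootable adds the leave-one-out table */
discrim lda sl sw pl pw, group(type) lootable
/* leave-one-out error rate estimates by group */
estat errorrate, looclass
/* store the resubstitution classification and compare it with the true group */
predict cls, classification
tabulate type cls
```

The lootable option is also accepted by discrim qda and discrim knn, so the same comparison can be repeated for those rules.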
2: Correspondence analysis
Correspondence analysis has been improved by adding multiple correspondence analysis (mca) to the existing simple correspondence analysis (ca).
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
/* ca from What's New in Stata 9 */
ca prog ses
Correspondence analysis Number of obs = 200
Pearson chi2(4) = 16.60
Prob > chi2 = 0.0023
Total inertia = 0.0830
3 active rows Number of dim. = 2
3 active columns Expl. inertia (%) = 100.00
| singular principal cumul
Dimension | value inertia chi2 percent percent
------------+------------------------------------------------------------
dim 1 | .2604912 .0678557 13.57 81.73 81.73
dim 2 | .1231525 .0151665 3.03 18.27 100.00
------------+------------------------------------------------------------
total | .0830222 16.60 100
Statistics for row and column categories in symmetric normalization
| overall | dimension_1 | dimension_2
Categories | mass quality %inert | coord sqcorr contrib | coord sqcorr contrib
-------------+---------------------------+---------------------------+---------------------------
prog | | |
general | 0.225 1.000 0.249 | 0.442 0.555 0.169 | 0.576 0.445 0.606
academic | 0.525 1.000 0.384 | -0.482 0.997 0.469 | -0.039 0.003 0.006
vocation | 0.250 1.000 0.367 | 0.615 0.807 0.363 | -0.437 0.193 0.387
-------------+---------------------------+---------------------------+---------------------------
ses | | |
low | 0.235 1.000 0.247 | 0.432 0.558 0.168 | 0.559 0.442 0.597
middle | 0.475 1.000 0.180 | 0.270 0.603 0.133 | -0.319 0.397 0.392
high | 0.290 1.000 0.573 | -0.792 0.996 0.699 | 0.069 0.004 0.011
-------------------------------------------------------------------------------------------------
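The summary figures in the ca header are linked by simple arithmetic: total inertia is the Pearson chi-square divided by the number of observations, and each principal inertia is the square of its singular value. A small sketch of those relationships, assuming the rounded values printed in the output above:

```stata
/* total inertia = Pearson chi2 / number of observations */
display 16.60/200
/* principal inertia of dimension 1 = squared singular value */
display .2604912^2
/* percent of total inertia explained by dimension 1 */
display 100*.0678557/.0830222
```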
/* mca for Stata 10 */
mca prog ses female
Multiple/Joint correspondence analysis Number of obs = 200
Total inertia = .0353896
Method: Burt/adjusted inertias Number of axes = 2
| principal cumul
Dimension | inertia percent percent
------------+----------------------------------
dim 1 | .0182501 51.57 51.57
dim 2 | .0075739 21.40 72.97
dim 3 | .000025 0.07 73.04
------------+----------------------------------
Total | .0353896 100.00
Statistics for column categories in standard normalization
| overall | dimension_1 | dimension_2
Categories | mass quality %inert | coord sqcorr contrib | coord sqcorr contrib
-------------+---------------------------+---------------------------+---------------------------
prog | | |
general | 0.075 0.699 0.098 | 1.152 0.524 0.099 | 1.030 0.174 0.080
academic | 0.175 0.736 0.151 | -1.096 0.719 0.210 | 0.260 0.017 0.012
vocation | 0.083 0.748 0.144 | 1.265 0.478 0.133 | -1.474 0.270 0.181
-------------+---------------------------+---------------------------+---------------------------
ses | | |
low | 0.078 0.727 0.179 | 1.392 0.438 0.152 | 1.754 0.288 0.241
middle | 0.158 0.760 0.085 | 0.434 0.181 0.030 | -1.203 0.579 0.229
high | 0.097 0.743 0.235 | -1.839 0.716 0.327 | 0.550 0.027 0.029
-------------+---------------------------+---------------------------+---------------------------
female | | |
male | 0.152 0.678 0.059 | -0.418 0.230 0.027 | -0.905 0.448 0.124
female | 0.182 0.678 0.050 | 0.349 0.230 0.022 | 0.756 0.448 0.104
-------------------------------------------------------------------------------------------------
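The mca output above uses the default Burt method with adjusted inertias. As a hedged sketch of what one might try next, the lines below refit the model with the joint correspondence analysis method and then plot the category coordinates. Whether the joint method is the better choice for these data is not something the output above settles.

```stata
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
/* joint correspondence analysis instead of the default Burt method */
mca prog ses female, method(joint)
/* plot the category coordinates of all three variables in one graph */
mcaplot, overlay
```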
3: Multidimensional scaling
Multidimensional scaling (mds) has been improved by adding modern metric and nonmetric MDS in addition to classical multidimensional scaling.
use http://www.stata-press.com/data/r10/cerealnut, clear
mds calories-K, id(brand) method(classical)
Classical metric multidimensional scaling
dissimilarity: L2, computed on 8 variables
Number of obs = 25
Eigenvalues > 0 = 8 Mardia fit measure 1 = 0.9603
Retained dimensions = 2 Mardia fit measure 2 = 0.9970
--------------------------------------------------------------------------
| abs(eigenvalue) (eigenvalue)^2
Dimension | Eigenvalue Percent Cumul. Percent Cumul.
-------------+------------------------------------------------------------
1 | 158437.92 56.95 56.95 67.78 67.78
2 | 108728.77 39.08 96.03 31.92 99.70
-------------+------------------------------------------------------------
3 | 10562.645 3.80 99.83 0.30 100.00
4 | 382.67849 0.14 99.97 0.00 100.00
5 | 69.761715 0.03 99.99 0.00 100.00
6 | 12.520822 0.00 100.00 0.00 100.00
7 | 5.7559984 0.00 100.00 0.00 100.00
8 | 2.2243244 0.00 100.00 0.00 100.00
--------------------------------------------------------------------------
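The two Mardia fit measures in the header can be reproduced from the eigenvalue table: measure 1 is the share of the summed absolute eigenvalues accounted for by the retained dimensions, and measure 2 is the same share using squared eigenvalues. The sketch below assumes the eight eigenvalues printed above are the only nonzero ones, so the results should be close to 0.9603 and 0.9970 but may differ in the last digit.

```stata
/* Mardia fit measure 1: retained over total, absolute eigenvalues */
display (158437.92 + 108728.77) / (158437.92 + 108728.77 + 10562.645 + 382.67849 + 69.761715 + 12.520822 + 5.7559984 + 2.2243244)
/* Mardia fit measure 2: retained over total, squared eigenvalues */
display (158437.92^2 + 108728.77^2) / (158437.92^2 + 108728.77^2 + 10562.645^2 + 382.67849^2 + 69.761715^2 + 12.520822^2 + 5.7559984^2 + 2.2243244^2)
```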
mds calories-K, id(brand) method(modern) loss(strain) transform(ident)
Iteration 1: strain = 594.12657
Iteration 2: strain = 594.12657
Modern multidimensional scaling
dissimilarity: L2, computed on 8 variables
Loss criterion: strain = loss for classical MDS
Transformation: identity (no transformation)
Number of obs = 25
Dimensions = 2
Normalization: principal Loss criterion = 594.1266
mds calories-K, id(brand) method(nonmetric) loss(stress) transform(monotonic)
Iteration 1t: stress = .02607533
Iteration 1c: stress = .02115885
(omitted)
Iteration 77t: stress = .01541258
Iteration 77c: stress = .01541258
Modern multidimensional scaling
dissimilarity: L2, computed on 8 variables
Loss criterion: stress = raw_stress/norm(distances)
Transformation: monotonic (nonmetric)
Number of obs = 25
Dimensions = 2
Normalization: principal Loss criterion = 0.0154
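After any of the mds fits above, the postestimation commands can be used to inspect the solution. The following is a minimal sketch for the nonmetric fit. Which diagnostics are most useful depends on the analysis, so treat the selection here as illustrative.

```stata
use http://www.stata-press.com/data/r10/cerealnut, clear
mds calories-K, id(brand) method(nonmetric) loss(stress) transform(monotonic)
/* plot the two-dimensional configuration of the cereal brands */
mdsconfig
/* Shepard diagram: dissimilarities against fitted distances */
mdsshepard
/* stress per observation between disparities and distances */
estat stress
```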

These pages are Copyrighted (c) by UCLA Academic Technology Services