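The fapara command is not part of official Stata. You can download it by typing the command below in Stata and then following the installation instructions.
findit fapara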
Parallel analysis is a method for determining the number of components or factors to retain from pca or factor analysis. Essentially, the program works by creating a random dataset with the same numbers of observations and variables as the original data. A correlation matrix is computed from the randomly generated dataset, and then the eigenvalues of that correlation matrix are computed. When the eigenvalues from the random data are larger than the eigenvalues from the pca or factor analysis, you know that the corresponding components or factors are mostly random noise.
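The procedure described above can be sketched in a short do-file. This is only an illustration of the idea, not the code that fapara itself runs. It assumes independent standard normal random variables, 568 observations and 6 variables (matching the example dataset used below), and an arbitrary seed. The names avg, ev, and x1-x6 are made up for this sketch.

```stata
* Sketch only: average the pca eigenvalues of several random datasets.
* Warning: this drops any data currently in memory.
clear
* Arbitrary seed so the sketch is reproducible
set seed 12345
* Assumed dimensions: same numbers of observations and variables as the real data
local nobs = 568
local nvar = 6
local reps = 10

* Running sum of the eigenvalues, one column per component
matrix avg = J(1, `nvar', 0)

forvalues r = 1/`reps' {
    quietly {
        drop _all
        set obs `nobs'
        forvalues j = 1/`nvar' {
            * Assumption: independent N(0,1) random variables
            generate double x`j' = rnormal()
        }
        * pca on the correlation matrix of the random data
        pca x1-x`nvar'
        * e(Ev) holds the eigenvalues stored by pca
        matrix ev = e(Ev)
        matrix avg = avg + ev
    }
}

* Average over the replications and display the result
matrix avg = avg / `reps'
matrix list avg
```

The averaged eigenvalues in avg play the role of a PA column: each one is the size of eigenvalue you would expect from pure noise, so an eigenvalue from the real data is only of interest if it is larger.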
We will demonstrate the use of the command fapara using a dataset from the Stata manual called bg2. We will begin with a pca and follow that with a factor analysis.
After running the pca command we will run the fapara command with the pca and reps(10) options. The pca option ensures that the program obtains the eigenvalues from the correlation matrix without communality estimates in the diagonal, as you would find in factor analysis.
The reps(10) option indicates that the program will go through the process of generating random datasets 10 times and will average the eigenvalues obtained from the 10 correlation matrices. You do not have to specify a large number of replications to make this procedure work well. The eigenvalues of the random datasets do not vary tremendously. Ten replications should be sufficient.
webuse bg2
pca bg2cost1-bg2cost6
Principal components/correlation Number of obs = 568
Number of comp. = 6
Trace = 6
Rotation: (unrotated = principal) Rho = 1.0000
--------------------------------------------------------------------------
Component | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Comp1 | 1.70622 .303339 0.2844 0.2844
Comp2 | 1.40288 .494225 0.2338 0.5182
Comp3 | .908652 .185673 0.1514 0.6696
Comp4 | .722979 .0560588 0.1205 0.7901
Comp5 | .66692 .074563 0.1112 0.9013
Comp6 | .592357 . 0.0987 1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
----------------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 Comp6 | Unexplained
-------------+------------------------------------------------------------+-------------
bg2cost1 | 0.2741 0.5302 -0.2712 -0.7468 -0.0104 -0.1111 | 0
bg2cost2 | -0.3713 0.4428 -0.4974 0.2800 0.2996 0.5005 | 0
bg2cost3 | -0.4077 0.4834 0.0656 0.2466 -0.5649 -0.4646 | 0
bg2cost4 | -0.3766 0.2748 0.7266 -0.2213 0.4504 0.0538 | 0
bg2cost5 | 0.4776 0.3345 0.3829 0.1950 -0.3942 0.5657 | 0
bg2cost6 | 0.5009 0.3192 0.0144 0.4647 0.4824 -0.4453 | 0
----------------------------------------------------------------------------------------
fapara, pca reps(10)
PA -- Parallel Analysis for Principle Components
PA Eigenvalues Averaged Over 10 Replications
         PCA        PA       Dif
c1    1.7062    1.1366    0.5696
c2    1.4029    1.0637    0.3392
c3    0.9087    1.0343   -0.1257
c4    0.7230    0.9707   -0.2477
c5    0.6669    0.9269   -0.2600
c6    0.5924    0.8677   -0.2754

The parallel analysis for this example indicates that two components should be retained. There are two ways to tell this: (1) two of the eigenvalues in the PCA column are greater than the average eigenvalues in the PA column, and (2) the dashed line for parallel analysis in the graph crosses the solid pca line before reaching the third component.

For the next example, we will run a factor analysis. This time we will run the fapara command without the pca option because this is a factor analysis. We will leave the number of replications at 10.
factor bg2cost1-bg2cost6
(obs=568)
Factor analysis/correlation Number of obs = 568
Method: principal factors Retained factors = 3
Rotation: (unrotated) Number of params = 15
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 0.85389 0.31282 1.0310 1.0310
Factor2 | 0.54107 0.51786 0.6533 1.6844
Factor3 | 0.02321 0.17288 0.0280 1.7124
Factor4 | -0.14967 0.03951 -0.1807 1.5317
Factor5 | -0.18918 0.06197 -0.2284 1.3033
Factor6 | -0.25115 . -0.3033 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(15) = 269.07 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
-----------------------------------------------------------
Variable | Factor1 Factor2 Factor3 | Uniqueness
-------------+------------------------------+--------------
bg2cost1 | 0.2470 0.3670 -0.0446 | 0.8023
bg2cost2 | -0.3374 0.3321 -0.0772 | 0.7699
bg2cost3 | -0.3764 0.3756 0.0204 | 0.7169
bg2cost4 | -0.3221 0.1942 0.1034 | 0.8479
bg2cost5 | 0.4550 0.2479 0.0641 | 0.7274
bg2cost6 | 0.4760 0.2364 -0.0068 | 0.7175
-----------------------------------------------------------
fapara, reps(10)
PA -- Parallel Analysis for Factor Analysis
PA Eigenvalues Averaged Over 10 Replications
          FA        PA       Dif
c1    0.8539    0.1488    0.7051
c2    0.5411    0.0882    0.4529
c3    0.0232    0.0256   -0.0023
c4   -0.1497   -0.0118   -0.1379
c5   -0.1892   -0.0707   -0.1184
c6   -0.2512   -0.1260   -0.1252

The parallel analysis indicates that there are at least two factors, with a possibility of a third because the eigenvalue for the third factor is very close in value to the average eigenvalue for the third random factor in the PA column. This also shows up in the graph, where the dashed parallel analysis line crosses the solid factor analysis line right at three factors.
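The comparison used to read both fapara tables can also be written out as a small do-file. This is a sketch of the decision rule only. The eigenvalues are copied by hand from the two tables above, and the matrix names obs_pca, pa_pca, obs_fa, and pa_fa are made up for the illustration.

```stata
* Eigenvalues copied from the fapara output for the pca example
matrix obs_pca = (1.7062, 1.4029, 0.9087, 0.7230, 0.6669, 0.5924)
matrix pa_pca  = (1.1366, 1.0637, 1.0343, 0.9707, 0.9269, 0.8677)
* Eigenvalues copied from the fapara output for the factor example
matrix obs_fa  = (0.8539, 0.5411, 0.0232, -0.1497, -0.1892, -0.2512)
matrix pa_fa   = (0.1488, 0.0882, 0.0256, -0.0118, -0.0707, -0.1260)

foreach a in pca fa {
    local keep = 0
    forvalues i = 1/6 {
        * Stop at the first eigenvalue that does not exceed its random counterpart
        if obs_`a'[1,`i'] <= pa_`a'[1,`i'] {
            continue, break
        }
        local ++keep
    }
    display "`a': retain `keep'"
}
```

Applied strictly, the rule should report two for both analyses. The third factor falls short of its random counterpart by only a small margin, which is why it is worth looking at the table and the graph rather than relying on the count alone.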
These pages are Copyrighted (c) by UCLA Academic Technology Services