UCLA Academic Technology Services

Stata FAQ
How to do parallel analysis for pca or factor analysis in Stata?

To do parallel analysis for pca or factor analysis, you will need to download fapara, a program written by ATS. You can find the program by typing the command

findit fapara
and then following the installation instructions.

Parallel analysis is a method for determining the number of components or factors to retain from pca or factor analysis. Essentially, the program works by creating a random dataset with the same numbers of observations and variables as the original data. A correlation matrix is computed from the randomly generated dataset, and then the eigenvalues of that correlation matrix are computed. When the eigenvalues from the random data are larger than the eigenvalues from the pca or factor analysis, you know that the corresponding components or factors are mostly random noise, because they account for no more variance than components extracted from pure noise.
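The random-data part of this procedure can be sketched in a few lines of Python with NumPy. This is an illustration, not fapara's source code: it assumes the random variables are independent standard normal draws, and the function name `parallel_eigenvalues` is hypothetical. It generates `reps` random datasets of the same shape as the real data, takes the eigenvalues of each correlation matrix, and averages them.

```python
import numpy as np


def parallel_eigenvalues(n_obs, n_vars, reps=10, seed=12345):
    """Average eigenvalues of correlation matrices of random data.

    Hypothetical helper: assumes independent standard normal variables,
    which may differ from how fapara generates its random data.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(n_vars)
    for _ in range(reps):
        data = rng.standard_normal((n_obs, n_vars))  # one random dataset
        corr = np.corrcoef(data, rowvar=False)       # n_vars x n_vars correlation matrix
        eigs = np.linalg.eigvalsh(corr)              # returned in ascending order
        total += eigs[::-1]                          # largest first, like Comp1..CompP
    return total / reps


# Same shape as the bg2 example: 568 observations, 6 variables
pa = parallel_eigenvalues(568, 6, reps=10)
print(np.round(pa, 4))
```

Because every correlation matrix has 1s on its diagonal, the averaged eigenvalues always sum to the number of variables (6 here). The first few land slightly above 1 and the last few slightly below 1 purely through sampling noise.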

We will demonstrate the use of the command fapara using a dataset from the Stata manual called bg2. We will begin with a pca and follow that with a factor analysis.

After running the pca command, we will run the fapara command with the pca and reps(10) options. The pca option ensures that the program obtains the eigenvalues from the correlation matrix with 1s in the diagonal, rather than with the communality estimates in the diagonal that you would find in factor analysis.
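To see what the pca option changes, the Python sketch below computes both kinds of eigenvalues for one random correlation matrix. It assumes squared multiple correlations (SMCs) as the communality estimates, which Stata's default principal-factor method uses. Whether fapara uses exactly the same estimates is an assumption. The helper `reduced_corr` is hypothetical.

```python
import numpy as np


def reduced_corr(corr):
    """Return a copy of corr with SMCs in place of the 1s on the diagonal.

    Hypothetical helper; SMC_i = 1 - 1 / (R^-1)_ii (assumed communality estimate).
    """
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    reduced = corr.copy()
    np.fill_diagonal(reduced, smc)  # modifies `reduced` in place
    return reduced


rng = np.random.default_rng(2024)
data = rng.standard_normal((568, 6))
corr = np.corrcoef(data, rowvar=False)

pca_eigs = np.linalg.eigvalsh(corr)[::-1]                # pca option: 1s on the diagonal
fa_eigs = np.linalg.eigvalsh(reduced_corr(corr))[::-1]  # factor analysis: SMCs on the diagonal

print("pca:", np.round(pca_eigs, 4))
print("fa: ", np.round(fa_eigs, 4))
```

The pca eigenvalues sum to the number of variables. The factor analysis eigenvalues sum only to the total of the SMCs and include negative values, which is why negative eigenvalues appear in the factor analysis output for bg2.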

The reps(10) option tells the program to generate random datasets 10 times and to average the eigenvalues obtained from the 10 correlation matrices. You do not need a large number of replications for this procedure to work well, because the eigenvalues of the random datasets do not vary much from one replication to the next. Ten replications should be sufficient.
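The claim that random eigenvalues vary little between replications is easy to check. This Python sketch assumes standard normal random data, and its function name `replication_eigenvalues` is hypothetical. It keeps the eigenvalues from each of 10 replications and reports, for each component, their mean (the value parallel analysis compares against) and their standard deviation across replications.

```python
import numpy as np


def replication_eigenvalues(n_obs, n_vars, reps, seed=7):
    """Return a reps x n_vars array of descending correlation eigenvalues."""
    rng = np.random.default_rng(seed)
    rows = []
    for _ in range(reps):
        data = rng.standard_normal((n_obs, n_vars))
        corr = np.corrcoef(data, rowvar=False)
        rows.append(np.linalg.eigvalsh(corr)[::-1])  # largest first
    return np.array(rows)


eigs = replication_eigenvalues(568, 6, reps=10)
mean = eigs.mean(axis=0)       # averaged eigenvalues used by parallel analysis
sd = eigs.std(axis=0, ddof=1)  # spread across the 10 replications
for k, (m, s) in enumerate(zip(mean, sd), start=1):
    print(f"c{k}  mean={m:.4f}  sd={s:.4f}")
```

The standard error of each averaged eigenvalue is roughly its standard deviation divided by the square root of the number of replications. It therefore shrinks as reps grows, and with samples as large as bg2 it is already small at 10 replications.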

webuse bg2

pca bg2cost1-bg2cost6

Principal components/correlation                  Number of obs    =       568
                                                  Number of comp.  =         6
                                                  Trace            =         6
    Rotation: (unrotated = principal)             Rho              =    1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      1.70622      .303339             0.2844       0.2844
           Comp2 |      1.40288      .494225             0.2338       0.5182
           Comp3 |      .908652      .185673             0.1514       0.6696
           Comp4 |      .722979     .0560588             0.1205       0.7901
           Comp5 |       .66692      .074563             0.1112       0.9013
           Comp6 |      .592357            .             0.0987       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    ----------------------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 | Unexplained 
    -------------+------------------------------------------------------------+-------------
        bg2cost1 |   0.2741    0.5302   -0.2712   -0.7468   -0.0104   -0.1111 |           0 
        bg2cost2 |  -0.3713    0.4428   -0.4974    0.2800    0.2996    0.5005 |           0 
        bg2cost3 |  -0.4077    0.4834    0.0656    0.2466   -0.5649   -0.4646 |           0 
        bg2cost4 |  -0.3766    0.2748    0.7266   -0.2213    0.4504    0.0538 |           0 
        bg2cost5 |   0.4776    0.3345    0.3829    0.1950   -0.3942    0.5657 |           0 
        bg2cost6 |   0.5009    0.3192    0.0144    0.4647    0.4824   -0.4453 |           0 
    ----------------------------------------------------------------------------------------

fapara, pca reps(10)

PA -- Parallel Analysis for Principle Components
PA Eigenvalues Averaged Over 10 Replications

        PCA       PA      Dif
c1   1.7062   1.1366   0.5696
c2   1.4029   1.0637   0.3392
c3   0.9087   1.0343  -0.1257
c4   0.7230   0.9707  -0.2477
c5   0.6669   0.9269  -0.2600
c6   0.5924   0.8677  -0.2754

The parallel analysis for this example indicates that two components should be retained. There are two ways to tell this: (1) only the first two eigenvalues in the PCA column are greater than the corresponding average eigenvalues in the PA column (their Dif values are positive), and (2) in the graph produced by fapara, the dashed line for parallel analysis crosses the solid pca line before reaching the third component.
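The first criterion is simple arithmetic on the table above. Subtract each PA eigenvalue from the corresponding PCA eigenvalue, then count how many leading components have a positive difference. This Python sketch uses the rounded values printed by fapara, so a few differences may disagree with the Dif column in the fourth decimal place. The function name `n_to_retain` is hypothetical.

```python
def n_to_retain(observed, random_avg):
    """Count leading components whose observed eigenvalue exceeds the random one.

    Stops at the first component that does not beat its random counterpart.
    """
    count = 0
    for obs, rnd in zip(observed, random_avg):
        if obs <= rnd:
            break
        count += 1
    return count


# Values copied from the fapara, pca reps(10) output for bg2
pca = [1.7062, 1.4029, 0.9087, 0.7230, 0.6669, 0.5924]
pa = [1.1366, 1.0637, 1.0343, 0.9707, 0.9269, 0.8677]

dif = [round(o - r, 4) for o, r in zip(pca, pa)]
print(dif)
print(n_to_retain(pca, pa))  # prints 2
```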

For the next example, we will run a factor analysis. This time we will run the fapara command without the pca option because this is a factor analysis. We will leave the number of replications at 10.

factor bg2cost1-bg2cost6
(obs=568)

Factor analysis/correlation                        Number of obs    =      568
    Method: principal factors                      Retained factors =        3
    Rotation: (unrotated)                          Number of params =       15

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      0.85389      0.31282            1.0310       1.0310
        Factor2  |      0.54107      0.51786            0.6533       1.6844
        Factor3  |      0.02321      0.17288            0.0280       1.7124
        Factor4  |     -0.14967      0.03951           -0.1807       1.5317
        Factor5  |     -0.18918      0.06197           -0.2284       1.3033
        Factor6  |     -0.25115            .           -0.3033       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  269.07 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -----------------------------------------------------------
        Variable |  Factor1   Factor2   Factor3 |   Uniqueness 
    -------------+------------------------------+--------------
        bg2cost1 |   0.2470    0.3670   -0.0446 |      0.8023  
        bg2cost2 |  -0.3374    0.3321   -0.0772 |      0.7699  
        bg2cost3 |  -0.3764    0.3756    0.0204 |      0.7169  
        bg2cost4 |  -0.3221    0.1942    0.1034 |      0.8479  
        bg2cost5 |   0.4550    0.2479    0.0641 |      0.7274  
        bg2cost6 |   0.4760    0.2364   -0.0068 |      0.7175  
    -----------------------------------------------------------

fapara, reps(10)

PA -- Parallel Analysis for Factor Analysis
PA Eigenvalues Averaged Over 10 Replications

         FA       PA      Dif
c1   0.8539   0.1488   0.7051
c2   0.5411   0.0882   0.4529
c3   0.0232   0.0256  -0.0023
c4  -0.1497  -0.0118  -0.1379
c5  -0.1892  -0.0707  -0.1184
c6  -0.2512  -0.1260  -0.1252

The parallel analysis indicates that there are at least two factors, with the possibility of a third: the eigenvalue for the third factor (0.0232) is very close to the average eigenvalue for the third random factor in the PA column (0.0256). This also shows up in the graph, where the dashed parallel analysis line crosses the solid factor analysis line right at three factors.
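Applying the same comparison to the factor analysis table shows why the third factor is borderline. The first two factors exceed the random eigenvalues by a wide margin, while the third falls short by only about 0.002. In this Python sketch, the tolerance used to flag such near ties (0.01) is an arbitrary, hypothetical choice for illustration and is not part of fapara.

```python
# Values copied from the fapara, reps(10) output for bg2
fa = [0.8539, 0.5411, 0.0232, -0.1497, -0.1892, -0.2512]
pa = [0.1488, 0.0882, 0.0256, -0.0118, -0.0707, -0.1260]

TOLERANCE = 0.01  # hypothetical cutoff for a "near tie"; not from fapara

# Count leading factors whose eigenvalue beats the random average
retain = 0
for obs, rnd in zip(fa, pa):
    if obs <= rnd:
        break
    retain += 1

# Factors not retained whose eigenvalue is within TOLERANCE of the random one
borderline = [
    k + 1 for k, (obs, rnd) in enumerate(zip(fa, pa))
    if k >= retain and abs(obs - rnd) < TOLERANCE
]
print(retain, borderline)  # prints 2 [3]
```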


These pages are Copyrighted (c) by UCLA Academic Technology Services