Introduction to Mplus: Featuring CFA
!This page is under construction!
This page was adapted from Mplus for Windows: An Introduction developed by the Statistical Support group, a division of Research Consulting at ITS at UT Austin. We are very grateful to them for their permission to copy and adapt these materials at our web site.
Section 1: Introduction
1. About this Document
2. Introduction to SEM and Mplus
3. Accessing Mplus
4. Getting Help with Mplus
Section 2: Latent Variable Modeling Using Mplus
1. Overview of SEM Assumptions
2. Categorical Outcomes and Categorical Latent Variables
3. Should you use Mplus?
Section 3: Using Mplus
1. Launching Mplus
2. The Command and Output Windows
3. Reading Data and Outputting Sample Statistics
Section 4: Exploratory Factor Analysis
1. Exploratory Factor Analysis with Continuous Variables
2. Exploratory Factor Analysis with Missing Data
3. Exploratory Factor Analysis with Categorical Outcomes
Section 5: Confirmatory Factor Analysis and Structural Equation Models
1. Confirmatory Factor Analysis with Continuous Variables
2. Handling Missing Data
3. Confirmatory Factor Analysis with Categorical Outcomes
4. Structural Equation Modeling with Continuous Outcomes
Section 6: Advanced Models
1. Multiple Group Analysis
2. Multilevel Models
References
Section 1: Introduction
1. About this Document
This document introduces you to Mplus for Windows. It is aimed primarily
at first-time users of Mplus who have prior experience with either exploratory
factor analysis (EFA) or confirmatory factor analysis (CFA)
and structural equation modeling (SEM). The document is organized
into six sections. The first section provides a brief introduction to Mplus
and describes how to obtain access to Mplus. The second section briefly
reviews SEM assumptions and describes important and useful model fitting
features that are unique to Mplus. The third section describes how to get
started with Mplus, how to read data from an external data file, and how
to obtain descriptive sample statistics. The fourth section explains how
to fit exploratory factor analysis models for continuous and categorical
outcomes using Mplus. The fifth section of this document demonstrates how
you can use Mplus to test confirmatory factor analysis and structural equation
models. The sixth section presents examples of two advanced models available
in Mplus: multiple group analysis and multilevel SEM. By the end of the
course you should be able to fit EFA and CFA/SEM models using Mplus. You
will also gain an appreciation for the types of research questions well-suited
to Mplus and some of its unique features.
2. Introduction to EFA, CFA, SEM and Mplus
Exploratory factor analysis (EFA) is a method of data reduction
in which you may infer the presence of latent factors that are responsible
for shared variation in multiple measured or observed variables. In EFA
each observed variable in the analysis may be related to each latent factor
contained in the analysis. By contrast, confirmatory factor analysis
(CFA) allows you to stipulate which latent factor is related to any given
observed variable. Structural equation modeling (SEM) is a more
general form of CFA in which latent factors may be regressed onto each
other. Mplus can fit EFA, CFA, and SEM models, among other models.
To effectively use and understand the course material, you should already
know how to conduct a multiple linear regression analysis and compute descriptive
statistics such as frequency tables using SAS, Stata, SPSS, or a similar general
statistical software package. You should also understand how to interpret
the output from a multiple linear regression analysis. This document also
assumes that you are familiar with the statistical assumptions of EFA,
CFA, and SEM, and that you are comfortable using syntax-based software programs. If you do not have prior experience with exploratory factor
analysis, we recommend consulting our Stat Books for
Loan under the section on Factor Analysis and Structural Equation Modeling
for more information about factor analysis and SEM. Finally, you should understand
basic Microsoft Windows navigation operations: opening files and folders,
saving your work, recalling previously saved work, etc.
3. Accessing Mplus
You may access Mplus in one of three ways:
Important note: Our Statistical Consulting services are available only to
researchers in the UCLA community. Non-UCLA researchers will find the Muthén
& Muthén Web site to be a useful resource; also see the Mplus
Discussion forum for frequently asked questions and answers. You may
also post your own questions in this forum.
Section 2: Latent Variable Modeling Using Mplus
1. Overview of SEM Assumptions for Continuous Outcome Data
Before specifying and running a latent variable model, you should give
some thought to the assumptions underlying latent variable modeling with
continuous outcome variables. Several of these assumptions are shown below:
TITLE:
Grant-White School: Summary Statistics
DATA:
FILE IS "c:\intromplus\grant.dat" ;
FORMAT IS free;
VARIABLE:
NAMES ARE visperc cubes lozenges paragrap sentence
wordmean gender ;
USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean ;
ANALYSIS:
TYPE = basic ;
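In this sample program, the DATA command uses the FILE subcommand
to tell Mplus where to locate the relevant data file; in this case, the
file's location is c:\intromplus\grant.dat. The FORMAT
subcommand specifies the default free format. This tells Mplus that the
values for each case appear in the order listed on the NAMES statement, separated by
commas, tabs, or spaces. In the VARIABLE command, the NAMES subcommand lists every variable in the data file in order. The USEVARIABLES subcommand then selects the subset of those variables to include in the analysis (here, every variable except gender).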
A similar subcommand, USEOBS, allows you to select a subset of cases to use in a particular analysis. The example below limits the analysis to female participants by selecting only the cases where gender equals 1. It also shows how you can use dash notation in the USEVARIABLES statement to specify a block of variables. Here, visperc-wordmean refers to all of the variables from visperc through wordmean, in the order in which they appear on the NAMES statement.
TITLE:
Grant-White School: Summary Statistics
DATA:
FILE IS "c:\intromplus\grant.dat" ;
FORMAT IS free;
VARIABLE:
NAMES ARE visperc cubes lozenges paragrap sentence
wordmean gender ;
USEVARIABLES ARE visperc-wordmean ;
USEOBS gender EQ 1 ;
ANALYSIS:
TYPE = basic ;
The ANALYSIS command specifies the TYPE of analysis to be
performed by Mplus. In this example the type is basic. The basic
analysis type does not fit any model to the sample data; instead, Mplus
computes sample statistics only. Using basic as the analysis type is useful
during the initial phase of building your command file because you can use the
Mplus sample statistics output to compare Mplus results to results you obtained
using SAS, SPSS, Excel, or other statistical software programs to verify that
Mplus is reading your input data correctly.
Running the first program above (without the USEOBS restriction) with the
data file grant.dat yields the output from this basic analysis, shown
below. Although Mplus initially returns a copy of the input command file, that
portion of the output has been omitted here to save space.
Grant-White School: Summary Statistics
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 145
Number of y-variables 6
Number of x-variables 0
Number of continuous latent variables 0
Observed variables in the analysis
VISPERC CUBES LOZENGES PARAGRAP SENTENCE WORDMEAN
Estimator ML
Information matrix EXPECTED
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Input data file(s)
c:\IntroMplus\grant.dat
Input data format FREE
RESULTS FOR BASIC ANALYSIS
SAMPLE STATISTICS
Means
VISPERC CUBES LOZENGES PARAGRAP SENTENCE
________ ________ ________ ________ ________
1 29.579 24.800 15.966 9.952 18.848
Means
WORDMEAN
________
1 17.283
Covariances
VISPERC CUBES LOZENGES PARAGRAP SENTENCE
________ ________ ________ ________ ________
VISPERC 47.801
CUBES 10.012 19.758
LOZENGES 25.798 15.417 69.172
PARAGRAP 7.973 3.421 9.207 11.393
SENTENCE 9.936 3.296 11.092 11.277 21.616
WORDMEAN 17.425 6.876 22.954 19.167 25.321
Covariances
WORDMEAN
________
WORDMEAN 63.163
Correlations
VISPERC CUBES LOZENGES PARAGRAP SENTENCE
________ ________ ________ ________ ________
VISPERC 1.000
CUBES 0.326 1.000
LOZENGES 0.449 0.417 1.000
PARAGRAP 0.342 0.228 0.328 1.000
SENTENCE 0.309 0.159 0.287 0.719 1.000
WORDMEAN 0.317 0.195 0.347 0.714 0.685
Correlations
WORDMEAN
________
WORDMEAN 1.000
Mplus initially identifies the number of groups and observations in the
analysis, followed by the number of x (predictor) and y (outcome) variables. It
then reports the sample means, covariances (with variances on the diagonal), and
correlations. Once you have verified that these values are correct, you can turn
your attention to fitting your model(s) of interest. The next section continues
with the same example data file but describes how to perform an exploratory
factor analysis of the continuous variables in the Grant-White data file using Mplus.
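One way to carry out the verification step described above outside Mplus is to recompute the correlations from the printed covariances. Each correlation is r_ij = s_ij / sqrt(s_ii * s_jj). The Python sketch below is only an illustration. It assumes you have copied the lower triangle of the covariance matrix from the output, and its results should match the Correlations table to within rounding in the third decimal.

```python
import math

# Lower triangle of the sample covariance matrix from the BASIC output,
# in NAMES order; the last entry of each row is that variable's variance.
NAMES = ["visperc", "cubes", "lozenges", "paragrap", "sentence", "wordmean"]
COV_LOWER = [
    [47.801],
    [10.012, 19.758],
    [25.798, 15.417, 69.172],
    [7.973, 3.421, 9.207, 11.393],
    [9.936, 3.296, 11.092, 11.277, 21.616],
    [17.425, 6.876, 22.954, 19.167, 25.321, 63.163],
]


def cov_to_corr(lower):
    """Convert a lower-triangular covariance matrix to correlations,
    using r_ij = s_ij / sqrt(s_ii * s_jj)."""
    sd = [math.sqrt(lower[i][i]) for i in range(len(lower))]
    return [[lower[i][j] / (sd[i] * sd[j]) for j in range(i + 1)]
            for i in range(len(lower))]


corr = cov_to_corr(COV_LOWER)
for name, row in zip(NAMES, corr):
    print(f"{name:<9}" + "".join(f"{r:8.3f}" for r in row))
```

If a recomputed value disagrees with the Mplus table by more than rounding, the data file is probably being read with the wrong format or variable order.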
Section 4: Exploratory Factor Analysis
1. Exploratory Factor Analysis with Continuous Variables
Once you have read the data into Mplus and verified that the sample statistics
show that the data have been read correctly, you can perform exploratory factor
analysis using Mplus by altering the ANALYSIS command as follows:
TITLE:
Grant-White School: Exploratory Factor Analysis
DATA:
FILE IS "c:\intromplus\grant.dat" ;
FORMAT IS free ;
VARIABLE:
NAMES ARE visperc cubes lozenges paragrap sentence
wordmean gender ;
USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean ;
ANALYSIS:
TYPE = efa 1 2 ;
ESTIMATOR = ml ;
OUTPUT:
sampstat ;
This syntax instructs Mplus to perform an exploratory factor analysis of
the Grant-White data file. The efa keyword requests an exploratory
factor analysis. The 1 and 2 that follow it tell Mplus to produce every
factor solution from one through two factors, so in this instance one- and
two-factor solutions will be produced. Finally, the ESTIMATOR = ml option tells
Mplus to use the maximum likelihood estimator to perform the factor analysis.
It also has Mplus compute a chi-square goodness-of-fit test of whether the
hypothesized number of factors is sufficient to account for the correlations
among the six variables in the analysis. This optional specification overrides
the default unweighted least-squares (uls) estimator.
Mplus produces the sample correlations, the eigenvalues, and the chi-square
test of fit of the one-factor model to the sample data. As the results
below show, the chi-square test is statistically significant,
so the null hypothesis that a single factor fits the data is rejected;
more factors are required to obtain a non-significant chi-square. The
chi-square test is sensitive to sample size (large samples often return
statistically significant chi-square values) and to non-normality in the input
variables. For that reason, Mplus also provides the Root Mean Square Error of
Approximation (RMSEA) statistic, which is less sensitive to large sample sizes.
According to Hu and Bentler (1999), RMSEA values below .06 indicate satisfactory
model fit. The RMSEA estimate of .162 is consistent with the chi-square result
in suggesting that the one-factor model does not fit the data adequately.
CONTINUOUS VARIABLE CORRELATION MATRIX
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
VISPERC
CUBES           .326
LOZENGES        .449      .417
PARAGRAP        .342      .228      .328
SENTENCE        .309      .159      .287      .719
WORDMEAN        .317      .195      .347      .714      .685
EXPLORATORY ANALYSIS WITH 1 FACTOR(S) :
EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                   1         2         3         4         5
            ________  ________  ________  ________  ________
    1          3.009     1.225      .656      .530      .311
EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                   6
            ________
    1           .270
EXPLORATORY ANALYSIS WITH 1 FACTOR(S) :
CHI-SQUARE VALUE                  43.241
DEGREES OF FREEDOM                     9
PROBABILITY VALUE                  .0000
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .162 ( .115  .212)
PROBABILITY RMSEA LE .05 IS .000
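You can reproduce the RMSEA point estimate from the chi-square value, its degrees of freedom, and the sample size. The sketch below assumes the common formula sqrt(max(chi2 - df, 0) / (df * N)) with N = 145 in the denominator. That choice reproduces the .162 printed above; using N - 1, as some programs do, would give about .163. The sketch does not compute the 90% confidence interval, which requires the noncentral chi-square distribution. The second call uses the two-factor chi-square of 1.079 on 4 degrees of freedom reported later in this section.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate, assuming sqrt(max(chi2 - df, 0) / (df * n)).

    n is the number of observations; some programs use n - 1 instead.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * n))


# Chi-square values and degrees of freedom from the EFA output (N = 145).
print(f"one factor:  {rmsea(43.241, 9, 145):.3f}")
print(f"two factors: {rmsea(1.079, 4, 145):.3f}")
```

Whenever the chi-square is smaller than its degrees of freedom, the estimate is truncated at zero. This is why the two-factor solution reports an RMSEA of .000.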
Mplus next produces the estimated factor loadings and error
variances. Notice that the visperc, cubes, and lozenges
factor loadings are low relative to the other factor loadings
displayed below. See Factor
Analysis Using SAS PROC FACTOR (courtesy of
The University of Texas at Austin
Statistical Services) for more information on interpreting factor loadings.
ESTIMATED FACTOR LOADINGS
                   1
            ________
VISPERC         .415
CUBES           .272
LOZENGES        .415
PARAGRAP        .865
SENTENCE        .818
WORDMEAN        .827
ESTIMATED ERROR VARIANCES
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
    1           .828      .926      .828      .252      .330
ESTIMATED ERROR VARIANCES
            WORDMEAN
            ________
    1           .316
The estimated correlation matrix is the correlation matrix reproduced
by Mplus under the assumption that a single factor is sufficient to
explain the sample correlations. From the model fit results shown above,
this is not the case, so it is not surprising that this implied or
model-based correlation matrix differs substantially from the sample
correlation matrix reported above.
ESTIMATED CORRELATION MATRIX
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
VISPERC        1.000
CUBES           .113     1.000
LOZENGES        .172      .113     1.000
PARAGRAP        .359      .235      .359     1.000
SENTENCE        .339      .223      .340      .708     1.000
WORDMEAN        .343      .225      .343      .715      .677
ESTIMATED CORRELATION MATRIX
            WORDMEAN
            ________
WORDMEAN       1.000
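Because the EFA is fit to the correlation matrix, each loading is in a standardized metric. Under a one-factor model, the implied correlation between two variables is therefore simply the product of their loadings, and each error variance is 1 minus the squared loading. The sketch below checks a few entries of the ESTIMATED CORRELATION MATRIX and ESTIMATED ERROR VARIANCES against the printed loadings. Small differences in the third decimal (for example, .676 versus the printed .677 for sentence-wordmean) arise because the printed loadings are themselves rounded.

```python
# Standardized one-factor loadings copied from ESTIMATED FACTOR LOADINGS.
LOADINGS = {
    "visperc": 0.415, "cubes": 0.272, "lozenges": 0.415,
    "paragrap": 0.865, "sentence": 0.818, "wordmean": 0.827,
}


def implied_correlation(loadings, a, b):
    """One-factor model-implied correlation: r_ab = lambda_a * lambda_b."""
    return loadings[a] * loadings[b]


def error_variance(loadings, a):
    """Standardized error (unique) variance: 1 - lambda_a ** 2."""
    return 1.0 - loadings[a] ** 2


for a, b in [("visperc", "cubes"), ("paragrap", "sentence"),
             ("sentence", "wordmean")]:
    print(a, b, f"{implied_correlation(LOADINGS, a, b):.3f}")
for a in LOADINGS:
    print(a, f"{error_variance(LOADINGS, a):.3f}")
```

The low loadings of visperc, cubes, and lozenges force their implied correlations toward zero. Their observed correlations (.326, .449, .417) are much larger, which is the misfit the residual matrix exposes.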
The residual matrix represents the difference between the
sample correlation matrix and the implied correlation matrix. As noted
above, because the model did not fit the observed data particularly well,
some values in this matrix are non-trivial in size. In
particular, the cubes-visperc, lozenges-visperc, and
lozenges-cubes residuals are large relative to the other
values in the matrix.
RESIDUALS OBSERVED-EXPECTED
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
VISPERC         .000
CUBES           .213      .000
LOZENGES        .276      .304      .000
PARAGRAP       -.017     -.007     -.031      .000
SENTENCE       -.030     -.063     -.053      .011      .000
WORDMEAN       -.026     -.030      .004      .000      .009
RESIDUALS OBSERVED-EXPECTED
            WORDMEAN
            ________
WORDMEAN        .000
The Root Mean Square Residual (RMR) is another descriptive model fit
statistic. According to Hu and Bentler (1999), RMR values should be below
.08, with lower values indicating better model fit. The value of .1225
shown below for the one-factor solution indicates unacceptably poor model
fit.
ROOT MEAN SQUARE RESIDUAL IS .1225
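You can approximate the printed RMR from the residual matrix. The sketch below assumes the RMR is the square root of the mean squared residual over the 15 off-diagonal elements. Including the six zero diagonal entries would give roughly .103 instead, so the off-diagonal version is the one that reproduces .1225 to within the rounding of the printed residuals. Treat this as a check on the output, not as Mplus's documented formula.

```python
import math

# Off-diagonal residuals (observed minus expected) for the one-factor
# solution, read row by row from the lower triangle of the output.
RESIDUALS_ONE_FACTOR = [
    0.213,
    0.276, 0.304,
    -0.017, -0.007, -0.031,
    -0.030, -0.063, -0.053, 0.011,
    -0.026, -0.030, 0.004, 0.000, 0.009,
]


def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


print(f"RMR (off-diagonal only): {root_mean_square(RESIDUALS_ONE_FACTOR):.4f}")
```

The three visual-test residuals (.213, .276, .304) contribute almost all of the sum of squares, so they account for nearly the entire RMR.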
In short, the one-factor solution was a poor fit to the data. In particular, the
model did not account well for the correlations among the visperc,
cubes, and lozenges variables. What about the two-factor
solution? Mplus reports the two-factor solution following the single-factor
model.
The chi-square test of model fit is non-significant,
indicating that the null hypothesis that the model fits the data cannot be
rejected (the model fits the data well). The RMSEA corroborates this finding:
its estimate is zero, and its 90% confidence interval has an upper
bound of .055, which is below the cutoff value of .06 recommended by Hu and
Bentler (1999). The RMSEA estimate and the upper bound of its confidence
interval should both fall below .06 to ensure satisfactory model
fit.
EXPLORATORY ANALYSIS WITH 2 FACTOR(S) :
CHI-SQUARE VALUE                   1.079
DEGREES OF FREEDOM                     4
PROBABILITY VALUE                  .8976
RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
ESTIMATE (90 PERCENT C.I.) IS .000 ( .000  .055)
PROBABILITY RMSEA LE .05 IS .944
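The degrees of freedom for the one- and two-factor chi-square tests (9 and 4) follow from a parameter count. Six variables supply 6 * 7 / 2 = 21 variances and covariances. An m-factor EFA estimates one loading per variable per factor plus one error variance per variable, minus m(m - 1)/2 constraints that pin down the rotation. The sketch below is a textbook count offered as an illustration, not Mplus's internal code.

```python
def efa_df(p, m):
    """Degrees of freedom for an m-factor ML exploratory factor analysis
    of p observed variables."""
    moments = p * (p + 1) // 2          # unique variances and covariances
    loadings = p * m                    # every variable loads on every factor
    uniquenesses = p                    # one error variance per variable
    rotation = m * (m - 1) // 2         # constraints that fix the rotation
    return moments - (loadings + uniquenesses - rotation)


for m in (1, 2, 3):
    print(m, "factor(s):", efa_df(6, m), "df")
```

The same count shows that a three-factor model for these six variables would have zero degrees of freedom, leaving no chi-square test of fit.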
For exploratory factor analysis solutions with two or more factors, Mplus
reports varimax rotated loadings and promax rotated loadings. Varimax loadings
assume the factors are uncorrelated, whereas promax loadings allow the factors
to be correlated. Directly below the promax loadings is the factor
intercorrelation matrix. In this example the two factors are correlated .480.
With even a modest correlation between the two factors, you should interpret
the promax rotated loadings. The loadings show that the visperc, cubes, and
lozenges variables load onto the first factor, whereas the remaining variables
load onto the second factor.
VARIMAX ROTATED LOADINGS
                   1         2
            ________  ________
VISPERC         .547      .250
CUBES           .550      .092
LOZENGES        .728      .196
PARAGRAP        .241      .830
SENTENCE        .174      .816
WORDMEAN        .247      .788
PROMAX ROTATED LOADINGS
                   1         2
            ________  ________
VISPERC         .540      .112
CUBES           .585     -.063
LOZENGES        .755     -.001
PARAGRAP        .046      .841
SENTENCE       -.025      .846
WORDMEAN        .063      .794
PROMAX FACTOR CORRELATIONS
                   1         2
            ________  ________
    1          1.000
    2           .480     1.000
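Varimax and promax are two rotations of the same two-factor solution, so they imply the same correlation matrix. The sketch below shows this numerically. With uncorrelated factors (varimax), the implied correlation is the sum of products of loadings. With correlated factors (promax), a cross term weighted by the factor correlation of .480 is added. To stay short, it uses only four of the six variables, and agreement holds to within the rounding of the printed loadings.

```python
# Two-factor loadings for four of the six variables, copied from the
# VARIMAX and PROMAX tables; PHI is the promax factor correlation.
VARIMAX = {"visperc": (0.547, 0.250), "cubes": (0.550, 0.092),
           "paragrap": (0.241, 0.830), "sentence": (0.174, 0.816)}
PROMAX = {"visperc": (0.540, 0.112), "cubes": (0.585, -0.063),
          "paragrap": (0.046, 0.841), "sentence": (-0.025, 0.846)}
PHI = 0.480


def implied_r(la, lb, phi):
    """Implied correlation for two standardized factors correlated phi:
    r_ab = a1*b1 + a2*b2 + phi*(a1*b2 + a2*b1)."""
    (a1, a2), (b1, b2) = la, lb
    return a1 * b1 + a2 * b2 + phi * (a1 * b2 + a2 * b1)


names = list(VARIMAX)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        orth = implied_r(VARIMAX[a], VARIMAX[b], 0.0)
        obl = implied_r(PROMAX[a], PROMAX[b], PHI)
        print(f"{a}-{b}: varimax {orth:.3f}  promax {obl:.3f}")
# An error variance is 1 minus a variable's implied "correlation" with itself.
err = 1 - implied_r(VARIMAX["visperc"], VARIMAX["visperc"], 0.0)
print(f"visperc error variance: {err:.3f}")
```

Because fit is unaffected by rotation, the choice between varimax and promax is purely a matter of interpretation. That choice should follow from whether you believe the factors are correlated.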
Mplus next reports estimated error variances for each observed variable, the
estimated correlation matrix, and the residual correlation matrix. Notice
that, unlike the preceding one-factor solution, the two-factor solution's
estimated correlation matrix is very close in value to the original sample
correlation matrix. Accordingly, all values in the residual correlation matrix
are close to zero, and the RMR value of .0092 is well below the cutoff of .08
recommended by Hu and Bentler (1999).
ESTIMATED ERROR VARIANCES
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
    1           .638      .689      .431      .253      .304
ESTIMATED ERROR VARIANCES
            WORDMEAN
            ________
    1           .318
ESTIMATED CORRELATION MATRIX
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
VISPERC        1.000
CUBES           .324     1.000
LOZENGES        .448      .419     1.000
PARAGRAP        .339      .209      .338     1.000
SENTENCE        .299      .170      .286      .719     1.000
WORDMEAN        .332      .208      .334      .714      .686
ESTIMATED CORRELATION MATRIX
            WORDMEAN
            ________
WORDMEAN       1.000
RESIDUALS OBSERVED-EXPECTED
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
VISPERC         .000
CUBES           .002      .000
LOZENGES        .001     -.002      .000
PARAGRAP        .002      .019     -.010      .000
SENTENCE        .010     -.011      .000      .000      .000
WORDMEAN       -.015     -.013      .013      .001     -.001
RESIDUALS OBSERVED-EXPECTED
            WORDMEAN
            ________
WORDMEAN        .000
ROOT MEAN SQUARE RESIDUAL IS .0092
This example assumes that the Grant-White data file is complete; in other
words, no cases in the Grant-White data file have missing values. What if some
cases had missing values? Data files often contain cases with incomplete data.
The next section describes a feature unique to Mplus: exploratory factor
analysis of a data file with incomplete cases.
2. Exploratory Factor Analysis with Missing Data
Suppose you altered the Grant-White data file so
that cases with visperc scores that exceed 34 have missing
cubes scores and cases with wordmean scores of 10 or
below have missing sentence values. In this instance the missing
cubes and sentence completion data are said to be missing at random
(MAR) because the patterns of missing data are explainable by the values
of other variables in the data file: visual perception and word meaning.
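The source does not show how grant-missing.dat was created. To illustrate the MAR rule just described (this is not necessarily the authors' procedure), the sketch below applies it to rows containing the seven columns listed on the NAMES statement. It codes the deleted values as -9 and counts how many cases listwise deletion would keep. The three example rows are hypothetical, not real Grant-White data.

```python
MISSING = -9  # the missing-value code used in grant-missing.dat

# Column positions follow the NAMES statement:
# visperc cubes lozenges paragrap sentence wordmean gender
VISPERC, CUBES, SENTENCE, WORDMEAN = 0, 1, 4, 5
ANALYSIS_COLUMNS = range(6)  # visperc through wordmean


def impose_mar(rows):
    """Return a copy of rows with cubes set missing when visperc > 34 and
    sentence set missing when wordmean <= 10."""
    result = []
    for row in rows:
        new = list(row)
        if new[VISPERC] > 34:
            new[CUBES] = MISSING
        if new[WORDMEAN] <= 10:
            new[SENTENCE] = MISSING
        result.append(new)
    return result


def listwise_complete(rows):
    """Rows that listwise deletion keeps: no missing code in any analysis column."""
    return [r for r in rows if all(r[c] != MISSING for c in ANALYSIS_COLUMNS)]


# Three hypothetical cases, for illustration only (not real Grant-White data).
rows = [
    [40, 25, 18, 10, 19, 17, 1],
    [30, 24, 15, 9, 18, 8, 2],
    [28, 22, 14, 11, 20, 15, 1],
]
marked = impose_mar(rows)
print(marked)
print(len(listwise_complete(marked)), "of", len(marked), "cases kept")
```

The missingness depends only on observed visperc and wordmean values, which is what makes it MAR rather than missing not at random. Listwise deletion still discards the entire case even though only one value is absent.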
Ordinarily, if you do not specify a missing data analysis in Mplus, Mplus
performs listwise or casewise deletion of cases with any
missing data. That is, any case with one or more missing data points is
omitted entirely from analyses. However, for exploratory factor analysis,
confirmatory factor analysis, and structural equation modeling with
continuous variables, Mplus features a missing data option that
outperforms the default listwise deletion method. The optional method that
offers superior performance is called full information maximum likelihood
(FIML); details on FIML can be found in the UT Austin Statistical Services General FAQ #25: Handling missing or incomplete Data.
Regardless of whether you choose to use FIML or listwise deletion
to handle missing data, if you have missing data in your input data file,
you must tell Mplus how the missing values for each variable are
represented in the data file. You use the MISSING subcommand of the VARIABLE command to accomplish this task. In this example, missing
values for cubes and sentence are represented by -9, so the MISSING subcommand reads:
MISSING ARE all (-9) ;
The all keyword tells Mplus that all variables in the analysis use -9 to represent missing values. If your data file contains blanks to represent missing values, you may use the specification
MISSING = blank ;
Similarly, you may use
MISSING ARE . ;
if your data file contains period symbols to represent missing values. Other missing value specifications are available; see the Mplus User's Guide for specifics.
The complete program, which still uses the default listwise deletion, is:
TITLE:
Grant-White School: EFA with Missing Data
DATA:
FILE IS "c:\intromplus\grant-missing.dat" ;
VARIABLE:
NAMES ARE visperc cubes lozenges paragrap sentence
wordmean gender ;
USEVARIABLES ARE visperc - wordmean ;
MISSING ARE all (-9) ;
ANALYSIS:
TYPE = efa 1 2 ;
ESTIMATOR = ml ;
Selected output from the analysis appears below.
Grant-White School: Exploratory Factor Analysis with Missing Data
SUMMARY OF ANALYSIS
Number of groups                                  1
Number of observations                           79
Number of y-variables                             6
Number of x-variables                             0
Number of continuous latent variables             0
Notice that Mplus considers the data file to contain 79 usable cases rather than the original 145 cases. This is because listwise deletion drops every case with a missing value on any analysis variable.
To request FIML estimation instead of listwise deletion, add the missing keyword to the TYPE option of the ANALYSIS command:
TITLE:
Grant-White School: EFA with Missing Data
DATA:
FILE IS "c:\intromplus\grant-missing.dat" ;
VARIABLE:
NAMES ARE visperc cubes lozenges paragrap sentence
wordmean gender ;
USEVARIABLES ARE visperc - wordmean ;
MISSING ARE all (-9) ;
ANALYSIS:
TYPE = missing efa 1 2 ;
ESTIMATOR = ml ;
Run the analysis and consider the results.
3. Exploratory Factor Analysis with Categorical Outcomes
First, you must change the names of the variables in the NAMES and USEVARIABLES subcommands of the VARIABLE command. Next, you tell Mplus which variables are categorical with the CATEGORICAL subcommand of the VARIABLE command, like this:
CATEGORICAL ARE viscat - wordcat ;
The complete program reads:
TITLE:
Grant-White School: EFA with categorical outcomes
DATA:
FILE IS "a:\grantcat.dat" ;
VARIABLE:
NAMES ARE viscat cubescat lozcat paracat sentcat wordcat ;
USEVARIABLES ARE viscat - wordcat ;
CATEGORICAL ARE viscat - wordcat ;
ANALYSIS:
TYPE = efa 1 2 ;
ESTIMATOR = wlsmv ;
OUTPUT:
sampstat ;
You should also change the ESTIMATOR option of the ANALYSIS command. The default is unweighted least-squares (uls), which is fast and useful for exploratory work. Based on the work of Muthén, du Toit, and Spisic (1997), however, a better choice for categorical outcomes is weighted least-squares with mean and variance adjustment, wlsmv:
ANALYSIS:
TYPE = efa 1 2;
ESTIMATOR = wlsmv ;
Selected output from the analysis appears below. Notice that, in the version
of Mplus used for this example, the categorical nature of the data precludes
computation of descriptive model fit statistics such as the RMSEA,
though Mplus does produce the familiar chi-square test of overall model
fit.
EXPLORATORY ANALYSIS WITH 2 FACTOR(S) :
CHI-SQUARE VALUE                   2.823
DEGREES OF FREEDOM                     4
PROBABILITY VALUE                  .5875
The chi-square result for the two factor model is not
significant, which indicates that two factors are sufficient to explain
the intercorrelations among the six observed variables. The varimax and
promax rotated factor loadings appear below. The pattern and values
obtained from this analysis are consistent with the results of the first
exploratory factor analysis of the completely continuous data discussed
previously.
VARIMAX ROTATED LOADINGS
                   1         2
            ________  ________
VISCAT          .571      .332
CUBESCAT        .700      .117
LOZCAT          .667      .244
PARACAT         .473      .642
SENTCAT         .235      .847
WORDCAT         .206      .858
PROMAX ROTATED LOADINGS
                   1         2
            ________  ________
VISCAT          .559      .159
CUBESCAT        .777     -.137
LOZCAT          .698      .022
PARACAT         .347      .550
SENTCAT         .005      .876
WORDCAT        -.031      .899
PROMAX FACTOR CORRELATIONS
                   1         2
            ________  ________
    1          1.000
    2           .557     1.000
Although Mplus does not produce the RMSEA descriptive model fit
statistic for categorical outcomes, it does output the root mean square
residual (RMR):
ROOT MEAN SQUARE RESIDUAL IS .0310
The value of .031 suggests an excellent fit of the two-factor model to the observed
data. (Please note that as of version 4.2, Mplus does give the RMSEA.)
There are several notes worth keeping in mind when you perform exploratory factor analysis with categorical outcome variables.
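Section 5: Confirmatory Factor Analysis and Structural Equation Models
1. Confirmatory Factor Analysis with Continuous Variables
TITLE:
Grant-White School: Confirmatory Factor Analysis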
DATA:
FILE IS "c:\intromplus\grant.dat" ;
FORMAT IS free ;
VARIABLE:
NAMES ARE visperc cubes lozenges paragrap sentence
wordmean gender ;
USEVARIABLES ARE visperc cubes lozenges paragrap sentence wordmean ;
ANALYSIS:
TYPE = general ;
MODEL:
visual BY visperc@1 cubes lozenges ;
verbal BY paragrap@1 sentence wordmean ;
visual WITH verbal ;
OUTPUT:
standardized sampstat ;
The general analysis type tells Mplus that you are fitting
a general structural equation model rather than a specific model such as
an exploratory factor analysis. The model is general in the sense that
you must define which relationships are estimated. Parameters you do not mention
are either set to Mplus defaults or fixed at zero. For example, the residual
variances and factor variances in this model are estimated even though the MODEL
command does not list them. In the exploratory factor analysis context, Mplus
already knows the specifics of that model, so it handles the model specification
automatically. By contrast, in the confirmatory factor analysis
and structural equation modeling context each hypothesized model is unique,
so you must tell Mplus how the model is constructed. The MODEL command
allows you to specify the parameters of your model.
The first line of the MODEL command shown above defines a latent
factor called visual. The BY keyword (an abbreviation for
"measured by") is used to define latent variables: the latent variable
name appears on the left-hand side of the BY keyword, and the
measured variables appear on the right-hand side.
The visual factor has three observed indicator variables: visperc, cubes,
and lozenges. Similarly, the second line of the MODEL command defines a latent factor called verbal with three indicators: paragrap, sentence,
and wordmean. The third line of the MODEL command uses the WITH keyword to correlate the visual latent factor with the verbal
latent factor.
The visperc and paragrap variables are each followed by @1.
The @ sign tells Mplus to fix the factor loading (regression weight)
of the visual-visperc relationship to the value that follows the @,
1.00. Similarly, the verbal-paragrap loading is also fixed
to 1.00. You fix these two parameters to provide a scale
for the visual and verbal latent variables' variances. If you ever need
to supply a starting value for a particular parameter in Mplus, you can
specify the value after an asterisk, like this: sentence*.5. When you do
not specify starting values, omit the asterisk; this is the default. Note
that each variable is separated from the other variables in the analysis
by at least one space.
Finally, the OUTPUT command contains an added keyword, standardized.
This option instructs Mplus to output standardized parameter estimate values
in addition to the default unstandardized values. Selected output from
the analysis appears below.
Grant-White School: Confirmatory Factor Analysis
SUMMARY OF ANALYSIS
Number of groups                                  1
Number of observations                          145
Number of y-variables                             6
Number of x-variables                             0
Number of continuous latent variables             2
Observed variables in the analysis
   VISPERC     CUBES       LOZENGES    PARAGRAP    SENTENCE    WORDMEAN
Continuous latent variables in the analysis
   VISUAL      VERBAL
The summary of analysis information tells you that there are six continuous
observed variables in the analysis and two latent factors, visual
and verbal. Mplus then displays the input covariance matrix generated
from the six observed variables:
SAMPLE STATISTICS
     Covariances/Correlations/Residual Correlations
             VISPERC     CUBES  LOZENGES  PARAGRAP  SENTENCE
            ________  ________  ________  ________  ________
VISPERC       47.801
CUBES         10.012    19.758
LOZENGES      25.798    15.417    69.172
PARAGRAP       7.973     3.421     9.207    11.393
SENTENCE       9.936     3.296    11.092    11.277    21.616
WORDMEAN      17.425     6.876    22.954    19.167    25.321
     Covariances/Correlations/Residual Correlations
            WORDMEAN
            ________
WORDMEAN      63.163
Mplus next reports the results of fitting the hypothesized model to
the sample data.
THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
     Value                                  3.663
     Degrees of Freedom                         8
     P-Value                                .8861
Loglikelihood
     H0 Value                           -2575.128
     H1 Value                           -2573.297
Information Criteria
     Number of Free Parameters                 13
     Akaike (AIC)                        5176.256
     Bayesian (BIC)                      5214.954
     Sample-Size Adjusted BIC            5173.817
       (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
     Estimate                                .000
     90 Percent C.I.                    .000  .046
     Probability RMSEA <= .05                .957
As was the case for the exploratory factor analysis of these data, Mplus
reports the chi-square goodness-of-fit test and the RMSEA descriptive model
fit statistic. The chi-square test of model fit is not significant and
the RMSEA value is well below the value of .06 recommended by Hu and Bentler (1999) as an upper boundary, so you can conclude that the proposed model
fits the data well. Mplus also reports the Akaike Information Criterion
(AIC) and the Bayesian Information Criterion (BIC). These are descriptive
indexes of model fit that you can use to compare the goodness of model
fit of two or more competing models. Smaller values indicate better model
fit.
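Several of the fit statistics above can be reproduced from the two loglikelihoods and the parameter count. The 13 free parameters are 4 free loadings, 6 residual variances, 2 factor variances, and 1 factor covariance. That count implies the means are not part of the model, so the degrees of freedom are the 21 variances and covariances minus 13. The chi-square is twice the difference between the H1 and H0 loglikelihoods. The sketch below uses the standard AIC and BIC definitions and the n* shown in the output; it agrees with the printed values to within rounding.

```python
import math

# Values from the TESTS OF MODEL FIT output for the two-factor CFA.
H0_LOGLIK = -2575.128   # hypothesized model
H1_LOGLIK = -2573.297   # unrestricted model
FREE_PARAMS = 13
N = 145
P = 6                   # observed variables

chi_square = 2 * (H1_LOGLIK - H0_LOGLIK)      # likelihood-ratio test
df = P * (P + 1) // 2 - FREE_PARAMS           # 21 moments minus 13 parameters
aic = -2 * H0_LOGLIK + 2 * FREE_PARAMS
bic = -2 * H0_LOGLIK + FREE_PARAMS * math.log(N)
n_star = (N + 2) / 24                         # as printed in the output
adjusted_bic = -2 * H0_LOGLIK + FREE_PARAMS * math.log(n_star)

print(f"chi-square {chi_square:.3f} on {df} df")
print(f"AIC {aic:.3f}  BIC {bic:.3f}  adjusted BIC {adjusted_bic:.3f}")
```

AIC and BIC differ only in the penalty per parameter (2 versus ln N). This is why BIC favors simpler models more strongly as the sample grows.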
Mplus also outputs the unstandardized coefficients (Estimates
in the output), the standard errors (abbreviated S.E. in the output),
the estimates divided by their respective standard errors (Est./S.E.),
and two standardized coefficients for each estimated parameter in the model
(Std and StdYX). The estimate divided by its standard error
tests the null hypothesis that the parameter is zero in the population
from which you drew your sample. This ratio may be evaluated as a Z
statistic, so values that exceed +1.96 or fall below -1.96 are statistically
significant at p < .05 (two-tailed).
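The Z test and its two-tailed p-value can be computed directly. The minimal sketch below uses only the Python standard library, with the standard normal tail probability taken from math.erfc. Because the printed estimates and standard errors are rounded, ratios recomputed from them can differ slightly from the Est./S.E. column. For the VISUAL WITH VERBAL covariance in the model results, 6.784 / 1.720 gives about 3.944 rather than the printed 3.943.

```python
import math


def wald_z(estimate, se):
    """Return the Z statistic and its two-tailed p-value under the
    standard normal distribution: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Unstandardized estimate and S.E. of the VISUAL WITH VERBAL covariance.
z, p = wald_z(6.784, 1.720)
print(f"z = {z:.3f}, two-tailed p = {p:.5f}")
```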
MODEL RESULTS
                 Estimates      S.E.  Est./S.E.       Std     StdYX
 VISUAL   BY
    VISPERC          1.000      .000      .000     4.358      .632
    CUBES             .542      .116     4.658     2.360      .533
    LOZENGES         1.392      .272     5.112     6.064      .732
 VERBAL   BY
    PARAGRAP         1.000      .000      .000     2.920      .868
    SENTENCE         1.309      .115    11.352     3.821      .825
    WORDMEAN         2.247      .197    11.402     6.560      .828
 VISUAL   WITH
    VERBAL           6.784     1.720     3.943      .533      .533
In this example, each of the estimated parameters has an estimate-to-standard-error
ratio greater than +1.96. Each freely estimated factor loading is therefore
statistically significant, as is the covariance between the visual and verbal
latent factors (Z = 3.943). The variances of the two factors,
shown in the output appearing below, are also statistically significant,
indicating that each factor's variance is significantly different from zero.
Each unstandardized estimate represents the amount of change in the outcome
variable as a function of a single unit change in the variable causing
it. In this example, you assume that the latent variables, in addition
to some measurement error (shown below), are responsible for the scores
on the six observed variables. For instance, for each single unit change
in the verbal latent factor, sentence scores increase by
1.309 units.
Different measures often have different scales, so you will often find
it useful to examine the standardized coefficients when you want to compare
the relative strength of associations across observed variables that are
measured on different scales. Mplus provides two standardized coefficients.
The first, labeled Std in the output, standardizes using only the latent
variables' variances. It represents the change in an observed variable, in that
variable's original units, per standard deviation change in the latent factor.
The second, StdYX, standardizes using both the latent and the observed variables'
variances. It represents the change in an observed variable, in standard deviation
units, per standard deviation change in the latent factor. In this output, you can
see clearly that the StdYX coefficients of paragrap, sentence, and wordmean are
larger than those of visperc, cubes, and lozenges. This finding suggests that the
verbal latent factor does a better job at explaining the shared variance among
paragrap, sentence, and wordmean than the visual latent factor does for its three
indicator variables, visperc, cubes, and lozenges.
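The two standardized columns can be rebuilt from the unstandardized results. Std multiplies a loading by the factor's standard deviation. StdYX further divides by the indicator's standard deviation. The sketch below assumes StdYX uses the model-implied standard deviation (loading squared times factor variance, plus residual variance) rather than the sample standard deviation. This reproduces the printed .533 for cubes, whereas the sample SD would give about .531. For an indicator of a single factor, the R-square is then StdYX squared.

```python
import math

# Unstandardized estimates from the MODEL RESULTS and variance output.
LOADING = {"cubes": 0.542, "sentence": 1.309}
RESIDUAL_VAR = {"cubes": 14.050, "sentence": 6.869}
FACTOR_OF = {"cubes": "visual", "sentence": "verbal"}
FACTOR_VAR = {"visual": 18.989, "verbal": 8.525}
FACTOR_COV = 6.784  # VISUAL WITH VERBAL


def standardize(item):
    """Return (Std, StdYX, R-square) for an indicator that loads on one factor."""
    lam = LOADING[item]
    psi = FACTOR_VAR[FACTOR_OF[item]]
    std = lam * math.sqrt(psi)                    # y units per factor SD
    implied_var = lam ** 2 * psi + RESIDUAL_VAR[item]
    std_yx = std / math.sqrt(implied_var)         # y SDs per factor SD
    return std, std_yx, std_yx ** 2


for item in LOADING:
    std, std_yx, r2 = standardize(item)
    print(f"{item}: Std {std:.3f}  StdYX {std_yx:.3f}  R-square {r2:.3f}")
factor_r = FACTOR_COV / math.sqrt(FACTOR_VAR["visual"] * FACTOR_VAR["verbal"])
print(f"factor correlation {factor_r:.3f}")
```

The same standardization turns the factor covariance of 6.784 into the correlation of .533 shown in both standardized columns of the WITH row.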
This assertion is corroborated by the residual variances output by Mplus:
the standardized (StdYX) residual variances for the first three indicators
are larger than those for the remaining three indicators.
Residual Variances
                  Estimates     S.E.  Est./S.E.       Std     StdYX
    VISPERC          28.485    4.739      6.011    28.485      .600
    CUBES            14.050    1.978      7.105    14.050      .716
    LOZENGES         31.933    7.269      4.393    31.933      .465
    PARAGRAP          2.791     .584      4.775     2.791      .247
    SENTENCE          6.869    1.164      5.900     6.869      .320
    WORDMEAN         19.695    3.385      5.819    19.695      .314

Variances
    VISUAL           18.989    5.582      3.402     1.000     1.000
    VERBAL            8.525    1.376      6.196     1.000     1.000
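To see how the Std and StdYX columns relate to the unstandardized estimates, the short Python sketch below reproduces them for sentence using three numbers from this section: the loading of 1.309 quoted earlier, the VERBAL variance (8.525), and the SENTENCE residual variance (6.869). It assumes the usual CFA result that an indicator's model-implied variance equals its squared loading times the factor variance plus its residual variance. The function name is illustrative and not part of Mplus.

```python
import math


def standardize_loading(loading, factor_var, resid_var):
    """Return (Std, StdYX) for one indicator loading on a single factor.

    Std rescales by the factor's standard deviation only; StdYX also
    divides by the indicator's model-implied standard deviation.
    """
    std = loading * math.sqrt(factor_var)
    # Model-implied indicator variance: lambda^2 * psi + theta
    implied_var = loading ** 2 * factor_var + resid_var
    std_yx = std / math.sqrt(implied_var)
    return std, std_yx


# SENTENCE on VERBAL, complete-data Grant-White CFA
std, std_yx = standardize_loading(1.309, 8.525, 6.869)
print(f"Std = {std:.3f}, StdYX = {std_yx:.3f}")
```

With these inputs the sketch gives a Std value of about 3.822 and a StdYX value of about .825. Dividing the residual variance 6.869 by the same implied variance (about 21.476) gives roughly .320, the StdYX residual variance reported for SENTENCE above.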
R-SQUARE
    Observed
    Variable      R-Square
    VISPERC           .400
    CUBES             .284
    LOZENGES          .535
    PARAGRAP          .753
    SENTENCE          .680
    WORDMEAN          .686
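Because a StdYX residual variance is the proportion of an indicator's variance left unexplained, each r-square value should equal one minus that indicator's StdYX residual variance. This Python check uses only numbers copied from the two tables above. The dictionaries are just an illustrative layout.

```python
# StdYX residual variances and R-square values copied from the output above
std_resid = {"VISPERC": .600, "CUBES": .716, "LOZENGES": .465,
             "PARAGRAP": .247, "SENTENCE": .320, "WORDMEAN": .314}
r_square = {"VISPERC": .400, "CUBES": .284, "LOZENGES": .535,
            "PARAGRAP": .753, "SENTENCE": .680, "WORDMEAN": .686}

for name, resid in std_resid.items():
    implied = 1.0 - resid  # R-square = 1 - standardized residual variance
    print(f"{name:<9} 1 - {resid:.3f} = {implied:.3f}  (reported {r_square[name]:.3f})")
    assert abs(implied - r_square[name]) < 1e-9
```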
Finally, the r-square output illustrates that only modest amounts of variance are accounted for in the first three indicators, whereas much larger amounts of variance are accounted for in the final three indicators. As is the case with exploratory factor analysis of continuous outcome variables, if your input data are not jointly multivariate normal, you may want to replace the default ml estimator with the mlm or mlmv estimator by using the ESTIMATOR = option on the ANALYSIS command. The mlm option provides a mean-adjusted chi-square model test statistic, whereas the mlmv option produces a mean- and variance-adjusted chi-square test of model fit. Both options also cause Mplus to produce robust standard errors, displayed in the model results table, which are used to compute the Z tests of significance for individual parameter estimates. An added advantage of the mlm option is that its chi-square test and standard errors are equivalent to those produced by EQS with its ML;ROBUST method. Muthén and Muthén have placed formulas on their Web site that allow you to use mlm-produced chi-square values in nested model comparisons.
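As we understand them, the formulas Muthén and Muthén provide for nested model comparisons with mlm are the Satorra-Bentler scaled chi-square difference test. The Python sketch below implements that test as a hedged illustration; verify it against the Mplus Web site before relying on it. It assumes you have each model's mlm chi-square, its degrees of freedom, and its scaling correction factor c, which you can obtain, for example, as the ML chi-square divided by the mlm chi-square for the same model. The numbers in the example are hypothetical.

```python
def scaled_chisq_difference(t0, c0, d0, t1, c1, d1):
    """Scaled chi-square difference test for two nested mlm models.

    t0, c0, d0: mlm chi-square, scaling correction factor, and df of the
                more restricted (nested) model.
    t1, c1, d1: the same quantities for the less restricted model.
    Returns (difference statistic, difference in df).
    """
    if d0 <= d1:
        raise ValueError("the nested model must have more degrees of freedom")
    # Scaling correction for the difference test
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)
    if cd <= 0:
        raise ValueError("non-positive scaling correction; the test is not usable")
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Hypothetical values for illustration only (not output from this document)
trd, ddf = scaled_chisq_difference(t0=20.0, c0=1.2, d0=9, t1=10.0, c1=1.1, d1=8)
print(f"scaled difference = {trd:.2f} on {ddf} df")
```

With these hypothetical inputs the difference statistic is 6.50 on 1 degree of freedom, which you would refer to a chi-square distribution with 1 df. The result is not simply the difference between the two mlm chi-square values (10.0 here).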
2. Handling Missing Data
It is often the case that you have missing data in the context of confirmatory
factor analysis and structural equation modeling. Using Mplus, you can
employ the optimal Full Information Maximum Likelihood (FIML) approach
to handling missing data that was described above in the section Exploratory
Factor Analysis with Missing Data in Section 4.
Consider once again the modified data file grant-missing.dat, which contains incomplete cases and was used in the earlier exploratory factor analysis with missing data. As in the previous example, define the missing value code to be -9 for all variables using the MISSING subcommand of the VARIABLE command, copy the MODEL syntax from the previous confirmatory factor analysis example into the Mplus input window, and then modify the ANALYSIS command by adding the missing and h1 keywords to the TYPE = line, as shown below.
TITLE:    Grant-White School: CFA with missing data
DATA:     FILE IS "c:\intromplus\grant-missing.dat" ;
VARIABLE: NAMES ARE visperc cubes lozenges paragrap sentence wordmean gender ;
          USEVARIABLES ARE visperc - wordmean ;
          MISSING ARE all (-9) ;
ANALYSIS: TYPE = general missing h1 ;
MODEL:    visual BY visperc@1 cubes lozenges ;
          verbal BY paragrap@1 sentence wordmean ;
          visual WITH verbal ;
OUTPUT:   standardized sampstat ;
The missing keyword tells Mplus to activate its FIML missing data handling feature. The additional h1 keyword tells Mplus to output the chi-square goodness-of-fit test in addition to the usual summary statistics, missing data pattern information, parameter estimates, and standard errors. Mplus asks you to request the chi-square test explicitly with the h1 keyword because the test requires estimating an unrestricted (H1) model, which can take a long time to converge for large models with many missing data patterns. If this describes your situation, you may want to omit the h1 option from the TYPE = line at first to verify that you have specified your model correctly, and then add the h1 option to produce the chi-square test of model fit. If you remove the h1 option from the ANALYSIS TYPE = line, be sure to omit the sampstat option from the OUTPUT line as well: if sampstat is included on the OUTPUT line, Mplus automatically assumes the h1 option and computes the chi-square test of model fit, even if h1 is not included on the ANALYSIS TYPE = line.
The chi-square test of model fit for the confirmatory factor analysis
with missing data shows that the hypothesized model fit the data well:
TESTS OF MODEL FIT

Chi-Square Test of Model Fit
    Value                                2.777
    Degrees of Freedom                       8
    P-Value                              .9476

Loglikelihood
    H0 Value                         -2376.312
    H1 Value                         -2374.923

Information Criteria
    Number of Free Parameters               19
    Akaike (AIC)                      4790.623
    Bayesian (BIC)                    4847.181
    Sample-Size Adjusted BIC          4787.058
      (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
    Estimate                              .000
    90 Percent C.I.                 .000  .011
    Probability RMSEA <= .05              .982
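Several of these fit values can be reproduced from the loglikelihoods. The Python sketch below recomputes the likelihood-ratio chi-square (twice the difference between the H1 and H0 loglikelihoods), its p-value, AIC, BIC, the sample-size adjusted BIC, and the RMSEA point estimate. The sample size n = 145 is an assumption, because it is not stated in this section; it is the value that reproduces the printed BIC. The RMSEA line uses one common form of the formula, and Mplus's exact version may differ slightly, but the estimate is zero either way because the chi-square is smaller than its degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs even df."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


ll_h0, ll_h1 = -2376.312, -2374.923  # H0 and H1 loglikelihoods from the output
n_params, df = 19, 8                 # free parameters and degrees of freedom
n = 145                              # ASSUMED sample size (reproduces the BIC)

chisq = 2.0 * (ll_h1 - ll_h0)        # likelihood-ratio chi-square
p_value = chi2_sf_even_df(chisq, df)
aic = -2.0 * ll_h0 + 2 * n_params
bic = -2.0 * ll_h0 + n_params * math.log(n)
sabic = -2.0 * ll_h0 + n_params * math.log((n + 2) / 24)  # n* = (n + 2) / 24
rmsea = math.sqrt(max(chisq - df, 0.0) / (df * n))        # 0 whenever chisq < df

print(f"chi-square = {chisq:.3f}, p = {p_value:.4f}")
print(f"AIC = {aic:.3f}, BIC = {bic:.3f}, adj. BIC = {sabic:.3f}, RMSEA = {rmsea:.3f}")
```

Small discrepancies in the last decimal place (for example, 2.778 versus 2.777 for the chi-square) come from the rounding of the printed loglikelihoods.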
The Mplus parameter estimates, standard errors, and standardized parameter
estimates are similar to those found in the preceding confirmatory factor
analysis example. The only substantial difference is the inclusion of an
additional section that contains means and intercepts for the latent factors
and observed variables. These means and intercepts are required to be estimated
by the FIML missing data handling procedure, but are otherwise not a part
of the tested model.
MODEL RESULTS
                  Estimates     S.E.  Est./S.E.       Std     StdYX
 VISUAL   BY
    VISPERC           1.000     .000       .000     4.377      .635
    CUBES              .469     .127      3.679     2.051      .473
    LOZENGES          1.373     .294      4.673     6.010      .725

 VERBAL   BY
    PARAGRAP          1.000     .000       .000     2.914      .866
    SENTENCE          1.187     .114     10.376     3.460      .821
    WORDMEAN          2.247     .206     10.888     6.547      .827

 VISUAL   WITH
    VERBAL            7.014    1.800      3.896      .550      .550

 Residual Variances
    VISPERC          28.354    5.037      5.629    28.354      .597
    CUBES            14.589    2.340      6.234    14.589      .776
    LOZENGES         32.642    7.938      4.112    32.642      .475
    PARAGRAP          2.824     .627      4.507     2.824      .250
    SENTENCE          5.781    1.070      5.401     5.781      .326
    WORDMEAN         19.872    3.578      5.554    19.872      .317

 Variances
    VISUAL           19.158    5.859      3.270     1.000     1.000
    VERBAL            8.493    1.393      6.099     1.000     1.000

 Intercepts
    VISPERC          29.579     .572     51.673    29.579     4.291
    CUBES            24.616     .421     58.431    24.616     5.678
    LOZENGES         15.965     .689     23.184    15.965     1.925
    PARAGRAP          9.952     .279     35.620     9.952     2.958
    SENTENCE         19.054     .366     52.057    19.054     4.522
    WORDMEAN         17.283     .658     26.274    17.283     2.182
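The StdYX column for the intercepts divides each intercept by the indicator's model-implied standard deviation. The Python sketch below reproduces three of these values from the estimates in the table. As in the complete-data example, it assumes that the model-implied variance is the squared loading times the factor variance plus the residual variance.

```python
import math

# Missing-data CFA estimates copied from the MODEL RESULTS table above
factor_var = {"VISUAL": 19.158, "VERBAL": 8.493}
# (indicator, factor, loading, residual variance, intercept)
rows = [
    ("VISPERC", "VISUAL", 1.000, 28.354, 29.579),
    ("PARAGRAP", "VERBAL", 1.000, 2.824, 9.952),
    ("SENTENCE", "VERBAL", 1.187, 5.781, 19.054),
]

results = {}
for name, factor, loading, resid, intercept in rows:
    implied_sd = math.sqrt(loading ** 2 * factor_var[factor] + resid)
    results[name] = intercept / implied_sd  # StdYX intercept = intercept / SD(y)
    print(f"{name:<9} StdYX intercept = {results[name]:.3f}")
```

The results, about 4.291, 2.958, and 4.523, match the reported StdYX intercepts to within rounding.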
Finally, Mplus produces the r-square values for the observed variables.
Once again, these are similar to those obtained from the original data file
with complete cases.
R-SQUARE
    Observed
    Variable      R-Square
    VISPERC           .403
    CUBES             .224
    LOZENGES          .525
    PARAGRAP          .750
    SENTENCE          .674
    WORDMEAN          .683
If you elect to use Mplus's FIML approach to handling missing data, be aware that the only available estimator is the maximum likelihood option, ml. If you suspect that your data are not normally distributed, remember that the chi-square test of model fit may be affected by the non-normality. Depending on the severity of the non-normality and the amount of missing data you have, you may want to explore other ways of handling the missing data problem prior to performing analyses using Mplus; see the UT Austin Statistical Services General FAQ #25: Handling missing or incomplete data.
3. Confirmatory Factor Analysis with Categorical Outcomes
Confirmatory factor analysis with dichotomous or polytomous categorical outcomes, or with a mix of categorical and continuous outcomes, is also possible using Mplus. Recall the grantcat.dat data file used in the example Exploratory Factor Analysis with Categorical Outcomes in Section 4, which replaces the six continuous observed variables with dichotomous versions. Using this file, you can adapt the syntax from the example Confirmatory Factor Analysis With Continuous Variables: point the FILE statement at grantcat.dat, use the categorical variable names on the NAMES, USEVARIABLES, and MODEL statements, and add the statement CATEGORICAL ARE viscat - wordcat ; to the VARIABLE command. Mplus will then treat the six observed variables as categorical in the analysis. The entire command syntax is shown here.
TITLE:    Grant-White School: CFA with categorical outcomes
DATA:     FILE IS "c:\intromplus\grantcat.dat" ;
VARIABLE: NAMES ARE viscat cubescat lozcat paracat sentcat wordcat ;
          USEVARIABLES ARE viscat - wordcat ;
          CATEGORICAL ARE viscat - wordcat ;
ANALYSIS: TYPE = general ;
MODEL:    visual BY viscat@1 cubescat lozcat ;
          verbal BY paracat@1 sentcat wordcat ;
          visual WITH verbal ;
OUTPUT:   sampstat standardized ;
Selected results from the analysis appear below.
Chi-Square Test of Model Fit
    Value                               7.463*
    Degrees of Freedom                      6**
    P-Value                              .2800

 *  The chi-square value for MLM, MLMV, WLSM and WLSMV cannot be used for
    chi-square difference tests.
 ** The degrees of freedom for MLMV and WLSMV are estimated according to
    formula 109 (page 281) in the Mplus User's Guide.
The chi-square test of model fit is once again non-significant, suggesting
that the specified model fits the data adequately. The default estimator
for models that contain categorical outcomes is the mean and variance-adjusted
weighted least-squares method, wlsmv. Optional estimators
you may choose are weighted least-squares (wls) and mean-adjusted
weighted least-squares (wlsm). As is the case in the exploratory
factor analysis of categorical data example, there are no descriptive model
fit statistics produced by Mplus when it analyzes categorical outcomes.
Mplus also prints a note warning you not to use the MLM, MLMV, WLSM, and WLSMV chi-square values in nested model comparisons. The warning about the MLM chi-square does not apply if, when you use the MLM estimator to analyze continuous outcomes, you use the formulas on the Mplus Web site for nested model MLM chi-square comparisons. You should not use the MLM estimator for the analysis of intrinsically categorical outcome variables.
Mplus then outputs the model results:
MODEL RESULTS
                  Estimates     S.E.  Est./S.E.       Std     StdYX
 VISUAL   BY
    VISCAT            1.000     .000       .000      .729      .729
    CUBESCAT           .831     .212      3.922      .606      .606
    LOZCAT             .975     .230      4.248      .710      .710

 VERBAL   BY
    PARACAT           1.000     .000       .000      .814      .814
    SENTCAT           1.058     .134      7.920      .861      .861
    WORDCAT           1.038     .127      8.154      .844      .844

 VISUAL   WITH
    VERBAL             .397     .087      4.592      .670      .670

 Variances
    VISUAL             .531     .162      3.273     1.000     1.000
    VERBAL             .662     .117      5.661     1.000     1.000

 Thresholds
    VISCAT$1           .095     .104       .913      .095      .095
    CUBESCAT$1         .271     .105      2.571      .271      .271
    LOZCAT$1          -.043     .104      -.415     -.043     -.043
    PARACAT$1          .009     .104       .083      .009      .009
    SENTCAT$1          .183     .105      1.743      .183      .183
    WORDCAT$1          .043     .104       .415      .043      .043
This output is similar to that of a confirmatory factor analysis with continuous outcomes, with one notable exception: Mplus now produces threshold information for each categorical variable. A threshold is the point on the continuous latent response variable underlying a categorical outcome at which an individual is expected to move from a value of 0 to a value of 1 on that outcome. Each outcome variable here has only two categories, so there is only one threshold per variable. For any categorical outcome variable with K levels, Mplus will output K-1 threshold values. For example, a five-point Likert scale item coded 0 through 4 would have four thresholds. The first threshold represents the point at which an individual would be most likely to move from a value of 0 to a value of 1 on the item, the second represents the point at which an individual would move from 1 to 2, and so on through the fourth threshold, which represents the point at which an individual would move from 3 to 4 on the outcome variable.
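With more than two categories, the probability of each response follows from the K-1 thresholds. The Python sketch below extends the Y = 0 probability formula given later in this section to all K categories of an ordered item, assuming the same probit form: the probability of a response at or below category k is F[(tau_(k+1) - lambda*eta) / sqrt(theta)], so each category's probability is the difference between adjacent cumulative probabilities. The thresholds, loading, and residual variance in the example are hypothetical values for a five-point item, not Mplus output.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(thresholds, loading, eta, resid_var):
    """Probabilities of categories 0..K-1 given the factor value eta.

    thresholds must be increasing; there are K-1 of them for K categories.
    """
    scale = math.sqrt(resid_var)
    cumulative = [normal_cdf((tau - loading * eta) / scale) for tau in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    # Each category's probability is the gap between adjacent cumulative values
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical five-point item: four thresholds, loading, residual variance
probs = category_probs([-1.5, -0.5, 0.5, 1.5], loading=0.8, eta=0.0, resid_var=0.36)
print([round(p, 3) for p in probs])
```

Because the hypothetical thresholds are symmetric around zero, the probabilities at eta = 0 are symmetric as well, and the five values always sum to one.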
Finally, Mplus produces the r-square table. The r-square values are computed for the continuous latent response variables underlying the categorical outcome variables, rather than for the observed outcome variables themselves as in analyses of continuous outcomes. Consequently, these r-square values cannot be interpreted as the proportion of variance explained in the observed categorical outcomes. Examining the sign and significance of the estimated coefficients in the model results table above is therefore generally more informative than interpreting r-square values.
R-SQUARE
    Observed     Residual
    Variable     Variance     R-Square
    VISCAT           .469         .531
    CUBESCAT         .633         .367
    LOZCAT           .495         .505
    PARACAT          .338         .662
    SENTCAT          .259         .741
    WORDCAT          .287         .713
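In this table each residual variance equals one minus the r-square value, which suggests that the latent response variables are scaled to unit variance. Under that assumption, an item's r-square is its squared loading times its factor's variance. The Python sketch below checks this against the estimates from the model results table.

```python
# Categorical CFA estimates copied from the output above
factor_var = {"VISUAL": .531, "VERBAL": .662}
loadings = {
    "VISCAT": ("VISUAL", 1.000), "CUBESCAT": ("VISUAL", .831),
    "LOZCAT": ("VISUAL", .975), "PARACAT": ("VERBAL", 1.000),
    "SENTCAT": ("VERBAL", 1.058), "WORDCAT": ("VERBAL", 1.038),
}
reported_r2 = {"VISCAT": .531, "CUBESCAT": .367, "LOZCAT": .505,
               "PARACAT": .662, "SENTCAT": .741, "WORDCAT": .713}

r2_values = {}
for item, (factor, lam) in loadings.items():
    # Explained share of a unit-variance latent response: lambda^2 * psi
    r2 = lam ** 2 * factor_var[factor]
    r2_values[item] = r2
    print(f"{item:<9} R-square = {r2:.3f}, residual = {1.0 - r2:.3f} "
          f"(reported R-square {reported_r2[item]:.3f})")
```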
The r-square table's residual variance output is, however, useful for computing expected probabilities. You can combine the threshold and loading estimates shown above with the residual variances from the r-square table to compute the expected probability that a case has a value of 0 or 1 on an outcome. Consider the following formula for the conditional probability of a Y = 0 response given the factor eta:

$$P(Y_{ij} = 0 \mid \eta_i) = F\left[(\tau_j - \lambda_j \eta_i)\,\frac{1}{\sqrt{\theta_{jj}}}\right]$$

where:

$\eta_i$ is the factor's value for case i
$F$ is the cumulative standard normal distribution function
$\tau_j$ is the threshold of measured item j
$\lambda_j$ is the factor loading of item j
$\theta_{jj}$ is the residual variance of item j
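Suppose you want to obtain the estimated probability that sentcat = 0 at eta = 0. Using the formula shown above with the sentcat threshold (.183), loading (1.058), and residual variance (.259), you can compute this value:

$$P(\text{sentcat} = 0 \mid \eta = 0) = F\left[(.183 - 1.058 \times 0)\,\frac{1}{\sqrt{.259}}\right] = F[.183 \times 1.9649437] = F[.3595847]$$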
You can look up the value of .3595847 in a Z table in a statistics textbook, or you can supply it to the PROBNORM function in SAS to obtain the probability. PROBNORM returns the standard normal cumulative distribution function evaluated at its argument. A simple SAS program such as the one shown below gives the final expected probability value of .64.
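DATA one ;
   /* PROBNORM returns the standard normal CDF at the given z value */
   p = PROBNORM(.3595847) ;
RUN ;
PROC PRINT DATA = one ;   /* display the computed probability */
RUN ;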
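If SAS is not available, the same calculation can be sketched in Python. The standard library has no PROBNORM, so this sketch computes the standard normal cumulative distribution function from math.erf. The function names are illustrative. It applies the Y = 0 formula above to sentcat at several values of eta.

```python
import math


def probnorm(z):
    """Standard normal CDF, the quantity SAS's PROBNORM returns."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_zero(tau, loading, eta, resid_var):
    """P(Y = 0 | eta) = F[(tau - loading * eta) / sqrt(resid_var)]."""
    return probnorm((tau - loading * eta) / math.sqrt(resid_var))


# SENTCAT: threshold .183, loading 1.058, residual variance .259
for eta in (-1.0, 0.0, 1.0):
    print(f"eta = {eta:+.1f}: P(sentcat = 0) = {prob_zero(.183, 1.058, eta, .259):.3f}")
```

At eta = 0 this gives about .64, the same value as the SAS program. The probability of a 0 response falls as eta increases because the loading for sentcat is positive.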
You may substitute other values of eta, or the threshold, loading, and residual variance of another item, to obtain other expected probability values. In general, the same cautions and limitations that were discussed above in the section Exploratory Factor Analysis with Categorical Outcomes also apply to the analysis of categorical outcomes in the confirmatory factor analysis and structural equation modeling contexts. In addition, the following point is worth considering:

4. Structural Equation Modeling with Continuous Outcomes
Educ - Education level
SEI - Socioeconomic index
Anomia67 - Anomie in 1967
Anomia71 - Anomie in 1971
Powles67 - Powerlessness in 1967
Powles71 - Powerlessness in 1971
One of the fitted structural equation models features a latent factor, SES, that influences Educ and SEI scores. The SES latent variable in turn influences two additional latent variables: Alien67 and Alien71. Alien67 represents self-perceived alienation in 1967 and influences responses on the anomie and powerlessness variables measured in 1967. Similarly, Alien71 represents self-perceived alienation in 1971 and influences responses on the anomie and powerlessness variables measured in 1971. SES influences both Alien67 and Alien71, and Alien67 also influences Alien71.
The dataset wheaton-generated.dat is used in the analysis that follows:
TITLE:    Wheaton et al. Example 1: Full SEM
DATA:     FILE IS "c:\intromplus\wheaton-generated.dat" ;
VARIABLE: NAMES ARE educ sei anomia67 powles67 anomia71 powles71 ;
          USEVARIABLES ARE educ - powles71 ;
ANALYSIS: TYPE = general ;
MODEL:    ses BY educ@1 sei ;
          alien67 BY anomia67@1 powles67 ;
          alien71 BY anomia71@1 powles71 ;
          alien67 ON ses ;
          alien71 ON ses alien67 ;
OUTPUT:   standardized sampstat ;
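As a check on the specification, you can count the model's degrees of freedom by hand. The Python tally below makes two assumptions. First, TYPE = general with complete data analyzes only the variances and covariances of the six observed variables, with no means. Second, Mplus estimates no parameters beyond those implied by the BY and ON statements plus the variance of ses and the residual variances. Under those assumptions the model has 15 free parameters and 6 degrees of freedom. The tally is ours, not Mplus output.

```python
# Degrees-of-freedom tally for the Wheaton SEM above, ASSUMING a covariance-only
# analysis (no means) and no correlated residuals (the syntax specifies none).
n_observed = 6
sample_moments = n_observed * (n_observed + 1) // 2  # variances and covariances

free = {
    "loadings (first indicator of each factor fixed at 1)": 3,  # sei, powles67, powles71
    "residual variances of the six indicators": 6,
    "variance of ses": 1,
    "residual variances of alien67 and alien71": 2,
    "regressions from the ON statements": 3,  # alien67 ON ses; alien71 ON ses alien67
}
n_free = sum(free.values())
df = sample_moments - n_free
print(f"{sample_moments} moments - {n_free} free parameters = {df} df")
```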
The syntax for this analysis is similar to that of the confirmatory factor analysis example shown in subsection 1 above. The only noteworthy difference is the use of the ON keyword in the MODEL command to specify the regression relationships among the latent variables; the WITH keyword is used to specify correlations or covariances among variables. In th