Statistical Computing Seminars
Building Your Mplus Skills

This seminar is a continuation of our introduction to Mplus seminar. We will review the basics of Mplus syntax and show some examples for simple analyses, such as regression models for continuous and binary variables. Then we'll move on to more advanced models, such as factor analysis, path analysis, growth curve models and latent class models. Some of the examples will be demonstrated by running Mplus in real time. The data files and the input files are zipped for an easy download and can be accessed by following the link.

Introduction

"We started to develop Mplus eleven years ago with the goal of providing applied researchers with powerful new statistical modeling techniques. We saw a wide gap between new statistical methods presented in the statistical literature and the statistical methods used by researchers in applied papers. Our goal was to help bridge this gap with easy-to-use but powerful software." -- February 2006, Preface to the Mplus User's Guide.

Mplus has largely achieved this goal and has improved steadily since its first release in 1998. Its general framework of continuous and categorical latent variables offers a new way to formulate statistical models. For example, we can perform not only growth curve analysis but also latent class growth analysis, and not only discrete-time survival analysis but also discrete-time survival mixture analysis. This range of modeling options makes Mplus a very attractive piece of software. It also offers several ways to handle missing data, including maximum likelihood estimation and estimation based on multiply imputed data sets.

Over the years, we have recommended to our clients the "get in and get out" approach with Mplus (and some other statistical packages) and it seems to us that this approach has worked well. This approach consists of a few steps: deciding the appropriate models for the study; deciding if switching to Mplus is necessary; preparing the data structure for Mplus using a familiar software package; and moving to Mplus and performing the analyses. 
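The data-preparation step can be sketched in Python. This is only a minimal sketch. It assumes the usual free-format conventions for an Mplus data file (plain ASCII, numeric values only, no header row) and reuses the -9999 missing-value code that the input files in this seminar declare with "Missing are all (-9999)". The function name and the sample records are hypothetical.

```python
MISSING_CODE = -9999  # must match "Missing are all (-9999);" in the input file


def to_mplus_lines(records, names, missing=MISSING_CODE):
    """Return free-format data lines: one case per line, values separated by spaces.

    None marks a missing value in the source data and is written as the numeric
    missing code. No header row is produced, because variable names are given
    in the Variable command of the input file, not in the data file.
    """
    lines = []
    for rec in records:
        values = [missing if rec.get(name) is None else rec[name] for name in names]
        lines.append(" ".join(str(v) for v in values))
    return lines


# Hypothetical records, for illustration only.
names = ["id", "female", "read", "write"]
records = [
    {"id": 1, "female": 0, "read": 57, "write": 52},
    {"id": 2, "female": 1, "read": None, "write": 59},
]

lines = to_mplus_lines(records, names)
for line in lines:
    print(line)
```

The order of the columns is the order that must later be listed in "Names are" in the input file, so it is worth keeping the names list alongside the exported file.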

Our goal for this seminar is to ease the transition to Mplus. We will discuss the overall structure and syntax of Mplus input files, as well as the use of the Mplus 4 User's Guide and the online resources for Mplus. Starting with some basic models, we will then move on to some more advanced ones.

Overall structure of Mplus input file

An input file defines the data set to use and the model to run. It is similar to a SAS program file, an SPSS syntax file and a Stata .do file. Below is an example of an input file. It is here to show the general structure of an input file. We are not going to explain what analysis it does.

  Data:
    File is d:\work\data\raw\table3_4.dat ;
  Variable:
    names are a b c d freq;
    missing are all (-9999) ;
    usevariables are a b c d;
    weight is freq ; !default is frequency weight
    categorical are a b c d;
    classes = cl(2);
  Analysis:
    Type = mixture ;
    starts = 0;
  Model:
    %overall%
    [a$1*10 b$1*10 c$1*10 d$1*10] (1);
    %cl#1%
    [a$1*-10 b$1*-10 c$1*-10 d$1*-10] (2);
  Plot:
    type = plot3;
    series is a(1) b(2) c(3) d(4);

Here are some characteristics of an input file:

Here are some characteristics of a data file:

Overall review of Mplus syntax for the model command

Mplus has made a great effort to make the syntax as simple as possible. Since there are so many analyses that Mplus can perform, the model command can still get really involved. We have compiled a short list here for commonly used keywords.

Use of User's Guide and online resources

The Mplus User's Guide is an excellent reference both for Mplus syntax and for the types of models possible in Mplus. It has the flavor of learning by doing, and its organization is very different from that of other user guides, such as those of Stata, SAS or SPSS. Examples for basic models can be found in the first chapter, and more advanced models are divided into later chapters. The section on syntax is near the end. A very important feature is that almost all of the examples in the Guide are included with the software itself. If one sees an interesting example, one can always run the model to see the output and modify the example to suit one's own modeling needs. An equally important feature is that each example in the book has a Monte Carlo simulation counterpart. In fact, Monte Carlo simulation was used to generate most of the data sets used in the User's Guide. For quick reference, the Mplus help system includes "A SUMMARY OF THE Mplus LANGUAGE".

The Mplus website offers extensive resources: a very active discussion group covering many topics for serious modelers, many downloadable examples, and the entire User's Guide in PDF format, which can be searched for examples and commands. It is a great place to learn about new modeling possibilities and the Mplus language.

Post estimation

Mplus has three commands for post-estimation: the output command, the savedata command and the plot command. The output command requests additional types of output to be included in the output file. For example, we can request sample statistics by using the option sampstat in the output command. The savedata command creates an ASCII data file for further data analysis. The plot command requests plots. Mplus offers many model-related plots, and the plot controls are easy to use.

Simple examples

We will review how some simple models are done in Mplus. We will start with linear regression and then discuss models with binary outcomes.

Example 1. Where is the output for intercept? (linear regression)

The code below fits a simple linear regression of the dependent variable write on the predictor variables female and read. The keyword on in the model statement stands for "regressed on".

  Data:
    File is hsb2.dat ;
  Variable:
    Names are
       id female race ses schtyp prog read write math science socst;
    Missing are all (-9999) ;
    usevariables are female write read;
  Model:
    write on female read;

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 WRITE    ON
    FEMALE             5.487    1.007      5.451
    READ               0.566    0.049     11.546
 Residual Variances
    WRITE             50.113    5.011     10.000

Notice that something is missing in the output: the intercept. What does this mean? By default, Mplus analyzes only the covariance structure of the variables. To understand what it is doing, let's perform the analysis manually in the same fashion. We create the covariance matrix for the variables write, female and read, and use this covariance matrix as the input for our analysis.

  Title: example of using covariance matrix.
         input data is a matrix:
         89.8436
         1.21369  .249221
         57.9967 -.271709  105.123;
  Data: file is cov.dat;
        type is covariance;
        nobservations = 200;
  Variable: names are write female read;
  Model: write on female read;

                   Estimates     S.E.  Est./S.E.

 WRITE    ON
    FEMALE             5.487    1.007      5.451
    READ               0.566    0.049     11.546

 Residual Variances
    WRITE             50.113    5.011     10.000
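The estimates above can be reproduced by hand from the three-variable covariance matrix, which makes it clear that no means are involved. The sketch below solves the two normal equations directly in plain Python. The rescaling of the residual variance by (n - 1)/n is an assumption on our part (a sample covariance matrix computed with an n - 1 divisor, and a maximum likelihood estimate that uses n); it is included because it reproduces the reported value.

```python
# Lower-triangular covariance matrix from the Title section above
# (variable order: write, female, read).
var_write = 89.8436
cov_write_female, var_female = 1.21369, 0.249221
cov_write_read, cov_female_read, var_read = 57.9967, -0.271709, 105.123
n = 200  # nobservations in the input file

# Solve the 2x2 normal equations  Sxx * b = sxy  with Cramer's rule.
det = var_female * var_read - cov_female_read ** 2
b_female = (var_read * cov_write_female - cov_female_read * cov_write_read) / det
b_read = (var_female * cov_write_read - cov_female_read * cov_write_female) / det

# Variance of write left unexplained by the two predictors.
resid_sample = var_write - (b_female * cov_write_female + b_read * cov_write_read)

# Assumed ML rescaling: divisor n instead of n - 1.
resid_ml = resid_sample * (n - 1) / n

print(round(b_female, 3), round(b_read, 3), round(resid_ml, 3))
```

The slopes and the residual variance agree with the Mplus output to three decimals. The intercept cannot be recovered this way, because the covariance matrix carries no information about the means.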

The two sets of estimates are identical, which shows that the analysis we did at the beginning of this example is just an analysis of the covariance structure. In order to estimate the intercept, which is the expected value of the outcome when all predictor variables are zero, we need to tell Mplus that we are also interested in the analysis of means. This can be done easily by adding type = meanstructure to the analysis command.

Every model has an analysis command associated with it. In this example, we don't see the analysis command because we are using the default setting, which is analysis: type = general. Models that can be estimated using type = general include regression analysis, path analysis, confirmatory factor analysis, structural equation modeling and growth curve modeling. Within any specific analysis setting, we can add more options, such as type = missing when the data set has missing values and we don't want listwise deletion. Or we can add type = meanstructure to have the mean or intercept displayed in the output, as we are going to do here.

  Data:
    File is hsb2.dat ;
  Variable:
    Names are
       id female race ses schtyp prog read write math science socst;
    Missing are all (-9999) ;
    usevariables are female write read;
  Analysis:
    type=meanstructure;
  Model:
    write on female read;

                   Estimates     S.E.  Est./S.E.
 WRITE    ON
    FEMALE             5.487    1.007      5.451
    READ               0.566    0.049     11.546
 Intercepts
    WRITE             20.228    2.693      7.511
 Residual Variances
    WRITE             50.113    5.011     10.000


Example 2. Is it a probit or a logit regression? (binary outcome)

Now let's switch to binary outcomes. Using the same data set as in the previous example, we create a new dichotomous variable called hon based on the variable write. We also declare that the new variable hon is a categorical variable. As we have mentioned before, the keyword categorical is for outcome variables only. If we have categorical variables as predictors, we have to make sure that dummy variables have been created for them (usually in another software package before the data are moved into Mplus).

  Data:
    File is hsb2.dat ;
  Variable:
    Names are
       id female race ses schtyp prog read write math science socst;
    Missing are all (-9999) ;
    usevariables are female math read hon;
    categorical is hon;
  Define:
    hon = (write>60);
  Model:
    hon on female math read;

Observed dependent variables
  Binary and ordered categorical (ordinal)
   HON
Observed independent variables
   FEMALE      MATH        READ
Estimator                                                    WLSMV
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Parameterization                                             DELTA
Input data file(s)
  hsb2.dat
Input data format  FREE
SUMMARY OF CATEGORICAL DATA PROPORTIONS
    HON
      Category 1    0.755
      Category 2    0.245
(output omitted...)

MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 HON      ON
    FEMALE             0.574    0.246      2.335
    MATH               0.069    0.016      4.324
    READ               0.038    0.017      2.275
R-SQUARE
    Observed  Residual
    Variable  Variance  R-Square
    HON          1.000     0.489

Now, is this a probit model or a logit model? Mplus is not very explicit about it. By default, it is a probit model. Even if we don't know the default, we can still tell that this is a probit model because the output has an R-square section with a residual variance of 1. This is what the probit model assumes: the residual follows the standard normal distribution, so its variance is 1. Now did we miss something again? Yes. We don't see the intercept. This is exactly the same situation as we had with the linear regression. Adding type=meanstructure gives us the threshold, which is how Mplus reports the intercept for a categorical outcome; the threshold is the negative of the intercept.

  Data:
    File is hsb2.dat ;
  Variable:
    Names are id female race ses schtyp prog 
              read write math science socst;
    Missing are all (-9999) ;
    Usevariables are female math read hon;
    Categorical is hon;
  Define:
    hon = (write>60);
  Analysis: type=meanstructure;
  Model:
      hon on female math read;

MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 HON      ON
    FEMALE             0.574    0.246      2.335
    MATH               0.069    0.016      4.324
    READ               0.038    0.017      2.275

 Thresholds
    HON$1              6.887    1.063      6.482

R-SQUARE

    Observed  Residual
    Variable  Variance  R-Square

    HON          1.000     0.489
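To see what the probit estimates and the threshold imply, we can turn them into predicted probabilities. The sketch below assumes the usual probit formulation, P(hon = 1 | x) = Phi(-threshold + b'x), with a residual variance of 1 as reported in the R-SQUARE section. The two student profiles are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Estimates copied from the MODEL RESULTS above.
threshold = 6.887
coef = {"female": 0.574, "math": 0.069, "read": 0.038}


def prob_hon(female, math_score, read_score):
    """P(hon = 1 | x) under a probit link: Phi(b'x - threshold)."""
    linear = (coef["female"] * female
              + coef["math"] * math_score
              + coef["read"] * read_score)
    return normal_cdf(linear - threshold)


# Hypothetical profiles: same scores, different gender.
p_female = prob_hon(female=1, math_score=60, read_score=60)
p_male = prob_hon(female=0, math_score=60, read_score=60)

print(round(p_female, 3), round(p_male, 3))
```

Because the threshold enters with a negative sign, a large positive threshold such as 6.887 corresponds to a large negative intercept, which is why the predicted probability is essentially zero when all predictors are zero.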

What about a logistic regression with the same data? To do a logistic regression, we will change the estimation method from the default method of WLSMV to ML.

  Data:
    File is hsb2.dat ;
  Variable:
    Names are id female race ses schtyp prog 
              read write math science socst;
    Missing are all (-9999) ;
    Usevariables are female math read hon;
    Categorical is hon;
  Define:
    hon = (write>60);
  Analysis: estimator = ml;
  Model:
    hon on female math read;

Estimator                                                       ML
(output omitted...)
Link                                                         LOGIT
Cholesky                                                       OFF
Input data file(s)
  hsb2.dat
Input data format  FREE
SUMMARY OF CATEGORICAL DATA PROPORTIONS
    HON
      Category 1    0.755
      Category 2    0.245
THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Loglikelihood
          H0 Value                         -78.085
Information Criteria
          Number of Free Parameters              4
          Akaike (AIC)                     164.170
          Bayesian (BIC)                   177.363
          Sample-Size Adjusted BIC         164.690
            (n* = (n + 2) / 24)
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 HON        ON
    FEMALE             0.980    0.422      2.324
    MATH               0.123    0.031      3.931
    READ               0.059    0.027      2.224
 Thresholds
    HON$1             11.770    1.711      6.880
LOGISTIC REGRESSION ODDS RATIO RESULTS
 HON        ON
    FEMALE             2.664
    MATH               1.131
    READ               1.061
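Several numbers in the logistic regression output can be checked by hand. The sketch below exponentiates the logit coefficients to get the odds ratios and recomputes the information criteria from the loglikelihood. It assumes the standard definitions of AIC and BIC, uses the n* formula printed in the output for the sample-size adjusted BIC, and takes n = 200 as the sample size of hsb2.dat (the value used for nobservations earlier).

```python
import math

# Logit coefficients from MODEL RESULTS above.
coefs = {"female": 0.980, "math": 0.123, "read": 0.059}
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

# Fit information from the output.
loglik = -78.085   # H0 loglikelihood
k = 4              # number of free parameters
n = 200            # assumed sample size of hsb2.dat

aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * math.log(n)
n_star = (n + 2) / 24                      # as printed in the output
adj_bic = -2 * loglik + k * math.log(n_star)

for name, value in odds_ratios.items():
    print(name, round(value, 3))
print(round(aic, 3), round(bic, 3), round(adj_bic, 3))
```

The results agree with the Mplus output up to rounding of the inputs, since the coefficients and the loglikelihood are only available here to three decimals.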

Advanced examples

Example 1. Exploratory factor analysis

Exploratory factor analysis is often used to explore the structure underlying a set of variables. Most statistical software, however, lacks sophisticated techniques for dealing with missing values or binary variables. Mplus allows us to take care of both issues. Let's start with a simple exploratory factor analysis. This example is taken from our Annotated SPSS Output Factor Analysis page. The data set has many variables, and we are only going to use item13 - item24, as they are all about instructors.

  Data:
    File is factor.dat ;
  Variable:
    Names are
       facsex facethn facnat facrank employm salary yrsteach yrsut degree
       sample remind nstud studrank studsex grade gpa satisfy religion psd
       item13 item14 item15 item16 item17 item18 item19 item20 item21 item22
       item23 item24 item25 item26 item27 item28 item29 item30 item31 item32
       item33 item34 item35 item36 item37 item38 item39 item40 item41 item42
       item43 item44 item45 item46 item47 item48 item49 item50 item51 item52
       race sexism racism rpolicy casteman competen sensitiv cstatus;
    Missing are all (-9999) ;
    Usevariables are item13 - item24;
  Analysis:
    estimator = ml;
    Type = efa 1 3 ;

INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups                                                 1
Number of observations                                        1428
Number of dependent variables                                   12
Number of independent variables                                  0
Number of continuous latent variables                            0
Observed dependent variables
  Continuous
   ITEM13      ITEM14      ITEM15      ITEM16      ITEM17      ITEM18
   ITEM19      ITEM20      ITEM21      ITEM22      ITEM23      ITEM24
Estimator                                                       ML
Information matrix                                        EXPECTED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Input data file(s)
  factor.dat
Input data format  FREE
RESULTS FOR EXPLORATORY FACTOR ANALYSIS
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  1             2             3             4             5
              ________      ________      ________      ________      ________
      1         6.073         1.223         0.735         0.648         0.572
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  6             7             8             9            10
              ________      ________      ________      ________      ________
      1         0.539         0.485         0.429         0.383         0.334
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                 11            12
              ________      ________
      1         0.311         0.267
(output omitted...)
           EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :
           CHI-SQUARE VALUE             147.541
           DEGREES OF FREEDOM                33
           PROBABILITY VALUE             0.0000
           RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
           ESTIMATE (90 PERCENT C.I.) IS  0.049 ( 0.041  0.058)
           PROBABILITY RMSEA LE 0.05 IS    0.540
           ROOT MEAN SQUARE RESIDUAL IS        0.0175
           VARIMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 ITEM13         0.744         0.158         0.236
 ITEM14         0.753         0.197         0.213
 ITEM15         0.650         0.303         0.258
 ITEM16         0.581         0.292         0.177
 ITEM17         0.532         0.468         0.300
 ITEM18         0.277         0.731         0.240
 ITEM19         0.158         0.745         0.130
 ITEM20         0.243         0.470         0.187
 ITEM21         0.350         0.504         0.383
 ITEM22         0.189         0.531         0.319
 ITEM23         0.409         0.365         0.724
 ITEM24         0.321         0.309         0.604
           PROMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 ITEM13         0.820        -0.098         0.050
 ITEM14         0.828        -0.037         0.001
 ITEM15         0.645         0.110         0.063
 ITEM16         0.591         0.152        -0.029
 ITEM17         0.424         0.342         0.105
 ITEM18         0.035         0.790         0.009
 ITEM19        -0.079         0.890        -0.116
 ITEM20         0.093         0.475         0.032
 ITEM21         0.144         0.402         0.268
 ITEM22        -0.048         0.510         0.218
 ITEM23         0.128         0.044         0.786
 ITEM24         0.079         0.048         0.662
           PROMAX FACTOR CORRELATIONS
                  1             2             3
              ________      ________      ________
      1         1.000
      2         0.611         1.000
      3         0.658         0.685         1.000
           ESTIMATED RESIDUAL VARIANCES
              ITEM13        ITEM14        ITEM15        ITEM16        ITEM17
              ________      ________      ________      ________      ________
      1         0.367         0.349         0.418         0.545         0.408
           ESTIMATED RESIDUAL VARIANCES
              ITEM18        ITEM19        ITEM20        ITEM21        ITEM22
              ________      ________      ________      ________      ________
      1         0.331         0.404         0.685         0.477         0.581
           ESTIMATED RESIDUAL VARIANCES
              ITEM23        ITEM24
              ________      ________
      1         0.176         0.436
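Two parts of this output can be checked with simple arithmetic. The eigenvalues of a correlation matrix sum to the number of variables, and the degrees of freedom of an exploratory factor model follow from the number of variables p and the number of factors m. The sketch below assumes the usual formula df = ((p - m)^2 - (p + m)) / 2 for ML factor analysis. The "eigenvalue greater than 1" count is a common rule of thumb, not something Mplus itself reports.

```python
# Eigenvalues of the sample correlation matrix, copied from the output above.
eigenvalues = [6.073, 1.223, 0.735, 0.648, 0.572, 0.539,
               0.485, 0.429, 0.383, 0.334, 0.311, 0.267]
p = len(eigenvalues)  # 12 items

total = sum(eigenvalues)                          # close to p, up to rounding
n_above_one = sum(1 for ev in eigenvalues if ev > 1.0)
share_first = eigenvalues[0] / p                  # variance share of first eigenvalue


def efa_df(p, m):
    """Degrees of freedom of an m-factor exploratory model on p variables."""
    return ((p - m) ** 2 - (p + m)) // 2


print(round(total, 3), n_above_one, round(share_first, 3))
for m in (1, 2, 3):
    print(m, efa_df(p, m))
```

With p = 12 and m = 3 the formula gives 33, which matches the DEGREES OF FREEDOM line for the three-factor solution.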

Example 2. Exploratory factor analysis with binary variables

For the purpose of illustration, we dichotomized the variables item13-item24 from the previous example. We will do the same exploratory factor analysis again, but with the binary variables. Factor analysis with binary variables uses the tetrachoric correlation structure. It requires a much larger sample size than is needed for continuous variables.

  Data:
    File is cat_factor.dat ;
  Variable:
    Names are
       item13 item14 item15 item16 item17 item18 item19 item20 item21 item22
       item23 item24 cat_13 - cat_24;
    Missing are all (-9999) ;
    usevariables are cat_13 - cat_24;
    categorical are  cat_13 - cat_24;
  Analysis:
    Type = efa 1 3 ;


SUMMARY OF ANALYSIS
Number of groups                                                 1
Number of observations                                        1428
Number of dependent variables                                   12
Number of independent variables                                  0
Number of continuous latent variables                            0
Observed dependent variables
  Binary and ordered categorical (ordinal)
   CAT_13      CAT_14      CAT_15      CAT_16      CAT_17      CAT_18
   CAT_19      CAT_20      CAT_21      CAT_22      CAT_23      CAT_24
Estimator                                                      ULS
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
(output omitted...)

RESULTS FOR EXPLORATORY FACTOR ANALYSIS
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  1             2             3             4             5
              ________      ________      ________      ________      ________
      1         7.208         1.280         0.768         0.622         0.451
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  6             7             8             9            10
              ________      ________      ________      ________      ________
      1         0.424         0.374         0.259         0.180         0.174
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                 11            12
              ________      ________
      1         0.157         0.104
(output omitted...)
           EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :
           ROOT MEAN SQUARE RESIDUAL IS        0.0199
           VARIMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 CAT_13         0.813         0.260         0.297
 CAT_14         0.806         0.228         0.306
 CAT_15         0.824         0.293         0.262
 CAT_16         0.758         0.307         0.141
 CAT_17         0.724         0.502         0.226
 CAT_18         0.254         0.794         0.241
 CAT_19         0.363         0.728         0.154
 CAT_20         0.223         0.484         0.153
 CAT_21         0.271         0.574         0.414
 CAT_22         0.177         0.592         0.320
 CAT_23         0.412         0.413         0.812
 CAT_24         0.337         0.368         0.613
           PROMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 CAT_13         0.832        -0.005         0.120
 CAT_14         0.832        -0.049         0.144
 CAT_15         0.844         0.050         0.062
 CAT_16         0.791         0.136        -0.087
 CAT_17         0.657         0.369        -0.028
 CAT_18        -0.032         0.882         0.008
 CAT_19         0.151         0.799        -0.112
 CAT_20         0.064         0.516        -0.003
 CAT_21         0.017         0.517         0.300
 CAT_22        -0.079         0.604         0.193
 CAT_23         0.137         0.102         0.842
 CAT_24         0.115         0.144         0.612
           PROMAX FACTOR CORRELATIONS
                  1             2             3
              ________      ________      ________
      1         1.000
      2         0.606         1.000
      3         0.574         0.645         1.000
           ESTIMATED RESIDUAL VARIANCES
              CAT_13        CAT_14        CAT_15        CAT_16        CAT_17
              ________      ________      ________      ________      ________
      1         0.183         0.205         0.166         0.312         0.172
           ESTIMATED RESIDUAL VARIANCES
              CAT_18        CAT_19        CAT_20        CAT_21        CAT_22
              ________      ________      ________      ________      ________
      1         0.247         0.315         0.693         0.426         0.516
           ESTIMATED RESIDUAL VARIANCES
              CAT_23        CAT_24
              ________      ________
      1         0.001         0.376
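The estimated residual variances are tied to the loadings. For an orthogonal rotation such as varimax on standardized variables, the communality of an item is the sum of its squared loadings, and the residual variance is 1 minus the communality. The sketch below checks this for a few items from the output above; the identity is a standard one, but the choice of items is ours.

```python
# Varimax rotated loadings for a few items, copied from the output above.
loadings = {
    "CAT_13": (0.813, 0.260, 0.297),
    "CAT_14": (0.806, 0.228, 0.306),
    "CAT_20": (0.223, 0.484, 0.153),
    "CAT_23": (0.412, 0.413, 0.812),
}

# Residual variances reported by Mplus for the same items.
reported = {"CAT_13": 0.183, "CAT_14": 0.205, "CAT_20": 0.693, "CAT_23": 0.001}

residuals = {}
for item, row in loadings.items():
    communality = sum(value * value for value in row)
    residuals[item] = 1.0 - communality
    print(item, round(communality, 3), round(residuals[item], 3), reported[item])
```

The computed values agree with the reported ones up to rounding of the loadings. Note that CAT_23 has a residual variance of almost zero; an estimate this close to the boundary is often treated as a warning sign and is worth a closer look.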

Example 3. Exploratory factor analysis on continuous outcome variables with missing data

For the purpose of illustration again, we have created another version of the data set, based on the data set used in Example 1 of the Advanced examples section. We have created a lot of missing values, and the values are missing completely at random. For the same analysis, we add the missing option to the type statement to tell Mplus to run the analysis without deleting any cases. In general, Mplus offers ML estimation with missing data under the assumption of MCAR or MAR. From the output labeled "PROPORTION OF DATA PRESENT", we can see that many variables have a substantial amount of missing data.

  Data:
    File is factor_missing.dat ;
  Variable:
    Names are
       item13 item14 item15 item16 item17 item18 item19 item20 item21 item22
       item23 item24;
    Missing are all (-9999) ;
  Analysis:
    Type = efa 1 3 missing;

INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups                                                 1
Number of observations                                        1428
Number of dependent variables                                   12
Number of independent variables                                  0
Number of continuous latent variables                            0
Observed dependent variables
  Continuous
   ITEM13      ITEM14      ITEM15      ITEM16      ITEM17      ITEM18
   ITEM19      ITEM20      ITEM21      ITEM22      ITEM23      ITEM24
Estimator                                                       ML
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Input data file(s)
  factor_missing.dat
Input data format  FREE
SUMMARY OF DATA
     Number of patterns         940
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value   0.100
     PROPORTION OF DATA PRESENT
           Covariance Coverage
              ITEM13        ITEM14        ITEM15        ITEM16        ITEM17
              ________      ________      ________      ________      ________
 ITEM13         0.492
 ITEM14         0.209         0.436
 ITEM15         0.216         0.183         0.433
 ITEM16         0.266         0.235         0.225         0.513
 ITEM17         0.277         0.235         0.227         0.280         0.544
 ITEM18         0.257         0.228         0.237         0.264         0.282
 ITEM19         0.245         0.218         0.218         0.263         0.275
 ITEM20         0.271         0.232         0.214         0.288         0.293
 ITEM21         0.305         0.277         0.272         0.319         0.343
 ITEM22         0.349         0.298         0.305         0.370         0.379
 ITEM23         0.422         0.377         0.371         0.443         0.477
 ITEM24         0.410         0.368         0.368         0.436         0.466
           Covariance Coverage
              ITEM18        ITEM19        ITEM20        ITEM21        ITEM22
              ________      ________      ________      ________      ________
 ITEM18         0.520
 ITEM19         0.258         0.508
 ITEM20         0.272         0.272         0.533
 ITEM21         0.327         0.318         0.333         0.625
 ITEM22         0.370         0.361         0.382         0.438         0.704
 ITEM23         0.449         0.440         0.453         0.543         0.606
 ITEM24         0.431         0.428         0.451         0.539         0.590
           Covariance Coverage
              ITEM23        ITEM24
              ________      ________
 ITEM23         0.867
 ITEM24         0.732         0.848

RESULTS FOR EXPLORATORY FACTOR ANALYSIS
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  1             2             3             4             5
              ________      ________      ________      ________      ________
      1         6.043         1.257         0.736         0.658         0.627
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                  6             7             8             9            10
              ________      ________      ________      ________      ________
      1         0.551         0.454         0.439         0.422         0.331
           EIGENVALUES FOR SAMPLE CORRELATION MATRIX
                 11            12
              ________      ________
      1         0.267         0.213
(output omitted...)
           EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :
           CHI-SQUARE VALUE              90.822
           DEGREES OF FREEDOM                33
           PROBABILITY VALUE             0.0000
           RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) :
           ESTIMATE (90 PERCENT C.I.) IS  0.035 ( 0.027  0.044)
           PROBABILITY RMSEA LE 0.05 IS    0.998
           ROOT MEAN SQUARE RESIDUAL IS        0.0286
           VARIMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 ITEM13         0.789         0.168         0.151
 ITEM14         0.742         0.216         0.176
 ITEM15         0.598         0.347         0.312
 ITEM16         0.549         0.176         0.345
 ITEM17         0.535         0.264         0.483
 ITEM18         0.233         0.231         0.750
 ITEM19         0.142         0.183         0.700
 ITEM20         0.278         0.151         0.510
 ITEM21         0.337         0.362         0.546
 ITEM22         0.169         0.310         0.520
 ITEM23         0.358         0.768         0.392
 ITEM24         0.315         0.554         0.348
           PROMAX ROTATED LOADINGS
                  1             2             3
              ________      ________      ________
 ITEM13         0.897        -0.041        -0.091
 ITEM14         0.812         0.031        -0.066
 ITEM15         0.540         0.203         0.101
 ITEM16         0.526        -0.034         0.235
 ITEM17         0.433         0.041         0.388
 ITEM18        -0.025        -0.021         0.848
 ITEM19        -0.109        -0.043         0.827
 ITEM20         0.137        -0.055         0.546
 ITEM21         0.127         0.210         0.484
 ITEM22        -0.061         0.194         0.519
 ITEM23         0.062         0.829         0.089
 ITEM24         0.096         0.558         0.137
           PROMAX FACTOR CORRELATIONS
                  1             2             3
              ________      ________      ________
      1         1.000
      2         0.628         1.000
      3         0.614         0.686         1.000
           ESTIMATED RESIDUAL VARIANCES
              ITEM13        ITEM14        ITEM15        ITEM16        ITEM17
              ________      ________      ________      ________      ________
      1         0.327         0.373         0.424         0.549         0.411
           ESTIMATED RESIDUAL VARIANCES
              ITEM18        ITEM19        ITEM20        ITEM21        ITEM22
              ________      ________      ________      ________      ________
      1         0.329         0.456         0.639         0.457         0.605
           ESTIMATED RESIDUAL VARIANCES
              ITEM23        ITEM24
              ________      ________
      1         0.128         0.473

Example 4. Path analysis with indirect and direct effects

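We have created a fake data set on school performance. We hypothesize that school performance (pfrm) is related to a student's IQ, ambition and socioeconomic status (ses). In addition, a student's IQ might itself be related to ses, so ses can affect performance both directly and indirectly through IQ. In the diagram for our hypothesis, iq, ambition and ses each point to pfrm, and ses also points to iq.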

Mplus offers a very straightforward way to display all the possible direct and indirect effects: the model indirect statement. Below, "pfrm ind ses" requests every effect from ses to pfrm.

Data:
  File is path_anlaysis.dat ;
Variable:
  Names are pfrm ses ambition iq;
  Missing are all (-9999) ;
Model:
   pfrm on iq ambition ses;
   iq on  ses;
Model indirect:
   pfrm ind ses;

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              0.060
          Degrees of Freedom                     1
          P-Value                           0.8066

Chi-Square Test of Model Fit for the Baseline Model

          Value                            135.440
          Degrees of Freedom                     5
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.036

Loglikelihood

          H0 Value                       -1775.747
          H1 Value                       -1775.717

Information Criteria

          Number of Free Parameters              6
          Akaike (AIC)                    3563.494
          Bayesian (BIC)                  3583.283
          Sample-Size Adjusted BIC        3564.275
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000
          90 Percent C.I.                    0.000  0.117
          Probability RMSEA <= .05           0.849

SRMR (Standardized Root Mean Square Residual)

          Value                              0.006

MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 PFRM     ON
    IQ                 0.547    0.051     10.728
    AMBITION           5.635    1.009      5.584
    SES                0.930    0.727      1.279

 IQ       ON
    SES                4.152    0.957      4.339

 Residual Variances
    PFRM              49.706    4.971     10.000
    IQ                95.599    9.560     10.000


TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS


                   Estimates     S.E.  Est./S.E.

Effects from SES to PFRM

  Total                3.201    0.870      3.677
  Total indirect       2.271    0.565      4.022

  Specific indirect

    PFRM
    IQ
    SES                2.271    0.565      4.022

  Direct
    PFRM
    SES                0.930    0.727      1.279
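
The effect decomposition above can be checked by hand. The following is a minimal sketch in Python (not part of the seminar's input files) that multiplies the two path coefficients along ses -> iq -> pfrm and adds the direct path. The standard error uses the first-order delta-method (Sobel) formula; that choice is our assumption about how to reproduce the printed value, and it agrees with the Mplus output to three decimals here.

```python
import math

# Estimates and standard errors copied from MODEL RESULTS above.
a, se_a = 4.152, 0.957   # iq on ses
b, se_b = 0.547, 0.051   # pfrm on iq
direct = 0.930           # pfrm on ses

# Indirect effect is the product of the coefficients along the path.
indirect = a * b
total = indirect + direct

# First-order delta-method (Sobel) standard error, assuming the two
# coefficient estimates are uncorrelated.
se_indirect = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
z_indirect = indirect / se_indirect

print(f"indirect = {indirect:.3f}, total = {total:.3f}")
print(f"S.E.(indirect) = {se_indirect:.3f}, Est./S.E. = {z_indirect:.3f}")
```

The direct effect alone is not significant (Est./S.E. = 1.279), while the indirect effect through IQ is, which is why the total effect of ses is larger than its direct coefficient suggests.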

Example 5. Growth curve modeling with the long format approach

We have chosen a simple example to show how Mplus can handle growth curve modeling. Unlike most statistical software, Mplus does growth curve modeling in both long and wide format. The two approaches offer different ways of looking at the same model, and each suggests its own alternative models. The example here is taken from Chapter 7 of Singer and Willett's Applied Longitudinal Data Analysis. The outcome variable is the response time on a timed cognitive task called "opposites naming". It is measured at four time points. We will start with the long format approach. This means that each subject has up to four rows of observations on the dependent variable and the covariates. In other words, this is the univariate approach. This is also the standard hierarchical linear model approach.

  Data:
    File is opposites_pp.dat;
  Variable:
    Names are
       id time opp cog ccog wave;
    Missing are all (-9999) ;
    Usevariables are
       time opp ccog;
    Cluster = id;
    Within are time ;
    Between are ccog;
  Analysis: type = random twolevel;
  Model:
      %within%
      s | opp on time;
      %between%
      opp s on ccog;
      opp with s;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         140

Number of dependent variables                                    1
Number of independent variables                                  2
Number of continuous latent variables                            1

Observed dependent variables

  Continuous
   OPP

Observed independent variables
   TIME        CCOG

Continuous latent variables
   S

Variables with special functions

  Cluster variable      ID
  Within variables
   TIME

  Between variables
   CCOG

Estimator                                                      MLR
Information matrix                                        OBSERVED
Maximum number of iterations                                  1000
Convergence criterion                                    0.100D-05
Maximum number of EM iterations                                500
Convergence criteria for the EM algorithm
  Loglikelihood change                                   0.100D-02
  Relative loglikelihood change                          0.100D-05
  Derivative                                             0.100D-02
Minimum variance                                         0.100D-03
Maximum number of steepest descent iterations                   20
Maximum number of iterations for H1                           2000
Convergence criterion for H1                             0.100D-03
Optimization algorithm                                         EMA

Input data file(s)
  opposites_pp.dat
Input data format  FREE

SUMMARY OF DATA

     Number of clusters          35

       Size (s)    Cluster ID with Size s

          4             1      2      3      4      5      6      7      8
                        9     10     11     12     13     14     15     16
                       17     18     19     20     21     22     23     24
                       25     26     27     28     29     30     31     32
                       33     34     35

     Average cluster size   4.000

     Estimated Intraclass Correlations for the Y Variables

                Intraclass              Intraclass
     Variable  Correlation   Variable  Correlation

     OPP          0.406


THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                        -633.451
          H0 Scaling Correction Factor       0.793
            for MLR

Information Criteria

          Number of Free Parameters              8
          Akaike (AIC)                    1282.901
          Bayesian (BIC)                  1306.434
          Sample-Size Adjusted BIC        1281.123
            (n* = (n + 2) / 24)

MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

Within Level

 Residual Variances
    OPP              159.727   23.491      6.800

Between Level

 S          ON
    CCOG               0.433    0.121      3.566

 OPP        ON
    CCOG              -0.114    0.416     -0.274

 OPP      WITH
    S               -165.185   67.783     -2.437

 Intercepts
    OPP              164.384    6.024     27.286
    S                 26.954    1.936     13.923

 Residual Variances
    OPP             1158.985  278.161      4.167
    S                 99.238   23.369      4.247
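
To read the between-level estimates as a growth curve, it can help to plug them into the model-implied mean trajectory. The sketch below is only an illustration in Python: the function name is ours, and it assumes time is coded 0 to 3, which is consistent with the loadings used in the wide format example that follows.

```python
# Between-level estimates copied from MODEL RESULTS above.
INTERCEPT = 164.384          # intercept of opp (expected score at time 0, ccog = 0)
INTERCEPT_ON_CCOG = -0.114   # opp on ccog
SLOPE = 26.954               # intercept of s (expected change per unit of time, ccog = 0)
SLOPE_ON_CCOG = 0.433        # s on ccog


def predicted_opp(time, ccog):
    """Model-implied mean of opp at a given time for a given centered cog score."""
    intercept = INTERCEPT + INTERCEPT_ON_CCOG * ccog
    slope = SLOPE + SLOPE_ON_CCOG * ccog
    return intercept + slope * time


# Hypothetical ccog values chosen only for illustration.
for ccog in (0, 10):
    path = [round(predicted_opp(t, ccog), 3) for t in range(4)]
    print(f"ccog = {ccog:>2}: {path}")
```

A subject with a higher ccog starts at about the same level (the effect on the intercept is small and not significant) but improves faster, because ccog has a positive effect on the slope.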

Example 6a. Growth curve modeling with the wide format approach

Now let's move to growth curve modeling with a wide format approach. The data structure is now in wide format. That is, each subject has only one row of data, with four dependent variables corresponding to the four time points. In other words, this is the multivariate approach. To this end, we have to restructure the data from long to wide (in another statistical package). In order to match the results from the long format approach, we have to constrain the residual variances at the four time points to be equal to each other. This also hints that the residual variances do not always have to be equal, which leads to more flexible models.

    Data:
      File is opposites_wide.dat ;
    Variable:
      Names are
         id opp1 opp2 opp3 opp4 cog ccog;
      Missing are all (-9999) ;
      usev = opp1-opp4 ccog;
    Analysis:
      Type = meanstructure;
    Model:
       i s  | opp1@0 opp2@1 opp3@2 opp4@3;
       i s on ccog;
       [i s];
       [opp1-opp4@0];   ! fixing the intercepts at zero at all time points.
       opp1 - opp4 (1); ! constraining the residual variance to be equal
                        ! at all time points.

INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups                                                 1
Number of observations                                          35
Number of dependent variables                                    4
Number of independent variables                                  1
Number of continuous latent variables                            2
Observed dependent variables
  Continuous
   OPP1        OPP2        OPP3        OPP4
Observed independent variables
   CCOG
Continuous latent variables
   I           S
Estimator                                                       ML
Information matrix                                        EXPECTED
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Input data file(s)
  opposites_wide.dat
Input data format  FREE
THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
          Value                              6.899
          Degrees of Freedom                    10
          P-Value                           0.7350
Chi-Square Test of Model Fit for the Baseline Model
          Value                            134.996
          Degrees of Freedom                    10
          P-Value                           0.0000
CFI/TLI
          CFI                                1.000
          TLI                                1.025
Loglikelihood
          H0 Value                        -770.987
          H1 Value                        -767.538
Information Criteria
          Number of Free Parameters              8
          Akaike (AIC)                    1557.975
          Bayesian (BIC)                  1570.418
          Sample-Size Adjusted BIC        1545.438
            (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.000
          90 Percent C.I.                    0.000  0.134
          Probability RMSEA <= .05           0.787
SRMR (Standardized Root Mean Square Residual)
          Value                              0.043
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 I        |
    OPP1               1.000    0.000      0.000
    OPP2               1.000    0.000      0.000
    OPP3               1.000    0.000      0.000
    OPP4               1.000    0.000      0.000
 S        |
    OPP1               0.000    0.000      0.000
    OPP2               1.000    0.000      0.000
    OPP3               2.000    0.000      0.000
    OPP4               3.000    0.000      0.000
 I        ON
    CCOG              -0.114    0.489     -0.232
 S        ON
    CCOG               0.433    0.157      2.753
 S        WITH
    I               -165.303   78.279     -2.112
 Intercepts
    OPP1               0.000    0.000      0.000
    OPP2               0.000    0.000      0.000
    OPP3               0.000    0.000      0.000
    OPP4               0.000    0.000      0.000
    I                164.374    6.026     27.277
    S                 26.960    1.936     13.925
 Residual Variances
    OPP1             159.475   26.956      5.916
    OPP2             159.475   26.956      5.916
    OPP3             159.475   26.956      5.916
    OPP4             159.475   26.956      5.916
    I               1159.354  304.409      3.809
    S                 99.298   31.821      3.121
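
The long-to-wide restructuring mentioned at the start of this example is done outside Mplus. As a rough sketch of what that step does, here is a version in plain Python. The function name and the two made-up subjects are ours and are not taken from opposites_pp.dat; the column names follow the Names lists in the two input files, and -9999 is the missing-value code declared there.

```python
MISSING = -9999  # missing-value code declared in the Mplus inputs


def long_to_wide(rows, n_waves=4):
    """Collapse one row per (id, wave) into one row per id with opp1..oppN."""
    wide = {}
    for r in rows:
        if r["id"] not in wide:
            record = {"id": r["id"], "ccog": r["ccog"]}
            # Start every wave as missing so unobserved waves keep the code.
            for w in range(1, n_waves + 1):
                record[f"opp{w}"] = MISSING
            wide[r["id"]] = record
        wide[r["id"]][f"opp{r['wave']}"] = r["opp"]
    return [wide[key] for key in sorted(wide)]


# Made-up long-format rows: subject 2 has no observation at wave 3.
long_rows = [
    {"id": 1, "wave": 1, "opp": 150, "ccog": 5.0},
    {"id": 1, "wave": 2, "opp": 180, "ccog": 5.0},
    {"id": 1, "wave": 3, "opp": 210, "ccog": 5.0},
    {"id": 1, "wave": 4, "opp": 240, "ccog": 5.0},
    {"id": 2, "wave": 1, "opp": 170, "ccog": -3.0},
    {"id": 2, "wave": 2, "opp": 190, "ccog": -3.0},
    {"id": 2, "wave": 4, "opp": 230, "ccog": -3.0},
]

wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

Note that the number of observations drops from 140 in the long format run to 35 here, one row per subject.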

Example 6b. Growth curve modeling with the wide format approach (different parameterization)

In the previous models, the random intercept and the random slope were allowed to covary. With the wide format approach, we can instead regress the slope on the intercept. This is only a reparameterization of the same model, so the fit statistics do not change. But now we can describe the relationship between the intercept and the slope in terms of change: how much the slope is expected to differ for a one-unit difference in the intercept.

    Data:
      File is opposites_wide.dat ;
    Variable:
      Names are
         id opp1 opp2 opp3 opp4 cog ccog;
      Missing are all (-9999) ;
      usev = opp1-opp4 ccog;
    Analysis:
      Type = meanstructure;
    Model:
       i s  | opp1@0 opp2@1 opp3@2 opp4@3;
       i s on ccog;
       [i s];
       s on i;          ! different parameterization happens here
       [opp1-opp4@0];   ! fixing the intercepts at zero at all time points.
       opp1 - opp4 (1); ! constraining the residual variance to be equal
                        ! at all time points.

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

          Value                              6.899
          Degrees of Freedom                    10
          P-Value                           0.7350

Chi-Square Test of Model Fit for the Baseline Model

          Value                            134.996
          Degrees of Freedom                    10
          P-Value                           0.0000

CFI/TLI

          CFI                                1.000
          TLI                                1.025

Loglikelihood

          H0 Value                        -770.987
          H1 Value                        -767.538

Information Criteria

          Number of Free Parameters              8
          Akaike (AIC)                    1557.975
          Bayesian (BIC)                  1570.418
          Sample-Size Adjusted BIC        1545.438
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

          Estimate                           0.000
          90 Percent C.I.                    0.000  0.134
          Probability RMSEA <= .05           0.787

SRMR (Standardized Root Mean Square Residual)

          Value                              0.043



MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 I        |
    OPP1               1.000    0.000      0.000
    OPP2               1.000    0.000      0.000
    OPP3               1.000    0.000      0.000
    OPP4               1.000    0.000      0.000

 S        |
    OPP1               0.000    0.000      0.000
    OPP2               1.000    0.000      0.000
    OPP3               2.000    0.000      0.000
    OPP4               3.000    0.000      0.000

 S        ON
    I                 -0.143    0.051     -2.773

 I        ON
    CCOG              -0.114    0.489     -0.232

 S        ON
    CCOG               0.417    0.135      3.091

 Intercepts
    OPP1               0.000    0.000      0.000
    OPP2               0.000    0.000      0.000
    OPP3               0.000    0.000      0.000
    OPP4               0.000    0.000      0.000
    I                164.374    6.026     27.277
    S                 50.398    8.613      5.852

 Residual Variances
    OPP1             159.477   26.957      5.916
    OPP2             159.477   26.957      5.916
    OPP3             159.477   26.957      5.916
    OPP4             159.477   26.957      5.916
    I               1159.380  304.416      3.809
    S                 75.726   23.268      3.255
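
Because Example 6b only reparameterizes Example 6a, its new estimates can be derived from the earlier ones. The sketch below (Python, our own illustration) applies the usual regression identities: the coefficient of s on i is the covariance divided by the variance of i, and the remaining slope parameters are adjusted accordingly. Small differences in the last decimal come from rounding in the printed output.

```python
# Estimates copied from the Example 6a output (covariance parameterization).
cov_is = -165.303                 # S WITH I
var_i = 1159.354                  # residual variance of I
var_s = 99.298                    # residual variance of S
int_i, int_s = 164.374, 26.960    # intercepts of I and S
i_on_ccog, s_on_ccog = -0.114, 0.433

# Implied Example 6b (regression) parameterization.
s_on_i = cov_is / var_i
new_int_s = int_s - s_on_i * int_i
new_s_on_ccog = s_on_ccog - s_on_i * i_on_ccog
new_var_s = var_s - cov_is**2 / var_i

print(f"s on i              = {s_on_i:.3f}")         # printed in 6b: -0.143
print(f"intercept of s      = {new_int_s:.3f}")      # printed in 6b: 50.398
print(f"s on ccog           = {new_s_on_ccog:.3f}")  # printed in 6b: 0.417
print(f"residual var. of s  = {new_var_s:.3f}")      # printed in 6b: 75.726
```

The negative coefficient says that subjects who start higher tend to improve more slowly, which is the same information carried by the negative covariance in Example 6a.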

Example 7a. Latent class analysis

This example uses the hsb2 data set. We have test scores for the students in the sample and demographic variables as well. We want to see if we can classify students based on their test scores and how the class membership relates to other variables. This example is strictly for the purpose of illustration and therefore does not reflect any real theory.

Notice that we have taken the default syntax to perform this analysis. We are looking for a two-class solution based on the scores on read, write, math, science and social studies (socst). The class membership is then regressed on the variables female and ses.

Our model runs "successfully", but Mplus gives us warning messages. They tell us that, by default, Mplus assumes that all the variables are uncorrelated within each latent class. Can we accept this assumption? Maybe not. But for the time being, let's take a look at the rest of the output. We have the average scores for each of the two latent classes. We can tell that the first class has lower means on all the variables and the second one has higher means. These two classes make sense to us. Also, the class membership is strongly related to ses (Est./S.E. = -3.506) but not to female.

  Data:
    File is hsb2.dat ;
  Variable:
    Names are
       id female race ses schtyp prog read write math science socst;
    Usevariables are
       read write math science socst female ses;
    classes = grp(2);
  Analysis:
    type=mixture;
  Model:
    %overall%
      grp#1 on female ses;
 
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  READ
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  WRITE
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  MATH
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  SCIENCE
*** WARNING in Model command
  Variable is uncorrelated with all other variables within class:  SOCST
*** WARNING in Model command
  All least one variable is uncorrelated with all other variables within class.
  Check that this is what is intended.
   6 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS


Latent Class Analysis with Graphs

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         200

Number of dependent variables                                    5
Number of independent variables                                  2
Number of continuous latent variables                            0
Number of categorical latent variables                           1

Observed dependent variables

  Continuous
   READ        WRITE       MATH        SCIENCE     SOCST

Observed independent variables
   FEMALE      SES

Categorical latent variables
   GRP


Estimator                                                      MLR
(output omitted...)

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                       -3510.499
          H0 Scaling Correction Factor       1.126
            for MLR

Information Criteria

          Number of Free Parameters             18
          Akaike (AIC)                    7056.999
          Bayesian (BIC)                  7116.369
          Sample-Size Adjusted BIC        7059.343
            (n* = (n + 2) / 24)
          Entropy                            0.852



FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

    Latent
   Classes

       1         96.61160          0.48306
       2        103.38840          0.51694


FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

    Latent
   Classes

       1         96.61161          0.48306
       2        103.38839          0.51694


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

    Latent
   Classes

       1               95          0.47500
       2              105          0.52500


Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
by Latent Class (Column)

          1        2

   1   0.963    0.037
   2   0.049    0.951


MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

Latent Class 1

 Means
    READ              44.645    1.107     40.336
    WRITE             45.822    1.197     38.269
    MATH              45.766    0.806     56.784
    SCIENCE           45.189    1.405     32.153
    SOCST             45.785    1.375     33.288

 Variances
    READ              50.830    5.261      9.662
    WRITE             44.222    5.109      8.656
    MATH              43.108    4.842      8.903
    SCIENCE           56.073    7.406      7.572
    SOCST             73.733    7.395      9.970

Latent Class 2

 Means
    READ              59.318    1.168     50.791
    WRITE             59.272    0.913     64.939
    MATH              59.073    1.256     47.018
    SCIENCE           58.075    0.836     69.495
    SOCST             58.591    1.041     56.288

 Variances
    READ              50.830    5.261      9.662
    WRITE             44.222    5.109      8.656
    MATH              43.108    4.842      8.903
    SCIENCE           56.073    7.406      7.572
    SOCST             73.733    7.395      9.970

Categorical Latent Variables

 GRP#1      ON
    FEMALE            -0.173    0.344     -0.502
    SES               -0.779    0.222     -3.506

 Intercepts
    GRP#1              1.622    0.556      2.917


LOGISTIC REGRESSION ODDS RATIO RESULTS

Categorical Latent Variables

 GRP#1    ON
    FEMALE             0.841
    SES                0.459


ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION

Parameterization using Reference Class 1

 GRP#2    ON
    FEMALE             0.173    0.344      0.502
    SES                0.779    0.222      3.506

 Intercepts
    GRP#2             -1.622    0.556     -2.917
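
The odds ratios reported above are simply the exponentiated logistic coefficients, and the same coefficients give the model-implied probability of belonging to class 1. The sketch below is a Python illustration of that arithmetic. The function name is ours, and the ses values 1 and 3 assume the usual 1 = low, 3 = high coding of hsb2, which the output itself does not show.

```python
import math

# Coefficients copied from "Categorical Latent Variables" above.
b_female, b_ses, intercept = -0.173, -0.779, 1.622

# Odds ratios are the exponentiated coefficients.
odds_ratio_female = math.exp(b_female)
odds_ratio_ses = math.exp(b_ses)


def prob_class1(female, ses):
    """Model-implied probability of membership in latent class 1."""
    logit = intercept + b_female * female + b_ses * ses
    return 1.0 / (1.0 + math.exp(-logit))


print(f"odds ratios: female = {odds_ratio_female:.3f}, ses = {odds_ratio_ses:.3f}")
# Assumed ses coding: 1 = low, 3 = high.
for ses in (1, 3):
    print(f"male, ses = {ses}: P(class 1) = {prob_class1(0, ses):.3f}")
```

Since class 1 is the lower-scoring class, an odds ratio below one for ses means that students with higher ses are less likely to be in it.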

Example 7b. Latent class analysis with graphics

Now, let's take up the issue of the correlation of variables within latent classes. We will also request some plots. Should we allow all the test scores to be correlated with each other? Maybe not. In this example, we allow reading scores to be correlated with all the other test scores, writing scores to be correlated with social studies scores, and math scores to be correlated with the science scores. Comparing the information criteria, AIC drops from 7056.999 in Example 7a to 6958.313 here, and BIC drops from 7116.369 to 7037.472, so we conclude that this is a better fitting model than the previous one.

    Data:
      File is hsb2.dat ;
    Variable:
      Names are
         id female race ses schtyp prog read write math science socst;
      Usevariables are
         read write math science socst female ses;
      classes = grp(2);
    Analysis:
      type=mixture;
    Model:
      %overall%
        read with write;
        read with math;
        read with science;
        read with socst;

        write with socst;

        math with science;

        grp#1 on female ses;
    Plot:
       type is plot3;
       series is read (1) write (2) math (3) science (4) socst (5);

(output omitted...)

TESTS OF MODEL FIT

Loglikelihood

          H0 Value                       -3455.156
          H0 Scaling Correction Factor       1.068
            for MLR

Information Criteria

          Number of Free Parameters             24
          Akaike (AIC)                    6958.313
          Bayesian (BIC)                  7037.472
          Sample-Size Adjusted BIC        6961.438
            (n* = (n + 2) / 24)
          Entropy                            0.838

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL

    Latent
   Classes

       1         77.82126          0.38911
       2        122.17874          0.61089


FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES

    Latent
   Classes

       1         77.82125          0.38911
       2        122.17875          0.61089


CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

Class Counts and Proportions

    Latent
   Classes

       1               76          0.38000
       2              124          0.62000


Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
by Latent Class (Column)

          1        2

   1   0.956    0.044
   2   0.042    0.958


MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

Latent Class 1

 READ     WITH
    WRITE              9.024    3.276      2.755
    MATH              24.570    5.285      4.649
    SCIENCE           27.390    5.820      4.706
    SOCST             25.783    5.457      4.724

 WRITE    WITH
    SOCST             18.927    3.559      5.319

 MATH     WITH
    SCIENCE           27.609    6.718      4.109

 Means
    READ              45.417    0.942     48.209
    WRITE             42.995    1.347     31.917
    MATH              45.527    0.722     63.091
    SCIENCE           45.100    1.172     38.487
    SOCST             45.613    1.261     36.185

 Variances
    READ              66.360    5.860     11.324
    WRITE             28.467    4.359      6.530
    MATH              55.061    6.780      8.121
    SCIENCE           68.513    9.495      7.216
    SOCST             85.301    8.522     10.010

Latent Class 2

 READ     WITH
    WRITE              9.024    3.276      2.755
    MATH              24.570    5.285      4.649
    SCIENCE           27.390    5.820      4.706
    SOCST             25.783    5.457      4.724

 WRITE    WITH
    SOCST             18.927    3.559      5.319

 MATH     WITH
    SCIENCE           27.609    6.718      4.109

 Means
    READ              56.570    1.153     49.054
    WRITE             59.005    0.580    101.768
    MATH              57.179    1.072     53.347
    SCIENCE           56.150    1.018     55.171
    SOCST             56.731    1.024     55.428

 Variances
    READ              66.360    5.860     11.324
    WRITE             28.467    4.359      6.530
    MATH              55.061    6.780      8.121
    SCIENCE           68.513    9.495      7.216
    SOCST             85.301    8.522     10.010

Categorical Latent Variables

 GRP#1      ON
    FEMALE            -1.166    0.419     -2.780
    SES               -1.069    0.278     -3.842

 Intercepts
    GRP#1              2.297    0.665      3.456


LOGISTIC REGRESSION ODDS RATIO RESULTS

Categorical Latent Variables

 GRP#1    ON
    FEMALE             0.312
    SES                0.343


ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION

Parameterization using Reference Class 1

 GRP#2    ON
    FEMALE             1.166    0.419      2.780
    SES                1.069    0.278      3.842

 Intercepts
    GRP#2             -2.297    0.665     -3.456
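
The information criteria used to compare Examples 7a and 7b can be recomputed from the loglikelihood and the number of free parameters. The sketch below is a Python illustration using the standard definitions AIC = -2LL + 2k and BIC = -2LL + k ln(n); the dictionary layout and function names are ours.

```python
import math

n = 200  # number of observations, from SUMMARY OF ANALYSIS

# (H0 loglikelihood, number of free parameters) copied from the two outputs.
models = {
    "7a": (-3510.499, 18),
    "7b": (-3455.156, 24),
}


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n_obs):
    return -2.0 * loglik + k * math.log(n_obs)


for name, (loglik, k) in models.items():
    print(f"Example {name}: AIC = {aic(loglik, k):.3f}, BIC = {bic(loglik, k, n):.3f}")

aic_drop = aic(*models["7a"]) - aic(*models["7b"])
print(f"AIC improvement from 7a to 7b = {aic_drop:.3f}")
```

Both criteria favor Example 7b even though it spends six more parameters on the within-class covariances.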

Reference and online resources:

  1. Mplus User's Guide Version 4.1
  2. Introduction to Using Mplus 3
  3. Mplus for Windows: An Introduction by Information Technology Services at The University of Texas at Austin

These pages are Copyrighted (c) by UCLA Academic Technology Services

