UCLA Academic Technology Services

Mplus Code Fragment
A Latent Class Example

These code fragments are examples that we are using to try and understand these techniques using Mplus.  We ask that you treat them as works in progress that explore these techniques, rather than definitive answers as to how to analyze any particular kind of data.

Consider the Stata file hsb6, which has 600 observations with information about students, such as their reading, writing, math and other achievement scores.  For the variables locus, concept, mot, and read through ss, we will make a binary variable called hi___ that is 1 if the score is over the median, and 0 otherwise.  This will be useful when we need a binary variable.  Here we read the data from Stata, make the binary versions of the variables, compress the file, and then convert it to Mplus using stata2mplus.

use http://www.ats.ucla.edu/stat/mplus/code/hsb6, clear
foreach varname of varlist locus concept mot read-ss {
  summarize `varname', detail
  generate hi`varname' = `varname' > `r(p50)'
}
compress
save hsb6, replace
stata2mplus using hsb6

We now have the input file hsb6.inp and the data file it reads, hsb6.dat.
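The median split performed by the Stata loop can be sketched in Python (a language not used on the original page). This is only an illustration: the function name median_split and the scores are hypothetical and are not taken from hsb6. As in the Stata code, a score equal to the median is coded 0.

```python
from statistics import median

def median_split(scores):
    """Return 1 where a score is strictly above the median, else 0."""
    cutoff = median(scores)
    return [1 if s > cutoff else 0 for s in scores]

# Hypothetical scores for illustration only (not from hsb6)
read = [46.9, 52.1, 38.5, 61.9, 46.9, 57.0]
hiread = median_split(read)
print(hiread)
```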


Example 1: A latent class analysis with 2 classes, and continuous indicators

Here is the input file

Data:
  File is I:\mplus\hsb6.dat ;
Variable:
  Names are 
   id gender race ses sch prog locus concept mot career read write math
   sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
  Usevariables are 
     read write math sci ss ;
  classes = c(2);
Analysis: 
  Type=mixture;
MODEL:
  %C#1%
  [read math sci ss write  * 30 ];

  %C#2%
  [read math sci ss write  * 60];
OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex1.txt ;
  save is cprob;
  format is free;

Here is the output

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        274.09000          0.45682
  Class 2        325.91000          0.54318

#1. One way to view the second column is the average probability of falling into class 1 and class 2.  As a result column 1 is the average probability times 600 (see Stata example below for comparison).

A second way to view the first column is by taking each person's probability of falling into a class, and summing them.  If person #6 has a .8 estimated probability of being in class 1, and .2 of being in class 2, then that person contributes .8 to class 1 and .2 to class 2.  This is why these counts are fractional (see Stata example below for comparison).

A third way of viewing this is that there is an underlying continuum of the latent variable, and there is a threshold for being categorized as class 1 or class 2, and that threshold can be used to compute the probabilities of being in the classes, see section #5.
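The first two views can be sketched in Python (not a language used on the original page). The three posterior probability pairs are hypothetical, and class_counts is a made-up helper name. The last lines only check that the reported proportions times 600 reproduce the reported fractional counts.

```python
def class_counts(posteriors):
    """Sum each class's posterior probability over all people."""
    n_classes = len(posteriors[0])
    return [sum(p[k] for p in posteriors) for k in range(n_classes)]

# Hypothetical posterior probabilities (class 1, class 2) for three people
posteriors = [(0.8, 0.2), (0.1, 0.9), (0.5, 0.5)]
counts = class_counts(posteriors)                   # fractional "counts"
proportions = [c / len(posteriors) for c in counts]  # average probabilities
print(counts, proportions)

# Proportions reported by Mplus, multiplied by the sample size of 600
reported = [0.45682 * 600, 0.54318 * 600]
print(reported)
```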

------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions
  Class 1              272          0.45333
  Class 2              328          0.54667

#2. This shows the count of people who fall into each class by taking their probability of membership in each class and assigning them to the class which they have the highest probability of falling into.  Note the counts are exact whole numbers.

------------------------------------------------------------------------------
Average Class Probabilities by Class
                 1        2
  Class 1     0.957    0.043
  Class 2     0.042    0.958

#3. This is related to the output in #1, but takes the probabilities of class membership and averages them by class, see Stata portion below for more on this.

------------------------------------------------------------------------------
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
CLASS 1
 Means
    READ              43.783    0.642     68.152
    WRITE             45.068    0.730     61.738
    MATH              44.794    0.469     95.540
    SCI               44.446    0.740     60.051
    SS                45.574    0.658     69.237
 Variances
    READ              46.463    2.785     16.681
    WRITE             49.427    3.011     16.415
    MATH              46.634    3.133     14.884
    SCI               49.022    3.388     14.470
    SS                62.216    4.109     15.141
CLASS 2
 Means
    READ              58.730    0.605     97.000
    WRITE             58.538    0.497    117.764
    MATH              57.782    0.687     84.120
    SCI               57.917    0.499    116.079
    SS                57.488    0.589     97.629
 Variances
    READ              46.463    2.785     16.681
    WRITE             49.427    3.011     16.415
    MATH              46.634    3.133     14.884
    SCI               49.022    3.388     14.470
    SS                62.216    4.109     15.141

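#4. This shows the average scores for the two classes.  Class 1 is a low performing group, and class 2 is a high performing group.  Note that the variances are identical in the two classes, since the model holds them equal across classes.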

------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1               -0.173    0.133     -1.298

#5. This is the threshold for dividing the two classes.  If you are below the threshold, you are class 1, above it and you are class 2.  We see the threshold is -0.173.  We can convert this threshold to a probability like this.

Prob(class 1) = 1/(1 + exp(-threshold1)) = 1 / ( 1 + exp( 0.173)) = .45686 (compare to section #1 above).
Prob(class 2) = 1 - 1/(1 + exp(-threshold1)) = 1 - 1 / ( 1 + exp( 0.173)) = .54314 (compare to section #1 above).
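The conversion above can be checked with a short Python sketch (Python is not used on the original page, and two_class_probs is a hypothetical helper name). It applies the same inverse logit formula to the reported C#1 value of -0.173.

```python
import math

def two_class_probs(c1_mean):
    """Convert the C#1 value to (P(class 1), P(class 2)) via the inverse logit."""
    p1 = 1.0 / (1.0 + math.exp(-c1_mean))
    return p1, 1.0 - p1

p1, p2 = two_class_probs(-0.173)
print(round(p1, 5), round(p2, 5))
```

The two values should be close to the proportions 0.45682 and 0.54318 from section #1. Small differences are expected because -0.173 is rounded to three decimals.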

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

infile read write math sci ss cprob1 cprob2 class using lca_ex1.txt 

Below we show observations 200 through 210 from the middle of this file.  Note that cprob1 is the probability of being in class 1 and cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.

. list in 200/210

     +-------------------------------------------------------------+
     | read   write   math    sci     ss   cprob1   cprob2   class |
     |-------------------------------------------------------------|
200. | 46.9    52.1   42.5   47.7   60.5     .944     .056       1 |
201. | 46.9    51.5     57   49.8   40.6       .9       .1       1 |
202. | 46.9    52.8   49.3   53.1   35.6     .983     .017       1 |
203. | 46.9    43.7   41.9   41.7   35.6        1        0       1 |
204. | 46.9    61.9     53   52.6   60.5     .016     .984       2 |
     |-------------------------------------------------------------|
205. | 46.9    41.1   45.3   47.1   55.6     .998     .002       1 |
206. | 46.9    38.5   47.1   41.7   25.7        1        0       1 |
207. | 46.9    54.1   46.4   49.8   55.6     .827     .173       1 |
208. | 46.9    51.5   48.5   49.8   50.6     .934     .066       1 |
209. | 46.9    41.1   53.6   41.7   55.6     .995     .005       1 |
     |-------------------------------------------------------------|
210. | 46.9    61.9   46.2   60.7   45.6      .17      .83       2 |
     +-------------------------------------------------------------+

Note that if we tabulate class we see where the values from section #2 of the output came from.

. tab class

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        272       45.33       45.33
          2 |        328       54.67      100.00
------------+-----------------------------------
      Total |        600      100.00
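The assignment rule behind this tabulation can be sketched in Python (not a language used on the original page). The values are cprob1 and cprob2 for the 11 listed observations (200 through 210) only, not the full 600, and modal_class is a hypothetical helper name.

```python
from collections import Counter

def modal_class(probs):
    """Return the 1-based index of the largest posterior probability."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1

# (cprob1, cprob2) for observations 200-210 from the listing above
listed = [(.944, .056), (.9, .1), (.983, .017), (1, 0), (.016, .984),
          (.998, .002), (1, 0), (.827, .173), (.934, .066), (.995, .005),
          (.17, .83)]

classes = [modal_class(p) for p in listed]
counts = Counter(classes)   # whole-number counts per class
print(classes)
print(counts[1], counts[2])
```

For these 11 rows the assigned classes agree with the class column saved by Mplus.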

Note that if we take the average of cprob1 and cprob2, we can relate these values to column 2 of section #1 of the output.

. summ cprob1 cprob2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      cprob1 |       600    .4568233    .4664192          0          1
      cprob2 |       600    .5431767    .4664192          0          1

If we sum the probabilities, we can relate these to column 1 of section #1 of the output.

. tabstat cprob1 cprob2, stat(sum)

   stats |    cprob1    cprob2
---------+--------------------
     sum |   274.094   325.906
------------------------------

If we average the probabilities by class, we can relate these values to section #3 of the output.

. tabstat cprob1 cprob2, by(class)

Summary statistics: mean
  by categories of: class 

   class |    cprob1    cprob2
---------+--------------------
       1 |  .9570699  .0429301
       2 |  .0419848  .9580152
---------+--------------------
   Total |  .4568233  .5431767
------------------------------
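As a rough Python sketch of the same by-class averaging (Python is not used on the original page, and mean_by_class is a hypothetical helper name), the code below groups rows by their assigned class and averages each probability. It uses only the 11 listed observations, so its averages will differ from the full-sample values above.

```python
def mean_by_class(rows, n_classes=2):
    """Average each cprob column within each assigned class.

    Each row is (cprob1, ..., cprobK, class). Assumes every class
    has at least one member.
    """
    out = {}
    for c in range(1, n_classes + 1):
        members = [r for r in rows if r[-1] == c]
        out[c] = [sum(r[k] for r in members) / len(members)
                  for k in range(n_classes)]
    return out

# (cprob1, cprob2, class) for observations 200-210 from the earlier listing
rows = [(.944, .056, 1), (.9, .1, 1), (.983, .017, 1), (1, 0, 1),
        (.016, .984, 2), (.998, .002, 1), (1, 0, 1), (.827, .173, 1),
        (.934, .066, 1), (.995, .005, 1), (.17, .83, 2)]

avg = mean_by_class(rows)
print(avg)
```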

Say that we compute the means of the reading, writing, math, science and social science scores, weighting first by the probability of being in class 1 and then by the probability of being in class 2.  Note the correspondence between these means and the means from section #4 of the output.

. tabstat read write math sci ss [aw=cprob1], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  43.78268  45.06829  44.79421  44.44601   45.5743
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob2], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  58.73021  58.53821  57.78224  57.91736  57.48822
------------------------------------------------------------
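The weighted means above are sums of weight times score divided by the sum of the weights. A minimal Python sketch (Python is not used on the original page, and weighted_mean is a hypothetical helper name) applies this to the write scores of the 11 listed observations only, so the results will not match the full-sample means.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# write, cprob1 and cprob2 for observations 200-210 from the earlier listing
write  = [52.1, 51.5, 52.8, 43.7, 61.9, 41.1, 38.5, 54.1, 51.5, 41.1, 61.9]
cprob1 = [.944, .9, .983, 1, .016, .998, 1, .827, .934, .995, .17]
cprob2 = [.056, .1, .017, 0, .984, .002, 0, .173, .066, .005, .83]

m1 = weighted_mean(write, cprob1)   # mean weighted toward class 1
m2 = weighted_mean(write, cprob2)   # mean weighted toward class 2
print(m1, m2)
```

Even in this small subset, the mean weighted by cprob2 is higher than the one weighted by cprob1, in line with class 2 being the higher performing group.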
 

Example 2: A latent class analysis with 3 classes, and continuous indicators

Here is the input file

Data:
  File is I:\mplus\hsb6.dat ;
Variable:
  Names are 
   id gender race ses sch prog locus concept mot career read write math
   sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
  Usevariables are 
     read write math sci ss ;
  classes = c(3);
Analysis: 
  Type=mixture;
MODEL:
  %C#1%
  [read math sci ss write  *30 ];

  %C#2%
  [read math sci ss write  *45];

  %C#3%
  [read math sci ss write  *60];
OUTPUT:
  TECH8;
SAVEDATA:
  file is lca_ex2.txt ;
  save is cprob;
  format is free;

Here is the output

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        194.55375          0.32426
  Class 2        252.39798          0.42066
  Class 3        153.04826          0.25508

#1. One way to view the second column is the average probability of falling into class 1, class 2, and class 3.  As a result column 1 is the average probability times 600 (see Stata example below for comparison).

A second way to view the first column is by taking each person's probability of falling into a class, and summing them.  If person #6 has a .8 estimated probability of being in class 1, .2 of being in class 2, and 0 of being in class 3, then that person contributes .8 to class 1, .2 to class 2, and nothing to class 3.  This is why these counts are fractional (see Stata example below for comparison).

A third way of viewing this is that the class proportions can be computed from the thresholds in the latent class regression part of the model, see section #5.

------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions
  Class 1              197          0.32833
  Class 2              249          0.41500
  Class 3              154          0.25667

#2. This shows the count of people who fall into each class by taking their probability of membership in each class and assigning them to the class which they have the highest probability of falling into.  Note the counts are exact whole numbers.

------------------------------------------------------------------------------
Average Class Probabilities by Class

                 1        2        3
  Class 1     0.940    0.060    0.000
  Class 2     0.038    0.912    0.050
  Class 3     0.000    0.087    0.913

#3. This is related to the output in section #1, but takes the probabilities of class membership and averages them by class, see Stata portion below for more on this.

------------------------------------------------------------------------------
MODEL RESULTS

                   Estimates     S.E.  Est./S.E.
CLASS 1
 Means
    READ              41.735    0.477     87.540
    WRITE             42.703    0.962     44.390
    MATH              43.178    0.516     83.648
    SCI               42.160    0.663     63.625
    SS                43.848    0.695     63.097
 Variances
    READ              32.997    2.820     11.699
    WRITE             42.369    3.775     11.223
    MATH              34.562    2.422     14.269
    SCI               38.395    2.714     14.146
    SS                53.884    3.850     13.996

CLASS 2
 Means
    READ              52.618    0.925     56.866
    WRITE             54.507    0.727     74.938
    MATH              52.008    0.835     62.319
    SCI               53.172    0.835     63.680
    SS                52.794    0.808     65.324
 Variances
    READ              32.997    2.820     11.699
    WRITE             42.369    3.775     11.223
    MATH              34.562    2.422     14.269
    SCI               38.395    2.714     14.146
    SS                53.884    3.850     13.996

CLASS 3
 Means
    READ              63.644    0.948     67.117
    WRITE             61.193    0.453    135.170
    MATH              62.610    0.865     72.404
    SCI               61.648    0.667     92.451
    SS                61.232    0.758     80.759
 Variances
    READ              32.997    2.820     11.699
    WRITE             42.369    3.775     11.223
    MATH              34.562    2.422     14.269
    SCI               38.395    2.714     14.146
    SS                53.884    3.850     13.996

#4. This shows the average scores for the three classes.  Class 1 is a low performing group, class 2 is a medium performing group, and class 3 is a high performing group.

------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1                0.240    0.218      1.099
    C#2                0.500    0.181      2.766

#5. These are the thresholds for dividing the three classes.  Note that this is now like a multinomial logistic regression, where the thresholds divide three multinomial categories, with class 3 being the reference category.  C#1 is the threshold for being in class 1 as compared to class 3, and C#2 is the threshold for being in class 2 as compared to class 3.  For the comparison group, class 3, the probability of being in that class is computed as below, letting "t1" be threshold 1 (.24) and "t2" be threshold 2 (.5).

P(class=3) = 1 / (1 + exp(t1) + exp(t2)) = 1 / (1 + exp(.24) + exp(.5)) = .25510397 .

For classes 1 and 2, the formula is a bit different since these are not the comparison class.  For class 1, the formula is

P(class=1) = exp(t1) / (1 + exp(t1) + exp(t2)) = exp(.24) / (1 + exp(.24) + exp(.5)) = .32430071.

For class 2, the formula is

P(class=2) = exp(t2) / (1 + exp(t1) + exp(t2)) = exp(.5) / (1 + exp(.24) + exp(.5)) = .42059533.
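The three formulas above can be checked together with a short Python sketch (Python is not used on the original page, and class_probs is a hypothetical helper name). The reference class is given a value of 0, so its exp term is 1, which is the "1" in the denominator.

```python
import math

def class_probs(thresholds):
    """Convert values for classes 1..K-1 into K class probabilities.

    The last (reference) class has an implicit value of 0.
    """
    exps = [math.exp(t) for t in thresholds] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]

probs = class_probs([0.240, 0.500])   # t1 and t2 from the output above
print([round(p, 5) for p in probs])
```

The results should be close to the proportions 0.32426, 0.42066 and 0.25508 reported in section #1, with small differences due to the thresholds being rounded to three decimals.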

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

infile read write math sci ss cprob1 cprob2 cprob3 class using lca_ex2.txt 

Below we show observations from the middle of this file.  Note that cprob1 is the probability of being in class 1 and cprob2 is the probability of being in class 2, cprob3 is the probability of being in class 3, and class is the class membership based on the class with the highest probability.  Note that we don't see any folks in class 3 here, but there are members of class 3.

. list in 200/210

     +----------------------------------------------------------------------+
     | read   write   math    sci     ss   cprob1   cprob2   cprob3   class |
     |----------------------------------------------------------------------|
200. | 46.9    52.1   42.5   47.7   60.5     .133     .867        0       2 |
201. | 46.9    51.5     57   49.8   40.6     .062     .938        0       2 |
202. | 46.9    52.8   49.3   53.1   35.6     .228     .772        0       2 |
203. | 46.9    43.7   41.9   41.7   35.6     .998     .002        0       1 |
204. | 46.9    61.9     53   52.6   60.5        0     .996     .004       2 |
     |----------------------------------------------------------------------|
205. | 46.9    41.1   45.3   47.1   55.6     .812     .188        0       1 |
206. | 46.9    38.5   47.1   41.7   25.7        1        0        0       1 |
207. | 46.9    54.1   46.4   49.8   55.6     .039     .961        0       2 |
208. | 46.9    51.5   48.5   49.8   50.6       .1       .9        0       2 |
209. | 46.9    41.1   53.6   41.7   55.6     .709     .291        0       1 |
     |----------------------------------------------------------------------|
210. | 46.9    61.9   46.2   60.7   45.6     .001     .999        0       2 |
     +----------------------------------------------------------------------+

Note that if we tabulate class we see where the values from section #2 of the output came from.

. tab class

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        197       32.83       32.83
          2 |        249       41.50       74.33
          3 |        154       25.67      100.00
------------+-----------------------------------
      Total |        600      100.00

Note that if we take the average of cprob1, cprob2, and cprob3 we can relate these values to column 2 of section #1 of the output.

. summ cprob1 cprob2 cprob3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      cprob1 |       600    .3242633     .440132          0          1
      cprob2 |       600    .4206317    .4326395          0       .999
      cprob3 |       600    .2550817    .3989861          0          1

If we sum the probabilities, we can relate these to column 1 of section #1 of the output.

. tabstat cprob1 cprob2 cprob3, stat(sum)

   stats |    cprob1    cprob2    cprob3
---------+------------------------------
     sum |   194.558   252.379   153.049
----------------------------------------

If we average the probabilities by class, we can relate these values to section #3 of the output.

. tabstat cprob1 cprob2 cprob3, by(class)

Summary statistics: mean
  by categories of: class 

   class |    cprob1    cprob2    cprob3
---------+------------------------------
       1 |  .9401117  .0598883         0
       2 |  .0375743  .9123735   .049996
       3 |         0   .087013   .912987
---------+------------------------------
   Total |  .3242633  .4206317  .2550817
----------------------------------------

Say that we compute the means of the reading, writing, math, science and social science scores, weighting first by the probability of being in class 1, then by the probability of being in class 2, and likewise for class 3.  Note the correspondence between these means and the means from section #4 of the output.

. tabstat read write math sci ss [aw=cprob1], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  41.73485  42.70297  43.17746  42.16013  43.84801
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob2], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  52.61804  54.50678  52.00815  53.17197  52.79395
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob3], stat(mean) 

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  63.64527  61.19303  62.61002   61.6482   61.2325
------------------------------------------------------------

 
 

 

 

 


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.