### Mplus Code Fragments: A Latent Class Example, Examples 3 and 4

These code fragments are examples that we are using to try to understand these techniques using Mplus.  Please treat them as works in progress that explore these techniques, rather than as definitive answers on how to analyze any particular kind of data.

This picks up after Examples 1 and 2, but considers adding categorical variables to the model.

We now have the input file hsb6.inp and the data file it reads, called hsb6.dat.

Example 3: A latent class analysis with two classes, continuous indicators, and one two-level categorical indicator.

Here is the input file

Data:
File is I:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
Usevariables are
hiread write math sci ss ;
categorical = hiread;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[hiread$1*-1 math sci ss write*30];
%C#2%
[hiread$1*+1 math sci ss write*45];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex3.txt ;
save is cprob;
format is free;

Here is the output

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

Class 1        263.13557          0.43856
Class 2        336.86443          0.56144

#1. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

Class 1              258          0.43000
Class 2              342          0.57000

#2. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
Average Class Probabilities by Class
1        2
Class 1     0.962    0.038
Class 2     0.044    0.956

#3. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
MODEL RESULTS
Estimates     S.E.  Est./S.E.
CLASS 1
Means
WRITE             44.944    0.695     64.645
MATH              44.580    0.461     96.655
SCI               44.084    0.729     60.473
SS                45.392    0.629     72.158
Variances
WRITE             51.204    3.158     16.216
MATH              47.212    3.055     15.453
SCI               47.984    3.468     13.835
SS                62.855    4.235     14.843
CLASS 2
Means
WRITE             58.197    0.495    117.599
MATH              57.527    0.624     92.239
SCI               57.762    0.464    124.497
SS                57.243    0.582     98.361
Variances
WRITE             51.204    3.158     16.216
MATH              47.212    3.055     15.453
SCI               47.984    3.468     13.835
SS                62.855    4.235     14.843

#4. This shows the average scores for the two classes on the continuous variables.  Class 1 is a low-performing group, and class 2 is a high-performing group.

------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART
Class 1
Thresholds
HIREAD$1           2.276    0.381      5.970
Class 2
Thresholds
HIREAD$1          -1.835    0.215     -8.523

#5. For categorical variables, we do not estimate means but instead estimate thresholds.  You can think of this like a logistic regression predicting being a "bad reader" (0 on hiread) for each class.  So, for class 1, we have an empty model (no predictors), and the threshold (cut point) is 2.276.  We can exponentiate this to convert it into an odds, exp(2.276) = 9.7376518, so if you are in class 1, the odds (not odds ratio) are almost 10 to 1 that you will be a "bad reader".  We can convert this into a probability like this: exp(2.276) / (1 + exp(2.276)) = .90686977, so if you are in class 1, there is a .9 probability that you will be a bad reader.

For class 2, we do the same tricks.  The odds of being a bad reader in class 2 is exp(-1.835) = .1596135 and the probability of being a bad reader in class 2 is exp(-1.835) / (1 + exp(-1.835)) = .13764371.  Note that Mplus shows this to us in section 7 of the output below.
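These conversions are easy to check with a few lines of Python; this is just a sketch of the arithmetic in this note, using the two hiread$1 threshold estimates from the output above.

```python
import math

def threshold_to_prob(tau):
    """Convert an Mplus threshold into the probability of being in the
    lowest category (here, being a "bad reader", hiread = 0)."""
    odds = math.exp(tau)        # an odds, not an odds ratio
    return odds / (1 + odds)    # equivalently 1 / (1 + exp(-tau))

print(round(threshold_to_prob(2.276), 3))    # class 1, prints 0.907
print(round(threshold_to_prob(-1.835), 3))   # class 2, prints 0.138
```

These match the probability-scale section of the output (section 7).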

------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1               -0.247    0.126     -1.961

#6. This is the threshold for dividing the two classes: below the threshold you are in class 1, above it you are in class 2.  We see the threshold is -0.247.  We can convert this threshold to class probabilities like this:

Prob(class 1) = 1/(1 + exp(-threshold1)) = 1 / ( 1 + exp( 0.247)) = .43856204 (compare to section 1 above).
Prob(class 2) = 1 - 1/(1 + exp(-threshold1)) = 1 - 1 / ( 1 + exp( 0.247)) = .56143796 (compare to section 1 above).
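As a quick check, the same two-class conversion in Python, using the C#1 mean from the output above (class 2 serves as the reference class with a logit of 0):

```python
import math

# Mean of C#1 from the output above; class 2 is the reference class (logit 0)
threshold1 = -0.247

p_class1 = 1 / (1 + math.exp(-threshold1))
p_class2 = 1 - p_class1

print(round(p_class1, 5))   # prints 0.43856
print(round(p_class2, 5))   # prints 0.56144
```

Compare these with the final class proportions in section 1 of the output.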

------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE
Class 1
Category 1         0.907    0.032     28.161
Category 2         0.093    0.032      2.893
Class 2
Category 1         0.138    0.026      5.387
Category 2         0.862    0.026     33.744

#7. These take the thresholds from section 5 of the output and convert them into probabilities for your convenience.  Section 5 shows how you could manually convert the thresholds from that section into the probabilities shown here.

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

. infile hiread write math sci ss cprob1 cprob2 class using lca_ex3.txt

Below we show some observations from the middle of this file.  Note that cprob1 is the probability of being in class 1 and cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.

. list in 200/210

+---------------------------------------------------------------+
| hiread   write   math    sci     ss   cprob1   cprob2   class |
|---------------------------------------------------------------|
200. |      0    52.1   42.5   47.7   60.5     .954     .046       1 |
201. |      0    51.5     57   49.8   40.6     .914     .086       1 |
202. |      0    52.8   49.3   53.1   35.6     .984     .016       1 |
203. |      0    43.7   41.9   41.7   35.6        1        0       1 |
204. |      0    61.9     53   52.6   60.5     .022     .978       2 |
|---------------------------------------------------------------|
205. |      0    41.1   45.3   47.1   55.6     .998     .002       1 |
206. |      0    38.5   47.1   41.7   25.7        1        0       1 |
207. |      0    54.1   46.4   49.8   55.6     .855     .145       1 |
208. |      0    51.5   48.5   49.8   50.6     .943     .057       1 |
209. |      0    41.1   53.6   41.7   55.6     .996     .004       1 |
|---------------------------------------------------------------|
210. |      0    61.9   46.2   60.7   45.6     .196     .804       2 |
+---------------------------------------------------------------+


Suppose we compute the means of the writing, math, science and social science scores, weighting first by the probability of being in class 1 and then by the probability of being in class 2.  Note the correspondence between these means and the means from section 4 of the output.

. tabstat write math sci ss [aw=cprob1], stat(mean)

stats |     write      math       sci        ss
---------+----------------------------------------
mean |  44.94414  44.57938  44.08309  45.39155
--------------------------------------------------

. tabstat write math sci ss [aw=cprob2], stat(mean)

stats |     write      math       sci        ss
---------+----------------------------------------
mean |  58.19674  57.52728  57.76235  57.24318
--------------------------------------------------
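As a sketch of what these tabstat runs compute, here is the same posterior-probability-weighted mean in Python, using only the write values and cprob1 weights from observations 200-204 of the listing above (tabstat, of course, uses all 600 observations, which is why the result differs from the full-sample mean):

```python
# A weighted mean: each score is weighted by the posterior probability
# of class membership, then divided by the sum of the weights.
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# write scores and class 1 posterior probabilities from observations
# 200-204 of the listing above (illustration only)
write  = [52.1, 51.5, 52.8, 43.7, 61.9]
cprob1 = [0.954, 0.914, 0.984, 1.0, 0.022]

print(round(weighted_mean(write, cprob1), 2))   # prints 50.02
```

Observation 204, with its tiny cprob1 of .022, contributes almost nothing to the class 1 mean, which is exactly how the posterior weighting separates the classes.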

Another way to view this is to run an intercept-only regression predicting, say, write, weighting the cases as we have done above.  For example:

. regress write [aw=cprob1]
------------------------------------------------------------------------------
write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons |   44.94414   .3987896   112.70   0.000     44.16027    45.72802
------------------------------------------------------------------------------

. regress write [aw=cprob2]
------------------------------------------------------------------------------
write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons |   58.19674   .2886021   201.65   0.000     57.62963    58.76385
------------------------------------------------------------------------------

We can do the same kind of analysis predicting loread (hiread reverse coded, since the Mplus threshold refers to the probability of being in the lowest category of hiread), weighting the cases by the probability of being in class 1 and then by the probability of being in class 2, as shown below.  You can relate the coefficients here to the thresholds in section 5 of the Mplus output.

. logit loread [aw=cprob1]
------------------------------------------------------------------------------
loread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons |   2.276237   .1679188    13.56   0.000     1.947122    2.605352
------------------------------------------------------------------------------

. logit loread [aw=cprob2]

------------------------------------------------------------------------------
loread |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons |  -1.834929   .1337386   -13.72   0.000    -2.097052   -1.572806
------------------------------------------------------------------------------

Example 4: A latent class analysis with two classes, continuous indicators, and one three-level categorical indicator.

Here is the input file

Data:
File is g:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
Usevariables are
read write math sci ss ses;
categorical = ses;
classes = c(2);
Analysis:
Type=mixture;
MODEL:
%C#1%
[read math sci ss write  *30 ses$1 *-1 ses$2 *1];

%C#2%
[read math sci ss write  *45 ses$1 *-1 ses$2 *1];
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex4.txt ;
save is cprob;
format is free;

Here is the output

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

Class 1        274.88163          0.45814
Class 2        325.11837          0.54186

#1. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions
Class 1              271          0.45167
Class 2              329          0.54833

#2. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
Average Class Probabilities by Class
1        2
Class 1     0.958    0.042
Class 2     0.046    0.954

#3. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
MODEL RESULTS

Estimates     S.E.  Est./S.E.
CLASS 1
Means
WRITE             45.065    0.766     58.813
MATH              44.800    0.494     90.743
SCI               44.477    0.791     56.221
SS                45.669    0.721     63.303
Variances
WRITE             49.141    3.039     16.171
MATH              46.484    3.244     14.329
SCI               49.167    3.431     14.329
SS                63.054    4.209     14.981
CLASS 2
Means
WRITE             58.574    0.535    109.565
MATH              57.808    0.737     78.444
SCI               57.924    0.525    110.367
SS                57.437    0.594     96.667
Variances
WRITE             49.141    3.039     16.171
MATH              46.484    3.244     14.329
SCI               49.167    3.431     14.329
SS                63.054    4.209     14.981

#4. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART

Class 1

Thresholds
SES$1             -0.553    0.134     -4.131
SES$2              1.684    0.185      9.123

Class 2

Thresholds
SES$1             -2.005    0.202     -9.914
SES$2              0.550    0.124      4.428

#5. For categorical variables, we do not estimate means but instead estimate thresholds.  In the prior example we imagined this to be like a binary logistic regression; here, ses is a three-level ordinal variable, so the thresholds (cut points) are like those from an ordinal logistic regression rather than a binary one.
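With two cut points, the ordinal thresholds can be converted to category probabilities in the same spirit as the prior example; here is a quick Python sketch using the class 1 estimates above, which reproduces the probability-scale section of the output.

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def category_probs(thresholds):
    """Turn ordered-logit thresholds from an empty (intercept-only)
    model into category probabilities."""
    cum = [sigmoid(t) for t in thresholds]           # P(category <= k)
    probs = [cum[0]]
    probs += [hi - lo for lo, hi in zip(cum, cum[1:])]
    probs.append(1 - cum[-1])
    return probs

# ses thresholds for class 1 from the output above
print([round(p, 3) for p in category_probs([-0.553, 1.684])])
# prints [0.365, 0.478, 0.157]
```

The first threshold gives the cumulative probability of category 1, the second gives the cumulative probability of categories 1 and 2, and the individual category probabilities are the successive differences.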

------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

Means
C#1               -0.168    0.145     -1.160

#6. These are much the same as with Example #1, see that example for more details.

------------------------------------------------------------------------------
LATENT CLASS INDICATOR MODEL PART IN PROBABILITY SCALE

Class 1

SES
Category 1         0.365    0.031     11.777
Category 2         0.478    0.031     15.263
Category 3         0.157    0.024      6.424

Class 2

SES
Category 1         0.119    0.021      5.612
Category 2         0.515    0.029     17.997
Category 3         0.366    0.029     12.699

#7. This takes the thresholds from section 5 of the output and converts them into probabilities.  So, if you are in class 1, your probability of being low SES (category 1) is .365, but if you are in class 2, it is .119.

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

. infile ses read write math sci ss cprob1 cprob2 class using lca_ex4.txt

Below we show observations from the middle of this file.  Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.

. list in 200/210

+-------------------------------------------------------------------+
| ses   read   write   math    sci     ss   cprob1   cprob2   class |
|-------------------------------------------------------------------|
200. |   2   46.9    52.1   42.5   47.7   60.5     .886     .114       1 |
201. |   0   46.9    51.5     57   49.8   40.6     .963     .037       1 |
202. |   2   46.9    52.8   49.3   53.1   35.6     .958     .042       1 |
203. |   1   46.9    43.7   41.9   41.7   35.6        1        0       1 |
204. |   0   46.9    61.9     53   52.6   60.5      .05      .95       2 |
|-------------------------------------------------------------------|
205. |   1   46.9    41.1   45.3   47.1   55.6     .998     .002       1 |
206. |   0   46.9    38.5   47.1   41.7   25.7        1        0       1 |
207. |   0   46.9    54.1   46.4   49.8   55.6     .938     .062       1 |
208. |   1   46.9    51.5   48.5   49.8   50.6      .93      .07       1 |
209. |   2   46.9    41.1   53.6   41.7   55.6     .989     .011       1 |
|-------------------------------------------------------------------|
210. |   0   46.9    61.9   46.2   60.7   45.6     .381     .619       2 |
+-------------------------------------------------------------------+

Suppose we compute the means of the reading, writing, math, science and social science scores, weighting first by the probability of being in class 1 and then by the probability of being in class 2.  Note the correspondence between these means and the means from section 4 of the output.

. tabstat read write math sci ss [aw=cprob1], stat(mean)

stats |      read     write      math       sci        ss
---------+--------------------------------------------------
mean |   43.8374  45.06437  44.80045  44.47711  45.66875
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob2], stat(mean)

stats |      read     write      math       sci        ss
---------+--------------------------------------------------
mean |   58.7205  58.57446  57.80871  57.92401   57.4375
------------------------------------------------------------

Likewise, consider these ologit commands predicting ses with an empty (intercept-only) model, weighting the cases according to their class membership (class 1, then class 2).  Note the correspondence between the cut points below and the thresholds in section 5 of the output.

. ologit ses [aw=cprob1]
------------------------------------------------------------------------------
ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
_cut1 |  -.5526053   .1013389          (Ancillary parameters)
_cut2 |   1.683901   .1342721
------------------------------------------------------------------------------

. ologit ses [aw=cprob2]

------------------------------------------------------------------------------
ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
_cut1 |  -2.004701   .1460688          (Ancillary parameters)
_cut2 |   .5498481   .0980846
------------------------------------------------------------------------------


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.