These code fragments are examples that we are using to try to understand these techniques in Mplus. We ask that you treat them as works in progress that explore these techniques, rather than definitive answers as to how to analyze any particular kind of data.
Consider the Stata file hsb6. It has 600 observations with information about students, such as their reading, writing, math, and other achievement scores. For each of the variables locus, concept, mot, and read through ss (read, write, math, sci, and ss), we will make a binary variable named hi followed by the variable name (for example, hiread). It is 1 if the score is above the median and 0 otherwise. This will be useful when we need binary indicators. Here we read the data into Stata, create the binary variables, compress the file, and then convert it to Mplus format using the user-written command stata2mplus.
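use http://www.ats.ucla.edu/stat/mplus/code/hsb6, clear
* create hi<varname> = 1 if the score is above its median, 0 otherwise
foreach varname of varlist locus concept mot read-ss {
  summarize `varname', detail
  * the if condition keeps missing scores missing (Stata treats . as larger than any number)
  generate hi`varname' = `varname' > `r(p50)' if !missing(`varname')
}
compress
save hsb6, replace
* write hsb6.dat (data) and hsb6.inp (Mplus input file)
stata2mplus using hsb6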
We now have the Mplus input file hsb6.inp and the data file it reads, hsb6.dat.
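The recode in the Stata loop can be checked outside Stata with a small Python sketch. `median_split` is a hypothetical helper, not part of the original workflow. It assumes that Python's `statistics.median` matches Stata's `r(p50)`; both average the two middle values when n is even. A score equal to the median is coded 0, just as the Stata `>` comparison does.

```python
from statistics import median

def median_split(scores):
    """Return 1 for scores above the median and 0 otherwise.
    Missing values (None) stay None instead of being coded 1."""
    observed = [s for s in scores if s is not None]
    cut = median(observed)
    return [None if s is None else int(s > cut) for s in scores]

# Hypothetical scores: the median is 50, so only 55 and 60 are coded 1.
print(median_split([40, 45, 50, 55, 60]))  # [0, 0, 0, 1, 1]
```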
Example 1: A latent class analysis with 2 classes and continuous indicators
Here is the input file
Data:
File is I:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
read write math sci ss ;
classes = c(2);
Analysis:
Type=mixture;
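MODEL:
%C#1%
[read math sci ss write * 30 ];   ! starting value of 30 for each class 1 mean (low scorers)
%C#2%
[read math sci ss write * 60];    ! starting value of 60 for each class 2 mean (high scorers)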
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex1.txt ;
save is cprob;
format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES
    Class 1        274.09000          0.45682
    Class 2        325.91000          0.54318
#1. One way to view the second column is as the average probability of belonging to class 1 and to class 2. As a result, the first column is this average probability multiplied by the sample size of 600 (see the Stata example below for comparison).
A second way to view the first column is as the sum of each person's probability of belonging to a class. If person #6 has an estimated probability of .8 of being in class 1 and .2 of being in class 2, that person contributes .8 to the class 1 total and .2 to the class 2 total. This is why the counts are fractional (see the Stata example below for comparison).
A third way of viewing this is that the latent variable has an underlying continuum with a threshold for being classified as class 1 or class 2. That threshold can be used to compute the probabilities of being in each class; see section #5.
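The first two readings of these columns can be written in a few lines. This sketch uses a hypothetical helper, `class_totals`. It assumes each person's posterior probabilities are stored as one row that sums to 1, like the cprob columns Mplus saves. Summing a column gives the fractional class count; averaging it gives the proportion.

```python
def class_totals(posteriors):
    """posteriors: one list of class probabilities per person.
    Returns (counts, proportions): column sums and column means."""
    n = len(posteriors)
    k = len(posteriors[0])
    counts = [sum(row[j] for row in posteriors) for j in range(k)]
    proportions = [c / n for c in counts]
    return counts, proportions

# Hypothetical posteriors for four people; the first is the ".8 / .2" person.
probs = [[0.8, 0.2], [1.0, 0.0], [0.1, 0.9], [0.3, 0.7]]
counts, props = class_totals(probs)
print([round(c, 2) for c in counts])  # [2.2, 1.8]
print([round(p, 3) for p in props])   # [0.55, 0.45]
```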
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP
Class Counts and Proportions
    Class 1              272          0.45333
    Class 2              328          0.54667
#2. This shows the number of people in each class when each person is assigned to the class with his or her highest probability of membership. Because each person is counted in exactly one class, the counts are whole numbers.
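Assigning each person to his or her most likely class takes only an argmax over each row. The helpers `modal_class` and `class_counts` below are hypothetical. The tie rule, where the lower-numbered class wins, is an assumption of this sketch and not documented Mplus behavior. The probabilities are cprob1 and cprob2 for observations 200 to 204, as listed further down this page.

```python
def modal_class(posteriors):
    """Assign each person to the class (numbered from 1) with the largest
    posterior probability; max() keeps the first, lower-numbered class on ties."""
    return [max(range(len(row)), key=lambda j: row[j]) + 1 for row in posteriors]

def class_counts(assignments, k):
    """Count how many people are assigned to each of classes 1..k."""
    return [assignments.count(j) for j in range(1, k + 1)]

# cprob1 and cprob2 for observations 200-204.
probs = [[.944, .056], [.9, .1], [.983, .017], [1.0, 0.0], [.016, .984]]
assigned = modal_class(probs)
print(assigned)                   # [1, 1, 1, 1, 2]
print(class_counts(assigned, 2))  # [4, 1]
```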
------------------------------------------------------------------------------
Average Class Probabilities by Class
1 2
Class 1 0.957 0.043
Class 2 0.042 0.958
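#3. This is related to the output in #1, but the probabilities of class membership are averaged separately within each most likely class. Rows are the most likely class (as in #2) and columns are the classes. For example, people assigned to class 1 have on average a .957 probability of being in class 1 and a .043 probability of being in class 2. Values near 1 on the diagonal indicate that people are classified with little uncertainty. See the Stata portion below for more on this.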
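This table can be sketched as a group-by: assign each person to the most likely class, then average the probability columns within each group. `avg_probs_by_class` is a hypothetical helper. The input is only the eleven rows (observations 200 to 210) listed further down this page, so the result differs from the full-sample values of .957 and .043.

```python
def avg_probs_by_class(posteriors):
    """Row c: people whose largest probability is for class c+1;
    column j: their mean probability of belonging to class j+1."""
    k = len(posteriors[0])
    assigned = [max(range(k), key=lambda j: row[j]) for row in posteriors]
    table = []
    for c in range(k):
        members = [row for row, a in zip(posteriors, assigned) if a == c]
        if not members:
            table.append([None] * k)  # nobody has this as their most likely class
            continue
        table.append([sum(r[j] for r in members) / len(members) for j in range(k)])
    return table

# cprob1 and cprob2 for observations 200-210.
cprob1 = [.944, .9, .983, 1, .016, .998, 1, .827, .934, .995, .17]
cprob2 = [.056, .1, .017, 0, .984, .002, 0, .173, .066, .005, .83]
table = avg_probs_by_class([list(p) for p in zip(cprob1, cprob2)])
for row in table:
    print([round(v, 3) for v in row])
# [0.953, 0.047]
# [0.093, 0.907]
```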
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
Means
READ 43.783 0.642 68.152
WRITE 45.068 0.730 61.738
MATH 44.794 0.469 95.540
SCI 44.446 0.740 60.051
SS 45.574 0.658 69.237
Variances
READ 46.463 2.785 16.681
WRITE 49.427 3.011 16.415
MATH 46.634 3.133 14.884
SCI 49.022 3.388 14.470
SS 62.216 4.109 15.141
CLASS 2
Means
READ 58.730 0.605 97.000
WRITE 58.538 0.497 117.764
MATH 57.782 0.687 84.120
SCI 57.917 0.499 116.079
SS 57.488 0.589 97.629
Variances
READ 46.463 2.785 16.681
WRITE 49.427 3.011 16.415
MATH 46.634 3.133 14.884
SCI 49.022 3.388 14.470
SS 62.216 4.109 15.141
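#4. This shows the mean of each score in the two classes. Class 1 is a low-performing group, and class 2 is a high-performing group. The variances are identical in the two classes because, by default, Mplus holds the variances equal across classes.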
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 -0.173 0.133 -1.298
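#5. This is the threshold for dividing the two classes. It is on the logit scale: it is the log odds of being in class 1 rather than class 2, with the last class serving as the reference. In terms of the underlying continuum from #1, if you are below the threshold you are in class 1, and if you are above it you are in class 2. We see the threshold is -0.173, and we can convert it to a probability as follows.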
Prob(class 1) = 1/(1 + exp(-threshold1)) = 1 / ( 1 + exp( 0.173)) = .4568 (compare to section 1 above).
Prob(class 2) = 1 - 1/(1 + exp(-threshold1)) = 1 - 1 / ( 1 + exp( 0.173)) = .54314 (compare to section 1 above).
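The same conversion in Python is a short sketch using a hypothetical helper, `two_class_probs`, which applies the inverse logit. The estimate -0.173 is rounded to three decimals, so the result matches the 0.45682 in section #1 closely but not exactly.

```python
import math

def two_class_probs(logit_c1):
    """Convert the Mplus mean for C#1 (log odds of class 1 versus class 2)
    into the probabilities of class 1 and class 2."""
    p1 = 1 / (1 + math.exp(-logit_c1))
    return p1, 1 - p1

p1, p2 = two_class_probs(-0.173)
print(round(p1, 4), round(p2, 4))  # 0.4569 0.5431
```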
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
infile read write math sci ss cprob1 cprob2 class using lca_ex1.txt, clear
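The saved file can also be read in Python instead of Stata. `read_free_format` is a hypothetical helper. The sketch assumes the file holds only numbers separated by whitespace, in the column order used in the infile command (read write math sci ss cprob1 cprob2 class). A file containing missing-value codes would need extra handling.

```python
import io

COLUMNS = ["read", "write", "math", "sci", "ss", "cprob1", "cprob2", "class"]

def read_free_format(handle, columns=COLUMNS):
    """Read whitespace-separated records into a list of dicts, one per line."""
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        records.append({name: float(value) for name, value in zip(columns, fields)})
    return records

# Two rows typed in from the listing of observations 200 and 204.
sample = io.StringIO("46.9 52.1 42.5 47.7 60.5 .944 .056 1\n"
                     "46.9 61.9 53 52.6 60.5 .016 .984 2\n")
rows = read_free_format(sample)
print(rows[1]["class"])  # 2.0
```

For the real file, one would open it first, for example `with open("lca_ex1.txt") as f: rows = read_free_format(f)`.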
Below we list observations 200 through 210, from the middle of this file. Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, and class is the most likely class, that is, the class with the highest probability.
. list in 200/210
+-------------------------------------------------------------+
| read write math sci ss cprob1 cprob2 class |
|-------------------------------------------------------------|
200. | 46.9 52.1 42.5 47.7 60.5 .944 .056 1 |
201. | 46.9 51.5 57 49.8 40.6 .9 .1 1 |
202. | 46.9 52.8 49.3 53.1 35.6 .983 .017 1 |
203. | 46.9 43.7 41.9 41.7 35.6 1 0 1 |
204. | 46.9 61.9 53 52.6 60.5 .016 .984 2 |
|-------------------------------------------------------------|
205. | 46.9 41.1 45.3 47.1 55.6 .998 .002 1 |
206. | 46.9 38.5 47.1 41.7 25.7 1 0 1 |
207. | 46.9 54.1 46.4 49.8 55.6 .827 .173 1 |
208. | 46.9 51.5 48.5 49.8 50.6 .934 .066 1 |
209. | 46.9 41.1 53.6 41.7 55.6 .995 .005 1 |
|-------------------------------------------------------------|
210. | 46.9 61.9 46.2 60.7 45.6 .17 .83 2 |
+-------------------------------------------------------------+
Note that if we tabulate class we see where the values from section #2 of the output came from.
. tab class
class | Freq. Percent Cum.
------------+-----------------------------------
1 | 272 45.33 45.33
2 | 328 54.67 100.00
------------+-----------------------------------
Total | 600 100.00
Note that if we take the average of cprob1 and cprob2, we can relate these values to column 2 of section #1 of the output.
. summ cprob1 cprob2
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cprob1 | 600 .4568233 .4664192 0 1
cprob2 | 600 .5431767 .4664192 0 1
If we sum the probabilities, we can relate these to column 1 of section #1 of the output.
. tabstat cprob1 cprob2, stat(sum)
stats | cprob1 cprob2
---------+--------------------
sum | 274.094 325.906
------------------------------
If we average the probabilities by class, we can relate these values to section #3 of the output.
. tabstat cprob1 cprob2, by(class)
Summary statistics: mean
by categories of: class
class | cprob1 cprob2
---------+--------------------
1 | .9570699 .0429301
2 | .0419848 .9580152
---------+--------------------
Total | .4568233 .5431767
------------------------------
Suppose we compute the means of the reading, writing, math, science, and social science scores, weighting each observation by its probability of being in class 1. We then do it again, weighting by the probability of being in class 2. Note that these weighted means match the class means from section #4 of the output.
. tabstat read write math sci ss [aw=cprob1], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 43.78268 45.06829 44.79421 44.44601 45.5743
------------------------------------------------------------
. tabstat read write math sci ss [aw=cprob2], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 58.73021 58.53821 57.78224 57.91736 57.48822
------------------------------------------------------------
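The match is not a coincidence. At a maximum likelihood solution of a normal mixture, each class mean equals the mean of the indicator weighted by the posterior probabilities of that class. That is exactly what `tabstat ... [aw=cprob1]` computes. The EM algorithm is a standard way to reach such a solution, and its M-step performs this weighted averaging.

The sketch below is a simplified illustration, not Mplus's implementation. It uses one simulated variable rather than five indicators, two classes with a common variance, and a fixed number of iterations with no convergence test. All function names are hypothetical.

```python
import math
import random

def e_step(x, mu1, mu2, var, p1):
    """Posterior probability of class 1 for each observation. Because both
    classes share one variance, the normal densities' constants cancel."""
    post = []
    for xi in x:
        d1 = p1 * math.exp(-(xi - mu1) ** 2 / (2 * var))
        d2 = (1 - p1) * math.exp(-(xi - mu2) ** 2 / (2 * var))
        post.append(d1 / (d1 + d2))
    return post

def em_two_class(x, mu1=30.0, mu2=60.0, var=100.0, p1=0.5, iters=500):
    """Fixed number of EM iterations for a one-variable, two-class normal
    mixture with a common variance; also returns the posteriors at the end."""
    n = len(x)
    for _ in range(iters):
        post = e_step(x, mu1, mu2, var, p1)
        w1 = sum(post)
        # M-step: each class mean is a posterior-weighted mean of x.
        mu1 = sum(w * xi for w, xi in zip(post, x)) / w1
        mu2 = sum((1 - w) * xi for w, xi in zip(post, x)) / (n - w1)
        # A single pooled variance, like the equal variances in the Mplus output.
        var = sum(w * (xi - mu1) ** 2 + (1 - w) * (xi - mu2) ** 2
                  for w, xi in zip(post, x)) / n
        p1 = w1 / n
    return mu1, mu2, var, p1, e_step(x, mu1, mu2, var, p1)

def weighted_mean(x, w):
    """Weighted mean, the analogue of tabstat x [aw=w], stat(mean)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

# Simulated scores, not the hsb6 data: two groups centred near 45 and 58.
random.seed(1)
x = [random.gauss(45, 7) for _ in range(300)] + [random.gauss(58, 7) for _ in range(300)]
mu1, mu2, var, p1, post = em_two_class(x)
# Once EM has converged, the two numbers on each line should agree.
print(round(mu1, 3), round(weighted_mean(x, post), 3))
print(round(mu2, 3), round(weighted_mean(x, [1 - w for w in post]), 3))
```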
Example 2: A latent class analysis with 3 classes and continuous indicators
Here is the input file
Data:
File is I:\mplus\hsb6.dat ;
Variable:
Names are
id gender race ses sch prog locus concept mot career read write math
sci ss hilocus hiconcep himot hiread hiwrite himath hisci hiss academic;
Usevariables are
read write math sci ss ;
classes = c(3);
Analysis:
Type=mixture;
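MODEL:
%C#1%
[read math sci ss write *30 ];   ! starting value of 30 for each class 1 mean (low scorers)
%C#2%
[read math sci ss write *45];    ! starting value of 45 for each class 2 mean (middle scorers)
%C#3%
[read math sci ss write *60];    ! starting value of 60 for each class 3 mean (high scorers)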
OUTPUT:
TECH8;
SAVEDATA:
file is lca_ex2.txt ;
save is cprob;
format is free;
Here is the output
------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES
    Class 1        194.55375          0.32426
    Class 2        252.39798          0.42066
    Class 3        153.04826          0.25508
#1. One way to view the second column is as the average probability of belonging to class 1, class 2, and class 3. As a result, the first column is this average probability multiplied by the sample size of 600 (see the Stata example below for comparison).
A second way to view the first column is as the sum of each person's probability of belonging to a class. Suppose person #6 has an estimated probability of .8 of being in class 1, .2 of being in class 2, and 0 of being in class 3. That person contributes .8 to the class 1 total, .2 to the class 2 total, and nothing to the class 3 total. This is why the counts are fractional (see the Stata example below for comparison).
A third way of viewing this is through the two thresholds for the latent class variable, C#1 = 0.240 and C#2 = 0.500, reported in section #5 below. As shown there, they can be converted into these proportions with multinomial logistic formulas.
------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP
Class Counts and Proportions
    Class 1              197          0.32833
    Class 2              249          0.41500
    Class 3              154          0.25667
#2. This shows the number of people in each class when each person is assigned to the class with his or her highest probability of membership. Because each person is counted in exactly one class, the counts are whole numbers.
------------------------------------------------------------------------------
Average Class Probabilities by Class
1 2 3
Class 1 0.940 0.060 0.000
Class 2 0.038 0.912 0.050
Class 3 0.000 0.087 0.913
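#3. This is related to the output in section #1, but the probabilities of class membership are averaged separately within each most likely class (rows) for each class (columns). For example, people assigned to class 2 have on average a .038 probability of being in class 1, .912 of being in class 2, and .050 of being in class 3. See the Stata portion below for more on this.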
------------------------------------------------------------------------------
MODEL RESULTS
Estimates S.E. Est./S.E.
CLASS 1
Means
READ 41.735 0.477 87.540
WRITE 42.703 0.962 44.390
MATH 43.178 0.516 83.648
SCI 42.160 0.663 63.625
SS 43.848 0.695 63.097
Variances
READ 32.997 2.820 11.699
WRITE 42.369 3.775 11.223
MATH 34.562 2.422 14.269
SCI 38.395 2.714 14.146
SS 53.884 3.850 13.996
CLASS 2
Means
READ 52.618 0.925 56.866
WRITE 54.507 0.727 74.938
MATH 52.008 0.835 62.319
SCI 53.172 0.835 63.680
SS 52.794 0.808 65.324
Variances
READ 32.997 2.820 11.699
WRITE 42.369 3.775 11.223
MATH 34.562 2.422 14.269
SCI 38.395 2.714 14.146
SS 53.884 3.850 13.996
CLASS 3
Means
READ 63.644 0.948 67.117
WRITE 61.193 0.453 135.170
MATH 62.610 0.865 72.404
SCI 61.648 0.667 92.451
SS 61.232 0.758 80.759
Variances
READ 32.997 2.820 11.699
WRITE 42.369 3.775 11.223
MATH 34.562 2.422 14.269
SCI 38.395 2.714 14.146
SS 53.884 3.850 13.996
#4. This shows the mean of each score in the three classes. Class 1 is a low-performing group, class 2 is a medium-performing group, and class 3 is a high-performing group.
------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART
Means
C#1 0.240 0.218 1.099
C#2 0.500 0.181 2.766
#5. These are the thresholds for dividing the three classes. This is now like a multinomial logistic regression, with class 3 as the reference (comparison) category. C#1 is the log odds of being in class 1 rather than class 3, and C#2 is the log odds of being in class 2 rather than class 3. For the comparison class, class 3, the probability of being in that class is computed as below, letting "t1" be threshold 1 (.24) and "t2" be threshold 2 (.5).
P(class=3) = 1 / (1 + exp(t1) + exp(t2)) = 1 / (1 + exp(.24) + exp(.5)) = .25510397 .
For classes 1 and 2, the formula is a bit different since these are not the comparison class. For class 1, the formula is
P(class=1) = exp(t1) / (1 + exp(t1) + exp(t2)) = exp(.24) / (1 + exp(.24) + exp(.5)) = .32430071.
For class 2, the formula is
P(class=2) = exp(t2) / (1 + exp(t1) + exp(t2)) = exp(.5) / (1 + exp(.24) + exp(.5)) = .42059533.
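These three formulas are the multinomial logistic (softmax) transformation, with the reference class's logit fixed at 0. The sketch below uses a hypothetical helper, `class_probs_from_logits`, to reproduce the hand calculations. Given the single value -0.173, it also covers the two-class case from Example 1.

```python
import math

def class_probs_from_logits(logits):
    """Class probabilities from Mplus latent class means, where logits[j]
    is the log odds of class j+1 versus the last (reference) class."""
    expo = [math.exp(t) for t in logits] + [1.0]  # reference class: exp(0) = 1
    total = sum(expo)
    return [e / total for e in expo]

probs = class_probs_from_logits([0.24, 0.50])
print([round(p, 4) for p in probs])  # [0.3243, 0.4206, 0.2551]
```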
------------------------------------------------------------------------------
We now read the saved data file into Stata for comparison to the Mplus output.
infile read write math sci ss cprob1 cprob2 cprob3 class using lca_ex2.txt, clear
Below we list observations 200 through 210, from the middle of this file. Note that cprob1, cprob2, and cprob3 are the probabilities of being in classes 1, 2, and 3. The variable class is the most likely class, that is, the class with the highest probability. None of the listed observations is assigned to class 3, but, as the tabulation below shows, 154 people are.
. list in 200/210
+----------------------------------------------------------------------+
| read write math sci ss cprob1 cprob2 cprob3 class |
|----------------------------------------------------------------------|
200. | 46.9 52.1 42.5 47.7 60.5 .133 .867 0 2 |
201. | 46.9 51.5 57 49.8 40.6 .062 .938 0 2 |
202. | 46.9 52.8 49.3 53.1 35.6 .228 .772 0 2 |
203. | 46.9 43.7 41.9 41.7 35.6 .998 .002 0 1 |
204. | 46.9 61.9 53 52.6 60.5 0 .996 .004 2 |
|----------------------------------------------------------------------|
205. | 46.9 41.1 45.3 47.1 55.6 .812 .188 0 1 |
206. | 46.9 38.5 47.1 41.7 25.7 1 0 0 1 |
207. | 46.9 54.1 46.4 49.8 55.6 .039 .961 0 2 |
208. | 46.9 51.5 48.5 49.8 50.6 .1 .9 0 2 |
209. | 46.9 41.1 53.6 41.7 55.6 .709 .291 0 1 |
|----------------------------------------------------------------------|
210. | 46.9 61.9 46.2 60.7 45.6 .001 .999 0 2 |
+----------------------------------------------------------------------+
Note that if we tabulate class we see where the values from section #2 of the output came from.
. tab class
class | Freq. Percent Cum.
------------+-----------------------------------
1 | 197 32.83 32.83
2 | 249 41.50 74.33
3 | 154 25.67 100.00
------------+-----------------------------------
Total | 600 100.00
Note that if we take the average of cprob1, cprob2, and cprob3 we can relate these values to column 2 of section #1 of the output.
. summ cprob1 cprob2 cprob3
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
cprob1 | 600 .3242633 .440132 0 1
cprob2 | 600 .4206317 .4326395 0 .999
cprob3 | 600 .2550817 .3989861 0 1
If we sum the probabilities, we can relate these to column 1 of section #1 of the output.
. tabstat cprob1 cprob2 cprob3, stat(sum)
stats | cprob1 cprob2 cprob3
---------+------------------------------
sum | 194.558 252.379 153.049
----------------------------------------
If we average the probabilities by class, we can relate these values to section #3 of the output.
. tabstat cprob1 cprob2 cprob3, by(class)
Summary statistics: mean
by categories of: class
class | cprob1 cprob2 cprob3
---------+------------------------------
1 | .9401117 .0598883 0
2 | .0375743 .9123735 .049996
3 | 0 .087013 .912987
---------+------------------------------
Total | .3242633 .4206317 .2550817
----------------------------------------
Suppose we compute the means of the reading, writing, math, science, and social science scores, weighting each observation by its probability of being in class 1. We then repeat this, weighting by the probability of being in class 2 and then class 3. Note that these weighted means match the class means from section #4 of the output.
. tabstat read write math sci ss [aw=cprob1], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 41.73485 42.70297 43.17746 42.16013 43.84801
------------------------------------------------------------
. tabstat read write math sci ss [aw=cprob2], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 52.61804 54.50678 52.00815 53.17197 52.79395
------------------------------------------------------------
. tabstat read write math sci ss [aw=cprob3], stat(mean)
stats | read write math sci ss
---------+--------------------------------------------------
mean | 63.64527 61.19303 62.61002 61.6482 61.2325
------------------------------------------------------------
These pages are Copyrighted (c) by UCLA Academic Technology Services