NOTE: This example was done using Mplus version 4.21. The syntax may not work with earlier versions of Mplus.
Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here would be the type of food, and the predictor variables might be the length of the alligators and other environmental variables.
Example 3. Several brands of similar products are on the market, and you want to study brand choices based on gender and age. For example, a recent finding of a market research group claims that among digital camera choices, women prefer Kodak more than men and men prefer Canon more than women.
For our data analysis example, we will expand our third example with a hypothetical data set. The data set contains information on 735 subjects who were asked their preference on three brands of some product (e.g., car or TV). Included in the data set is information on subjects' gender and age. You can download the data set here.
The outcome variable is brand. The variable female is coded as 0 for male and 1 for female. The variable age is the respondent's age in years. Let's start with some descriptive statistics for the variables of interest.
Data:
File is D:\documents\mlogit in Mplus DAE\mlogit.dat ;
Variable:
Names are
brand female age;
categorical are female;
Analysis:
Type = basic;
Plot:
type = plot1;
For this output only, we will display all of the information in the output. Look at the output from this syntax carefully to be sure that the data were read into Mplus correctly: check that you have the correct number of observations, and that any binary or ordinal variables have been declared as such using the "categorical are" statement. Type=basic is not available for nominal variables, so we have not included "nominal are brand" in our syntax, and brand is treated as continuous in this output. We will need to add that statement later in order to fit a multinomial logistic regression model. We have not used a missing statement because there are no missing data in this data set. Note that under "MEANS/INTERCEPTS/THRESHOLDS" the value for a categorical variable is a threshold, so it will not be the same as the mean of that variable as displayed in other stat packages. However, the proportions listed above that section will be correct if your data have been read in properly.
INPUT READING TERMINATED NORMALLY
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 735
Number of dependent variables 3
Number of independent variables 0
Number of continuous latent variables 0
Observed dependent variables
Continuous
BRAND AGE
Binary and ordered categorical (ordinal)
FEMALE
Estimator WLSMV
Maximum number of iterations 1000
Convergence criterion 0.500D-04
Maximum number of steepest descent iterations 20
Parameterization DELTA
Input data file(s)
D:\documents\mlogit in Mplus DAE\mlogit.dat
Input data format FREE
SUMMARY OF CATEGORICAL DATA PROPORTIONS
FEMALE
Category 1 0.366
Category 2 0.634
RESULTS FOR BASIC ANALYSIS
ESTIMATED SAMPLE STATISTICS
MEANS/INTERCEPTS/THRESHOLDS
BRAND FEMALE$1 AGE
________ ________ ________
1 2.019 -0.343 32.901
CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
BRAND FEMALE AGE
________ ________ ________
BRAND 0.582
FEMALE 0.090
AGE 0.461 0.010 5.436
STANDARD ERRORS FOR ESTIMATED SAMPLE STATISTICS
S.E. FOR MEANS/INTERCEPTS/THRESHOLDS
BRAND FEMALE$1 AGE
________ ________ ________
1 0.028 0.047 0.086
S.E. FOR CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
BRAND FEMALE AGE
________ ________ ________
BRAND 0.051
FEMALE 0.046
AGE 0.024 0.046 0.261
PLOT INFORMATION
The following plots are available:
Histograms (sample values)
Scatterplots (sample values)
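The relationship between the FEMALE$1 threshold and the category proportions can be checked by hand. The sketch below assumes that the threshold is a z-score on the standard normal (probit) scale. That is our reading of the WLSMV output above, not something the output states. Under that assumption, the normal CDF evaluated at the threshold should reproduce the proportion in category 1.

```python
from statistics import NormalDist

# Values copied from the Type = basic output above.
threshold_female = -0.343   # FEMALE$1 under MEANS/INTERCEPTS/THRESHOLDS
prop_category_1 = 0.366     # proportion in category 1 (female = 0)

# Assumption: the threshold is a z-score on the standard normal scale.
implied_prop = NormalDist().cdf(threshold_female)

print(f"implied proportion in category 1:  {implied_prop:.3f}")
print(f"reported proportion in category 1: {prop_category_1:.3f}")
```

If the two values agree to about three decimals, the threshold and the proportion are two ways of describing the same split of the sample.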
Using the plot generator in Mplus, we can also get histograms for each of our variables.
Now we have warmed up to building our model. Our goal is to associate the brand choices with age and gender. We will assume a linear relationship between the transformed outcome variable and our predictor variables female and age. In the multinomial logit model, one group is used as the "reference group" (also called a base category), and the coefficients for all other outcome groups describe how the independent variables are related to the probability of being in that group versus the reference group. Mplus automatically uses the last category of the dependent variable as the base category or comparison group.
Looking at the syntax below, in the model statement we have entered "brand#1 brand#2 on female age." Mplus uses a variable name followed by a pound sign and a number to refer to the categories of the nominal dependent variable. This holds for all categories of the dependent variable except the final category, which is the reference group and cannot be referred to in the model statement (if you try, Mplus will issue an error message). Thus the line included in our model statement indicates that we want to regress both non-reference categories of brand on female and age. That is, we want to use female and age to predict the probability of being in category 1 versus category 3 of brand, and to predict the probability of being in category 2 versus category 3 of brand. This is often simply described as regressing brand on female and age.
Data:
File is "D:\documents\mlogit in Mplus DAE\mlogit.dat" ;
Variable:
Names are
brand female age;
categorical are female ;
nominal are brand;
Analysis:
Type = general ;
Model:
brand#1 brand#2 on female age;
SUMMARY OF ANALYSIS
Number of groups 1
Number of observations 735
output omitted
TESTS OF MODEL FIT
Loglikelihood
H0 Value -702.971
H0 Scaling Correction Factor 1.031
for MLR
Information Criteria
Number of Free Parameters 6
Akaike (AIC) 1417.941
Bayesian (BIC) 1445.541
Sample-Size Adjusted BIC 1426.489
(n* = (n + 2) / 24)
MODEL RESULTS
Estimates S.E. Est./S.E.
BRAND#1 ON
FEMALE -0.466 0.227 -2.057
AGE -0.686 0.072 -9.497
BRAND#2 ON
FEMALE 0.058 0.196 0.296
AGE -0.318 0.046 -6.882
Intercepts
BRAND#1 22.721 2.378 9.554
BRAND#2 10.947 1.571 6.969
Note that some of the output (e.g. summary of analysis and the convergence log) has been omitted to save space. At the top of the output Mplus reports the number of observations used in the analysis. Because there are no missing data, this should equal the 735 subjects in our data set. If any of our variables had missing data we would have needed to specify "missing = #" in the variable statement, where # is the numeric value given to missing values (e.g. -9999). By default Mplus will exclude cases with missing values on any of the variables in our analysis, and hence missing data will result in fewer observations being used. However, in Mplus there are other (good) options for handling missing data. We won't discuss them here, except to say that they are available, and are one of the reasons one might consider running this sort of analysis in Mplus (since many other packages can be used to run a multinomial logistic regression model). The final log likelihood (-702.971) can be used in comparisons of nested models, but we won't show an example of that here. Under the heading "Information Criteria" we see the Akaike and Bayesian information criterion values. Both the AIC and the BIC are measures of fit with some correction for the complexity of the model, but the BIC has a stronger correction for parsimony. In both cases, lower values indicate better fit of the model.
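The information criteria can be reproduced from the log likelihood and the number of free parameters. The sketch below uses the usual definitions AIC = -2LL + 2k and BIC = -2LL + k*ln(n), together with the n* = (n + 2) / 24 adjustment printed in the output. It assumes n = 735, the number of subjects in the data set, and the variable names are ours.

```python
import math

loglik = -702.971   # H0 loglikelihood value from the output
k = 6               # number of free parameters
n = 735             # assumed sample size: subjects in the data set

aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * math.log(n)

# Sample-size adjusted BIC replaces n with n* = (n + 2) / 24.
n_star = (n + 2) / 24
adjusted_bic = -2 * loglik + k * math.log(n_star)

print(f"AIC:          {aic:.3f}")
print(f"BIC:          {bic:.3f}")
print(f"Adjusted BIC: {adjusted_bic:.3f}")
```

Small differences in the last decimal place relative to the Mplus output are expected, because the log likelihood is only printed to three decimals.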
Next we see the model results. This part of the output above has two parts, labeled with the categories of the outcome variable brand. They correspond to two equations:
log(P(brand=1)/P(brand=3)) = b_10 + b_11*female + b_12*age
log(P(brand=2)/P(brand=3)) = b_20 + b_21*female + b_22*age,
with b's being the raw regression coefficients from the output.
For example, we can say that for a one unit increase in the variable age, the log of the ratio of the two probabilities, P(brand=1)/P(brand=3), will decrease by 0.686 (the coefficient is -0.686), and the log of the ratio of the two probabilities P(brand=2)/P(brand=3) will decrease by 0.318 (the coefficient is -0.318). Therefore, we can say that, in general, the older a person is, the less likely he/she is to prefer brands 1 and 2 over brand 3 (and conversely, that the younger someone is, the more likely they are to prefer brands 1 and 2 over brand 3).
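The two equations can also be turned into predicted probabilities for all three brands, since the three probabilities must sum to one. The following is a minimal sketch using the estimates from the output. The function name and the ages used for illustration are our own choices, not part of the Mplus output.

```python
import math

# Estimates from MODEL RESULTS; brand 3 is the reference category.
COEF = {
    1: {"intercept": 22.721, "female": -0.466, "age": -0.686},
    2: {"intercept": 10.947, "female": 0.058, "age": -0.318},
}

def brand_probabilities(female, age):
    """Return {brand: probability} for brands 1, 2 and 3."""
    logits = {
        brand: c["intercept"] + c["female"] * female + c["age"] * age
        for brand, c in COEF.items()
    }
    logits[3] = 0.0  # log(P(brand=3)/P(brand=3)) = 0 for the reference group
    denom = sum(math.exp(v) for v in logits.values())
    return {brand: math.exp(v) / denom for brand, v in sorted(logits.items())}

# Hypothetical ages chosen for illustration, for a female respondent.
for age in (30, 33, 36):
    probs = brand_probabilities(female=1, age=age)
    print(age, {brand: round(p, 3) for brand, p in probs.items()})
```

Running this for increasing ages shows the probability of brand 3 rising as the probabilities of brands 1 and 2 fall, which is the pattern described above.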
The ratio of the probability of choosing one outcome category over the probability of choosing the reference category is often referred to as relative risk (and it is also sometimes referred to as odds). So another way of interpreting the regression results is in terms of relative risk. We can say that for a one unit increase in the variable age, we expect the relative risk of choosing brand 1 over 3 to be multiplied by exp(-0.686) = 0.504. So we can say that the relative risk is lower for older people. For a dichotomous predictor variable such as female, we can say that the ratio of the relative risks of choosing brand 1 over 3 for females versus males is exp(-0.466) = 0.628. (Note that Mplus does not calculate these for you, but performing the calculations is simple.)
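Since Mplus does not print the exponentiated coefficients, they can be computed in a few lines. This sketch simply exponentiates each slope from the output. The dictionary layout and labels are ours.

```python
import math

# Slopes from MODEL RESULTS (reference category is brand 3).
coefs = {
    "brand 1 vs 3": {"female": -0.466, "age": -0.686},
    "brand 2 vs 3": {"female": 0.058, "age": -0.318},
}

# Relative risk ratio = exp(coefficient).
rrr = {
    comparison: {var: math.exp(b) for var, b in slopes.items()}
    for comparison, slopes in coefs.items()
}

for comparison, ratios in rrr.items():
    for var, value in ratios.items():
        print(f"{comparison:13s} {var:7s} {value:.3f}")
```

Values below 1 indicate that an increase in the predictor lowers the relative risk of choosing that brand over brand 3, and values above 1 indicate that it raises it.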
To start, we have created a table that looks similar to tables seen in some journals, containing the regression coefficients (with significance indicated by asterisks) and the standard errors of the coefficients.
Multinomial Model
               Brand 1 vs. 3    Brand 2 vs. 3
female           -0.466*           0.058
                 (0.227)          (0.196)
age              -0.686***        -0.318***
                 (0.072)          (0.046)
Intercept        22.721***        10.947***
                 (2.378)          (1.571)
Standard errors in parentheses.
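The asterisks can be derived from the Est./S.E. column of the Mplus output by treating each ratio as a z statistic. The sketch below assumes the common convention of * for p < 0.05, ** for p < 0.01 and *** for p < 0.001. The source table does not define its asterisks, so this cutoff scheme is an assumption.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def stars(p):
    """Assumed convention: * p<0.05, ** p<0.01, *** p<0.001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""

# Est./S.E. values copied from MODEL RESULTS.
z_values = {
    "brand#1 on female": -2.057,
    "brand#1 on age": -9.497,
    "brand#2 on female": 0.296,
    "brand#2 on age": -6.882,
    "brand#1 intercept": 9.554,
    "brand#2 intercept": 6.969,
}

for label, z in z_values.items():
    print(f"{label:18s} z = {z:7.3f}  {stars(two_sided_p(z))}")
```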
Presenting the multinomial regression results can be somewhat tricky since there are multiple equations and multiple comparisons to present. For example, the table above only shows the coefficients for 1 versus 3 and 2 versus 3. How about 1 versus 2? As the number of outcome categories increases, the number of possible comparisons goes up much faster. Because table (and other) space is typically limited in journals, you will want to think carefully about which comparisons will be most interesting to your reader, and present those.
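The point estimates for the 1 versus 2 comparison can be recovered from the two fitted equations, because log(P(brand=1)/P(brand=2)) = log(P(brand=1)/P(brand=3)) - log(P(brand=2)/P(brand=3)). The sketch below subtracts the coefficients accordingly. Standard errors for these differences are not computed here, since they require the covariance between the estimates, which is not part of the output shown.

```python
import math

# Estimates from MODEL RESULTS (each relative to brand 3).
brand1_vs_3 = {"intercept": 22.721, "female": -0.466, "age": -0.686}
brand2_vs_3 = {"intercept": 10.947, "female": 0.058, "age": -0.318}

# Coefficients for brand 1 versus brand 2 are the differences.
brand1_vs_2 = {
    term: brand1_vs_3[term] - brand2_vs_3[term] for term in brand1_vs_3
}

for term, b in brand1_vs_2.items():
    print(f"{term:10s} coef = {b:7.3f}  exp(coef) = {math.exp(b):.3f}")
```

An alternative that also yields standard errors is to refit the model with a different reference category, for example by recoding brand so that the category of interest is the last one.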
In more detail, we can say that, holding all the other variables constant, being female multiplies the relative risk of choosing brand 1 over 3 by exp(-0.466) = 0.628. In other words, the relative risk (or, rather loosely, the odds) of choosing brand 1 over 3 is about 37 percent lower for females than for males.
These pages are Copyrighted (c) by UCLA Academic Technology Services