UCLA Academic Technology Services

Mplus Data Analysis Examples
Multinomial Logistic Regression

NOTE: This example was done using Mplus version 4.21. The syntax may not work with earlier versions of Mplus.

Examples

Example 1. People's occupational choices might be influenced by their parents' occupations and their own education level. We can study the relationship of one's occupation choice with education level and father's occupation.  The occupational choices will be the outcome variable which consists of categories of occupations.

Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here will be the type of food, and the predictor variables might be the length of the alligators and other environmental variables.

Example 3. Several brands of similar products are on the market, and you want to study brand choices based on gender and age. For example, a recent finding of a market research group claims that among digital camera choices, women prefer Kodak more than men and men prefer Canon more than women.

Description of the Data

For our data analysis example, we will expand our third example with a hypothetical data set. The data set contains information on 735 subjects who were asked their preference on three brands of some product (e.g., car or TV).  Included in the data set is information on subjects' gender and age. You can download the data set here.

The outcome variable is brand. The variable female is coded as 0 for male and 1 for female. The variable age is the respondent's age in years. Let's start with some descriptive statistics for the variables of interest.

Data:
  File is D:\documents\mlogit in Mplus DAE\mlogit.dat ;
Variable:
  Names are
    brand female age;
  categorical are female;
Analysis:
  Type = basic;
Plot:
  type = plot1;

For this output only, we will display all of the information in the output. Look at the output from this syntax carefully to be sure that the data were read into Mplus correctly. In particular, make sure that you have the correct number of observations and that any binary or ordinal variables have been declared as such using the "categorical are" statement. Type=basic is not available for nominal variables, so we have not included "nominal are brand" in the variable command here; we will need to add it later in order to fit a multinomial logistic regression model. We have not used a missing statement because there are no missing data in this data set. Note that under "MEANS/INTERCEPTS/THRESHOLDS" the value for a categorical variable is a threshold, so it will not be the same as the mean of that variable as displayed in other statistical packages. The proportions listed above that section, however, will be correct if your data have been read in properly.

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         735

Number of dependent variables                                    3
Number of independent variables                                  0
Number of continuous latent variables                            0

Observed dependent variables

  Continuous
   BRAND       AGE

  Binary and ordered categorical (ordinal)
   FEMALE


Estimator                                                    WLSMV
Maximum number of iterations                                  1000
Convergence criterion                                    0.500D-04
Maximum number of steepest descent iterations                   20
Parameterization                                             DELTA

Input data file(s)
  D:\documents\mlogit in Mplus DAE\mlogit.dat

Input data format  FREE


SUMMARY OF CATEGORICAL DATA PROPORTIONS

    FEMALE
      Category 1    0.366
      Category 2    0.634


RESULTS FOR BASIC ANALYSIS


     ESTIMATED SAMPLE STATISTICS


           MEANS/INTERCEPTS/THRESHOLDS
              BRAND         FEMALE$1      AGE
              ________      ________      ________
      1         2.019        -0.343        32.901


           CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
              BRAND         FEMALE        AGE
              ________      ________      ________
 BRAND          0.582
 FEMALE         0.090
 AGE            0.461         0.010         5.436


     STANDARD ERRORS FOR ESTIMATED SAMPLE STATISTICS


           S.E. FOR MEANS/INTERCEPTS/THRESHOLDS
              BRAND         FEMALE$1      AGE
              ________      ________      ________
      1         0.028         0.047         0.086


           S.E. FOR CORRELATION MATRIX (WITH VARIANCES ON THE DIAGONAL)
              BRAND         FEMALE        AGE
              ________      ________      ________
 BRAND          0.051
 FEMALE         0.046
 AGE            0.024         0.046         0.261


PLOT INFORMATION

The following plots are available:

  Histograms (sample values)
  Scatterplots (sample values)

Using the plot generator in Mplus, we can also get histograms for each of our variables (figures not shown).
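To make the earlier note about "MEANS/INTERCEPTS/THRESHOLDS" concrete, here is a minimal Python sketch (not part of the Mplus run) that converts the FEMALE$1 threshold into category proportions. It assumes the threshold is on the probit scale, which is the usual reading for the WLSMV estimator shown in the output; the function name normal_cdf is our own. The results should come out close to the proportions 0.366 and 0.634 listed under "SUMMARY OF CATEGORICAL DATA PROPORTIONS".

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Threshold for FEMALE$1 from the MEANS/INTERCEPTS/THRESHOLDS output.
threshold_female = -0.343

# Assumption: the threshold is on the probit scale, so the proportion in
# the first category (female = 0) is the normal CDF at the threshold.
prop_category_1 = normal_cdf(threshold_female)
prop_category_2 = 1.0 - prop_category_1

print(f"Category 1 (male):   {prop_category_1:.3f}")
print(f"Category 2 (female): {prop_category_2:.3f}")
```

This is why the value -0.343 looks nothing like the mean of a 0/1 variable: it is a cut point on a standard normal scale, not a proportion.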

Some Strategies You Might Try

Using the Multinomial Logit Model

Now that we have warmed up, we can build our model. Our goal is to relate brand choice to age and gender. We will assume a linear relationship between the transformed outcome variable (the log of a ratio of probabilities) and our predictor variables female and age. In the multinomial logit model, one group is used as the "reference group" (also called the base category), and the coefficients for each of the other outcome groups describe how the independent variables are related to the probability of being in that group versus the reference group. Mplus automatically uses the last category of the dependent variable as the base category or comparison group.

Looking at the syntax below, in the model statement we have entered "brand#1 brand#2 on female age;". Mplus uses a variable name followed by a pound sign and a number to refer to the categories of a nominal dependent variable. This holds for all categories of the dependent variable except the final category, which is the reference group and cannot be referred to in the model statement (if you try, Mplus will issue an error message). Thus the line in our model statement indicates that we want to regress the first two categories of brand on female and age; that is, we want to use female and age to predict the probability of being in category 1 versus category 3 of brand, and the probability of being in category 2 versus category 3 of brand. This is often simply described as regressing brand on female and age.

Data:
  File is "D:\documents\mlogit in Mplus DAE\mlogit.dat" ;
Variable:
  Names are
    brand female age;
  categorical are female ;
  nominal are brand;
Analysis:
  Type = general ;
Model:
  brand#1 brand#2 on female age;

SUMMARY OF ANALYSIS

Number of groups                                                 1
Number of observations                                         735

output omitted


TESTS OF MODEL FIT

Loglikelihood

          H0 Value                        -702.971
          H0 Scaling Correction Factor       1.031
            for MLR

Information Criteria

          Number of Free Parameters              6
          Akaike (AIC)                    1417.941
          Bayesian (BIC)                  1445.541
          Sample-Size Adjusted BIC        1426.489
            (n* = (n + 2) / 24)



MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 BRAND#1    ON
    FEMALE            -0.466    0.227     -2.057
    AGE               -0.686    0.072     -9.497

 BRAND#2    ON
    FEMALE             0.058    0.196      0.296
    AGE               -0.318    0.046     -6.882

 Intercepts
    BRAND#1           22.721    2.378      9.554
    BRAND#2           10.947    1.571      6.969

Note that some of the output (e.g., parts of the summary of analysis and the convergence log) has been omitted to save space. The number of observations reported at the top of the output should match the 735 subjects in our data set, which confirms that all cases were used in the analysis. If any of our variables had missing data, we would have needed to specify "missing = #" in the variable statement, where # is the numeric value given to missing values (e.g., -9999). By default Mplus will exclude cases with missing values on any of the variables in our analysis, and hence missing data will result in fewer observations being used. However, Mplus offers other (good) options for handling missing data. We won't discuss them here, except to say that they are available and are one of the reasons one might consider running this sort of analysis in Mplus (since many other packages can be used to run a multinomial logistic regression model).

The final log likelihood (-702.971) can be used in comparisons of nested models, but we won't show an example of that here. Under the heading "Information Criteria" we see the Akaike and Bayesian information criterion values. Both the AIC and the BIC are measures of fit with some correction for the complexity of the model, but the BIC penalizes additional parameters more heavily. In both cases, lower values indicate better fit of the model.
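The information criteria can be reproduced by hand from the log likelihood, which is a useful check on the sample size actually used. The Python sketch below is our own illustration, not Mplus output; it uses the standard definitions AIC = -2LL + 2k and BIC = -2LL + k*ln(n), together with the n* = (n + 2) / 24 adjustment printed in the output, and takes n = 735 from the description of the data.

```python
import math

log_likelihood = -702.971  # H0 value from TESTS OF MODEL FIT
n_params = 6               # number of free parameters
n_obs = 735                # subjects in the data set

deviance = -2.0 * log_likelihood

# AIC adds 2 per parameter; BIC adds ln(n) per parameter.
aic = deviance + 2 * n_params
bic = deviance + n_params * math.log(n_obs)

# The sample-size adjusted BIC replaces n with n* = (n + 2) / 24.
n_star = (n_obs + 2) / 24
sabic = deviance + n_params * math.log(n_star)

print(f"AIC:   {aic:.3f}")
print(f"BIC:   {bic:.3f}")
print(f"SABIC: {sabic:.3f}")
```

Up to rounding of the log likelihood, these values agree with the 1417.941, 1445.541 and 1426.489 reported under "Information Criteria", which is consistent with all 735 cases having been used.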

Next we see the model results. This part of the output above has two parts, labeled with the categories of the outcome variable brand. They correspond to two equations:

log(P(brand=1)/P(brand=3)) = b_10 + b_11*female + b_12*age
log(P(brand=2)/P(brand=3)) = b_20 + b_21*female + b_22*age,

with b's being the raw regression coefficients from the output.
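The two equations can be turned into predicted probabilities for all three brands by fixing the linear predictor of the reference category (brand 3) at zero and normalizing. The Python sketch below is an illustration only: the function name brand_probabilities and the example respondent (a 32-year-old female) are our own choices, and the coefficients are the estimates from the model results.

```python
import math

# Raw coefficients from MODEL RESULTS; brand 3 is the reference category.
coefs = {
    1: {"intercept": 22.721, "female": -0.466, "age": -0.686},
    2: {"intercept": 10.947, "female": 0.058, "age": -0.318},
}


def brand_probabilities(female: int, age: float) -> dict:
    """Return P(brand = k) for k = 1, 2, 3 under the fitted model."""
    # The linear predictor of the reference category is fixed at 0.
    eta = {3: 0.0}
    for brand, b in coefs.items():
        eta[brand] = b["intercept"] + b["female"] * female + b["age"] * age
    denom = sum(math.exp(v) for v in eta.values())
    return {brand: math.exp(eta[brand]) / denom for brand in sorted(eta)}


# Hypothetical respondent chosen for illustration: female, 32 years old.
probs = brand_probabilities(female=1, age=32)
for brand, p in probs.items():
    print(f"P(brand={brand}) = {p:.3f}")
```

By construction the three probabilities sum to one, and log(P(brand=1)/P(brand=3)) equals the right-hand side of the first equation for the chosen values of female and age.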

For example, we can say that for a one-unit increase in the variable age, the log of the ratio of the two probabilities P(brand=1)/P(brand=3) decreases by 0.686 (the coefficient is -0.686), and the log of the ratio P(brand=2)/P(brand=3) decreases by 0.318 (the coefficient is -0.318). Therefore, we can say that, in general, the older a person is, the less likely he or she is to prefer brand 1 or brand 2 over brand 3 (and conversely, the younger someone is, the more likely he or she is to prefer brand 1 or brand 2 over brand 3).

The ratio of the probability of choosing one outcome category to the probability of choosing the reference category is often referred to as relative risk (and it is sometimes also referred to as odds). So another way of interpreting the regression results is in terms of relative risk. We can say that for a one-unit increase in the variable age, we expect the relative risk of choosing brand 1 over brand 3 to be multiplied by exp(-0.686) = 0.504, that is, to be roughly halved. So we can say that the relative risk is lower for older people. For a dichotomous predictor variable such as female, we can say that the ratio of the relative risk of choosing brand 1 over brand 3 for females to that for males is exp(-0.466) = 0.628. (Note that Mplus does not calculate these for you, but performing the calculations is simple.)
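Since Mplus does not print these quantities, one way to obtain them is to exponentiate each coefficient by hand. The Python sketch below does this for all four slopes and also reports the implied percent change in relative risk; the dictionary layout and labels are our own.

```python
import math

# Slope estimates from MODEL RESULTS, keyed by (comparison, predictor).
slopes = {
    ("1 vs 3", "female"): -0.466,
    ("1 vs 3", "age"): -0.686,
    ("2 vs 3", "female"): 0.058,
    ("2 vs 3", "age"): -0.318,
}

# Relative risk ratio: the factor by which the relative risk is multiplied
# for a one-unit increase in the predictor.
rrr = {key: math.exp(b) for key, b in slopes.items()}

# The same information expressed as a percent change in relative risk.
pct_change = {key: 100.0 * (value - 1.0) for key, value in rrr.items()}

for (comparison, predictor), value in rrr.items():
    change = pct_change[(comparison, predictor)]
    print(f"brand {comparison}, {predictor}: "
          f"RRR = {value:.3f} ({change:+.1f}%)")
```

A ratio below 1 means the relative risk falls as the predictor increases, and a ratio above 1 means it rises.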

Sample Write-up of the Analysis

To start, we have created a table that looks similar to tables seen in some journals, containing the regression coefficients (with significance indicated by asterisks) and the standard errors of the coefficients.

                              Multinomial Model
                          Brand 1 vs. 3   Brand 2 vs. 3
female                        -0.466*         0.058
                              (0.227)        (0.196)
age                           -0.686***      -0.318***
                              (0.072)        (0.046)
Intercept                     22.721***      10.947***
                              (2.378)        (1.571)

Standard errors are in parentheses.
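The asterisks in a table like this are typically derived from the ratio of each estimate to its standard error. The Python sketch below recomputes that ratio and a two-sided p-value from the standard normal distribution. The cutoffs 0.05, 0.01 and 0.001 for one, two and three asterisks are an assumption (a common convention), since the page does not state which cutoffs were used; the helper names are our own.

```python
import math

# (estimate, standard error) pairs from MODEL RESULTS.
results = {
    "brand#1 on female": (-0.466, 0.227),
    "brand#1 on age": (-0.686, 0.072),
    "brand#2 on female": (0.058, 0.196),
    "brand#2 on age": (-0.318, 0.046),
    "brand#1 intercept": (22.721, 2.378),
    "brand#2 intercept": (10.947, 1.571),
}


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stars(p: float) -> str:
    """Assumed convention: * p < .05, ** p < .01, *** p < .001."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


summary = {}
for name, (estimate, se) in results.items():
    z = estimate / se
    p = two_sided_p(z)
    summary[name] = (z, p, stars(p))
    print(f"{name:20s} z = {z:7.3f}  p = {p:.4f} {stars(p)}")
```

The ratios differ slightly from the Est./S.E. column in the output because they are computed here from rounded estimates and standard errors.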

Presenting multinomial regression results can be somewhat tricky, since there are multiple equations and multiple comparisons to present. For example, the table above only shows the coefficients for the comparisons 1 versus 3 and 2 versus 3. How about 1 versus 2? As the number of outcome categories increases, the number of possible comparisons grows even faster. Because table (and other) space is typically limited in journals, you will want to think carefully about which comparisons will be most interesting to your reader, and present those.
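The point estimates for the 1 versus 2 comparison can be recovered without refitting the model, because log(P(brand=1)/P(brand=2)) is the difference between the two fitted equations. The Python sketch below is our own illustration of that subtraction. It yields coefficients only: standard errors for the differences would need the covariance between the estimates, which is not part of the output shown here, or a refit with the categories recoded so that brand 2 is the last category.

```python
# Raw coefficients from MODEL RESULTS (brand 3 is the reference category).
b1_vs_3 = {"intercept": 22.721, "female": -0.466, "age": -0.686}
b2_vs_3 = {"intercept": 10.947, "female": 0.058, "age": -0.318}

# log(P1/P2) = log(P1/P3) - log(P2/P3), so the coefficients subtract.
b1_vs_2 = {name: b1_vs_3[name] - b2_vs_3[name] for name in b1_vs_3}

for name, value in b1_vs_2.items():
    print(f"brand 1 vs 2, {name}: {value:.3f}")
```

The negative age coefficient for this comparison suggests that older respondents also tend to favor brand 2 over brand 1.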

In more detail, we can say that, holding all other variables constant, the relative risk of choosing brand 1 over brand 3 for females is 0.628 times that for males, meaning that the relative risk (or, rather loosely, the odds) of choosing brand 1 over brand 3 is about 37 percent lower for females than for males.

Cautions, Flies in the Ointment

Additional Examples

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.