SPSS Data Analysis Examples
Multinomial Logistic Regression
Examples
Example 1. People's occupational choices might be influenced
by their parents' occupations and their own education level. We can study the
relationship of one's occupational choice with education level and father's
occupation. The occupational choice will be the outcome variable, which
consists of categories of occupations.
Example 2. A biologist may be
interested in the food choices that alligators make. Adult alligators might have
different preferences from young ones. The outcome variable here will be the
type of food, and the predictor variables might be the length of the alligators
and other environmental variables.
Example 3. Several brands of similar products are on the market, and you want
to study brand choices based on gender and age. For example, a recent finding of a
market research group claims that, among digital camera choices, women prefer
Kodak more than men do and men prefer Canon more than women do.
Description of the Data
For our data analysis example, we will expand the third example using a
hypothetical data set. The data set contains information on 735 subjects who
were asked their preference among three brands of some product (e.g., a car or a TV).
The data set also includes each subject's gender and age. The data are stored in the
SPSS file mlogit.sav.
* Read the data (change the path to wherever mlogit.sav is saved).
get file "D:\mlogit.sav".
* List the first 25 cases.
list
/cases from 1 to 25.
brand female age
1 0 24
1 0 26
1 0 26
1 1 27
1 1 27
3 1 27
1 0 27
1 0 27
1 1 27
1 0 27
1 0 27
1 1 27
2 1 28
3 1 28
1 1 28
1 0 28
1 0 28
2 1 28
1 0 28
1 0 28
1 1 28
1 1 28
3 0 28
1 1 28
3 0 28
Number of cases read: 25 Number of cases listed: 25
The outcome variable is brand, with values 1, 2, and 3 for the three brands. The
variable female is coded as 0 for male and 1 for female. Let's start with some
descriptive statistics of the variables of interest.
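* Frequency table of the outcome variable.
frequencies var = brand.
* Descriptive statistics for age and female within each brand.
sort cases by brand.
temporary.
split file by brand.
descriptives var = age female.
split file off.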
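The summaries requested above can also be mimicked by hand. The Python sketch below (illustrative, not SPSS) uses only the 25 cases listed earlier, not the full 735-case file, so its numbers will not match the SPSS output. It computes the frequency of each brand, the mean age and proportion female within each brand, and a brand-by-female crosstab, which is a quick way to spot empty or small cells.

```python
from collections import Counter, defaultdict

# (brand, female, age) for the first 25 cases listed above.
# This is only a subset of the 735 cases in mlogit.sav.
CASES = [
    (1, 0, 24), (1, 0, 26), (1, 0, 26), (1, 1, 27), (1, 1, 27),
    (3, 1, 27), (1, 0, 27), (1, 0, 27), (1, 1, 27), (1, 0, 27),
    (1, 0, 27), (1, 1, 27), (2, 1, 28), (3, 1, 28), (1, 1, 28),
    (1, 0, 28), (1, 0, 28), (2, 1, 28), (1, 0, 28), (1, 0, 28),
    (1, 1, 28), (1, 1, 28), (3, 0, 28), (1, 1, 28), (3, 0, 28),
]


def brand_frequencies(cases):
    """Count cases per brand (analogous to FREQUENCIES)."""
    return dict(sorted(Counter(b for b, _, _ in cases).items()))


def means_by_brand(cases):
    """Mean age and proportion female per brand (analogous to SPLIT FILE + DESCRIPTIVES)."""
    groups = defaultdict(list)
    for brand, female, age in cases:
        groups[brand].append((female, age))
    return {
        brand: {
            "age": sum(a for _, a in rows) / len(rows),
            "female": sum(f for f, _ in rows) / len(rows),
        }
        for brand, rows in sorted(groups.items())
    }


def crosstab(cases):
    """Brand-by-female cell counts, including cells with zero cases."""
    counts = Counter((b, f) for b, f, _ in cases)
    brands = sorted({b for b, _, _ in cases})
    return {(b, f): counts[(b, f)] for b in brands for f in (0, 1)}


if __name__ == "__main__":
    print(brand_frequencies(CASES))
    for brand, stats in means_by_brand(CASES).items():
        print(brand, round(stats["age"], 2), round(stats["female"], 2))
    for cell, n in crosstab(CASES).items():
        print(cell, n, "<- empty cell" if n == 0 else "")
```

In this 25-case subset no male chose brand 2, so that crosstab cell is empty; this says nothing about the full data set, where the same check should be run on all 735 cases.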

Some Strategies You Might Try
- Multiple logistic regression analyses, one for each pair of outcomes:
One problem with this approach is that each analysis is run on a different
sample. The other problem is that, without constraints tying the logistic
models together, the predicted probabilities of all the outcome categories
can sum to more than 1.
- Collapsing the number of categories to two and then doing a logistic regression: This approach
suffers from loss of information and changes the original research question into
a very different one.
- Ordinal logistic regression: If the outcome variable is truly ordered
and if it also satisfies the assumption of proportional
odds, then switching to ordinal logistic regression will make the model more
parsimonious.
- Linear regression: Sometimes the coding of a variable can be deceiving,
making a categorical outcome variable look like a continuous variable. This
approach has no merit when the outcome variable is truly multinomial.
- Multiple Discriminant Analysis: This is a multivariate technique for
profiling, differentiating, and classifying groups. Its goals differ from
those of multinomial logistic regression (MLR), which is more about
description, inference, and prediction.
Using the Multinomial Logit Model
Now we are ready to build our model. Our goal is to associate brand
choice with age and gender. We assume a linear relationship between the
transformed outcome (the log of the ratio of the probability of a given brand to
the probability of the base brand) and the predictor variables female and age.
Since there are more than two categories, we choose one of them as the base
(comparison) category. Here our choice is the first brand (brand=1).
* Multinomial logit of brand on female and age, with brand 1 as the base category.
nomreg brand (base = first) with female age
/print = lrt cps mfi parameter summary.





The Parameter Estimates table in the output has two parts, labeled with the two non-base
categories of the outcome variable brand (2 and 3). They correspond to two equations:
log(P(brand=2)/P(brand=1)) = b_10 + b_11*female + b_12*age
log(P(brand=3)/P(brand=1)) = b_20 + b_21*female + b_22*age,
with b's being the raw regression coefficients from the output.
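The two equations determine all three probabilities, because the probabilities of brands 1, 2, and 3 must sum to 1. Fixing the base category's linear predictor at 0 and exponentiating gives P(brand=1) = 1 / (1 + exp(eta2) + exp(eta3)) and P(brand=k) = exp(etak) / (1 + exp(eta2) + exp(eta3)). The Python sketch below (illustrative, not SPSS) implements that mapping. The coefficient values in HYPOTHETICAL_COEFS are made up for illustration, since the intercepts are not reproduced in this text, so only the structure reflects the model.

```python
import math


def brand_probabilities(female, age, coefs):
    """Return (P(brand=1), P(brand=2), P(brand=3)) for one person.

    coefs maps each non-base brand k to (b_k0, b_k1, b_k2): the intercept,
    female, and age coefficients of log(P(brand=k)/P(brand=1)).
    Brand 1 is the base category, so its linear predictor is fixed at 0.
    """
    eta = {1: 0.0}
    for k, (b0, b1, b2) in coefs.items():
        eta[k] = b0 + b1 * female + b2 * age
    denom = sum(math.exp(v) for v in eta.values())
    return tuple(math.exp(eta[k]) / denom for k in sorted(eta))


# Hypothetical coefficients for illustration only; NOT the SPSS estimates.
HYPOTHETICAL_COEFS = {2: (-1.0, 0.5, 0.03), 3: (-2.0, 0.5, 0.06)}

if __name__ == "__main__":
    p1, p2, p3 = brand_probabilities(female=1, age=30, coefs=HYPOTHETICAL_COEFS)
    print(p1, p2, p3, p1 + p2 + p3)  # the three probabilities sum to 1
```

Because every probability shares the same denominator, the predicted probabilities always sum to 1, and log(p2/p1) reproduces the first equation's linear predictor exactly.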
For example, a one-unit increase in age increases the log of the ratio
P(brand=2)/P(brand=1) by 0.368 and increases the log of the ratio
P(brand=3)/P(brand=1) by 0.686. Therefore, in general, the older a person is,
the more likely he or she is to prefer brand 2 or brand 3 over brand 1.
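Since log(P(brand=3)/P(brand=2)) = log(P(brand=3)/P(brand=1)) - log(P(brand=2)/P(brand=1)), the coefficients comparing brand 3 with brand 2 are simply differences of the two equations' coefficients; the same subtraction applies to the intercepts and to the female coefficients. Refitting the model with brand 2 as the base category should reproduce these differences up to rounding. The Python sketch below (illustrative, not SPSS; the function name rebase is hypothetical) applies this to the age coefficients quoted above.

```python
# Age coefficients quoted in the text, with brand 1 as the base category
# (the base category's own coefficient is 0 by definition).
AGE_COEF_VS_BRAND1 = {1: 0.0, 2: 0.368, 3: 0.686}


def rebase(coefs_vs_old_base, new_base):
    """Re-express coefficients relative to a different base category.

    b(k vs new) = b(k vs old) - b(new vs old).
    """
    shift = coefs_vs_old_base[new_base]
    return {k: b - shift for k, b in coefs_vs_old_base.items()}


if __name__ == "__main__":
    # Age effect for brand 3 vs brand 2 is 0.686 - 0.368, about 0.318.
    print(rebase(AGE_COEF_VS_BRAND1, new_base=2))
```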
The ratio of the probability of choosing one outcome category to the
probability of choosing the reference category is often referred to as the relative
risk (it is also sometimes referred to as the odds). So another way of interpreting
the regression results is in terms of relative risk. For a one-unit increase in
age, we expect the relative risk of choosing brand 2 over brand 1 to be multiplied
by exp(.3682) ≈ 1.45, so the relative risk is higher for older people. For a
dichotomous predictor variable such as female, the ratio of the relative risk of
choosing brand 2 over brand 1 for females to that for males is exp(.5238) ≈ 1.69.
These relative risk ratios are displayed in the column labeled Exp(B) in the
Parameter Estimates table.
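The Exp(B) column is the exponentiated coefficient. The Python sketch below (illustrative, not SPSS) computes the relative risk ratios for the two brand 2 versus brand 1 coefficients quoted in the text, and checks numerically that the ratio does not depend on the starting age: P(brand=2)/P(brand=1) at age a+1, divided by the same ratio at age a, equals exp(b_12) whatever a is. The intercept used in the check is hypothetical; any value gives the same ratio.

```python
import math

B_AGE_2 = 0.3682     # age coefficient, brand 2 vs brand 1 (from the text)
B_FEMALE_2 = 0.5238  # female coefficient, brand 2 vs brand 1 (from the text)


def relative_risk_ratio(b):
    """Exp(B): multiplicative change in P(brand=k)/P(brand=1) per unit of the predictor."""
    return math.exp(b)


def rr_2_vs_1(female, age, b0, b_female, b_age):
    """Relative risk P(brand=2)/P(brand=1) = exp(linear predictor)."""
    return math.exp(b0 + b_female * female + b_age * age)


if __name__ == "__main__":
    print(round(relative_risk_ratio(B_AGE_2), 2))     # 1.45, as in the text
    print(round(relative_risk_ratio(B_FEMALE_2), 2))  # 1.69
    b0 = -5.0  # hypothetical intercept; the ratio below does not depend on it
    for age in (20, 40):
        ratio = (rr_2_vs_1(0, age + 1, b0, B_FEMALE_2, B_AGE_2)
                 / rr_2_vs_1(0, age, b0, B_FEMALE_2, B_AGE_2))
        print(age, ratio)  # equals exp(B_AGE_2) at every age
```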
Sample Write-up of the Analysis
Below is one way of describing the results.
Both female and age are statistically significant in both equations.
Females are more likely than males to prefer brand 2 or brand 3 over
brand 1. Also, the older a person is, the more likely he or she is to prefer
brand 2 or brand 3 over brand 1.
Cautions, Flies in the Ointment
- The Independence of Irrelevant Alternatives (IIA) assumption: Roughly,
the IIA assumption means that adding or deleting alternative outcome
categories does not affect the odds among the remaining outcomes.
- Diagnostics and model fit: Unlike binary logistic regression, for which
many diagnostic statistics are available, multinomial logistic regression
models are not as straightforward to diagnose.
- Pseudo-R-Squared: The pseudo R-squares offered in the output are essentially
based on the change in log-likelihood from the intercept-only model to the
current model. They do not convey the same information as the R-square for
linear regression, although higher values still indicate a better fit.
- Sample size: Multinomial regression uses maximum likelihood estimation,
so it requires a large sample size. Because it also estimates multiple
equations, it requires an even larger sample size than ordinal or
binary logistic regression.
- Perfect prediction:
Perfect prediction means that a value of a predictor variable is
associated with only one value of the response variable. You can usually tell from the
regression coefficients that something is wrong (for example, extremely large estimates or
standard errors). You can then do a two-way tabulation of the outcome variable with the
problematic variable to confirm this and then rerun the model without the problematic variable.
- Empty cells or small cells: You should check for empty or small
cells by doing a crosstab between categorical predictors and
the outcome variable. If a cell has very few cases (a small cell), the
model may become unstable or it might not run at all.
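Two of the cautions above can be made concrete with a little arithmetic. The Python sketch below (illustrative, not SPSS) first shows the IIA property that the multinomial logit imposes: dropping brand 3 and renormalizing leaves P(brand=2)/P(brand=1) unchanged. It then computes three common pseudo R-squares (McFadden, Cox and Snell, Nagelkerke) from the log-likelihoods of the intercept-only and fitted models. The NOMREG model summary reports statistics with these names, but the linear predictors and log-likelihood values used here are hypothetical.

```python
import math


def softmax(etas):
    """Choice probabilities from linear predictors (base category has eta = 0)."""
    exps = [math.exp(e) for e in etas]
    total = sum(exps)
    return [x / total for x in exps]


def pseudo_r2(ll_null, ll_model, n):
    """McFadden, Cox-Snell, and Nagelkerke pseudo R-squares from log-likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox-Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return mcfadden, cox_snell, nagelkerke


if __name__ == "__main__":
    # IIA: hypothetical linear predictors for brands 1, 2, 3 (brand 1 is the base).
    etas = [0.0, 0.4, 0.3]
    p_all = softmax(etas)
    p_without_3 = softmax(etas[:2])  # same model with brand 3 removed
    # The two ratios agree (up to floating-point rounding).
    print(p_all[1] / p_all[0], p_without_3[1] / p_without_3[0])

    # Pseudo R-squares: hypothetical log-likelihoods for n = 735 subjects.
    print(pseudo_r2(ll_null=-700.0, ll_model=-650.0, n=735))
```

The three measures give different numbers for the same fit, which is one reason they should not be read like the R-square from linear regression.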
These pages are Copyrighted (c) by UCLA Academic Technology Services