SPSS Data Analysis Examples
Multinomial Logistic Regression

Version info: Code for this page was tested in SPSS 20.

Multinomial logistic regression is used to model nominal outcome variables, in which the log odds of the outcomes are modeled as a linear combination of the predictor variables.

Please note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process that researchers are expected to carry out. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.

Examples of multinomial logistic regression

Example 1. People's occupational choices might be influenced by their parents' occupations and their own education level. We can study the relationship of one's occupation choice with education level and father's occupation.  The occupational choices will be the outcome variable which consists of categories of occupations.

Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here will be the type of food, and the predictor variables might be the length of the alligators and other environmental variables.

Example 3. Entering high school students choose among a general program, a vocational program and an academic program. Their choice might be modeled using their writing score and their socioeconomic status.

Description of the data

For our data analysis example, we will expand the third example using the hsbdemo data set. You can download the data here.

The data set contains variables on 200 students. The outcome variable is prog, program type. The predictor variables are socioeconomic status, ses, a three-level categorical variable, and writing score, write, a continuous variable. Let's start by getting some descriptive statistics for the variables of interest.


crosstabs
  /tables=prog by ses
  /statistics=chisq 
  /cells=count.

sort cases by prog.
split file by prog.
descriptives var = write 
/statistics = mean stddev.
split file off.

Analysis methods you might consider

Using the multinomial logit model

Below we use the nomreg command to estimate a multinomial logistic regression model. We specify the baseline comparison group to be the academic group using (base=2).


nomreg prog (base = 2) by ses with write
/print = lrt cps mfi parameter summary.

\[\ln\left(\frac{P(prog=general)}{P(prog=academic)}\right) = b_{10} + b_{11}(ses=1) + b_{12}(ses=2) + b_{13}write\]
\[\ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right) = b_{20} + b_{21}(ses=1) + b_{22}(ses=2) + b_{23}write\]
where \(b\)'s are the regression coefficients.
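
The base specification only controls which category the others are compared against. As a minimal sketch, assuming general is coded 1 (which is consistent with academic being coded 2 above, but is an assumption here), the same model could be refit with general as the comparison group:

```spss
* Sketch: same model with general (assumed to be prog = 1) as the baseline.
nomreg prog (base = 1) by ses with write
  /print = lrt cps mfi parameter summary.
```

With this baseline, the academic-versus-general coefficients are the negatives of the general-versus-academic coefficients from the original fit, and the fitted probabilities are unchanged.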

The ratio of the probability of choosing one outcome category to the probability of choosing the baseline category is often referred to as relative risk (it is also sometimes referred to as odds, the term we used above to describe the regression parameters). Thus, exponentiating the linear equations above yields relative risks. Regression coefficients represent the change in log relative risk (log odds) per unit change in the predictor, so exponentiating a regression coefficient yields a relative risk ratio. SPSS includes relative risk ratios in the output, under the column "Exp(B)".
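
As a worked illustration, take the two coefficients on write from the fitted model (they appear as the second entries of b_gen and b_voc in the matrix code later on this page). Exponentiating them gives, approximately:

\[\exp(b_{13}) = \exp(-0.057928) \approx 0.944, \qquad \exp(b_{23}) = \exp(-0.113603) \approx 0.893\]

Read this way, a one-point increase in write multiplies the relative risk of general versus academic by about 0.944, and that of vocation versus academic by about 0.893, holding ses constant. Up to rounding, these values should agree with the "Exp(B)" entries for write in the parameter estimates table.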

The nomreg output includes tests for the overall effect of ses and write. In that output, both effects are statistically significant.

You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities using the SPSS matrix command. Below we calculate the predicted probability of choosing each program type at each level of ses, holding write at its mean.

Matrix.
* coefficient order: intercept, write, ses=1, ses=2.
* these coefficients are taken from the output.
compute b_gen = {1.689354 ; -0.057928 ; 1.162832 ; 0.629541}.
compute b_voc = {4.235530 ; -0.113603 ; 0.982670 ; 1.274063}.
* design matrix, one row per ses level: intercept, write at its mean, ses=1 and ses=2 dummies.
compute x = {{1 ; 1; 1}, make(3, 1, 52.775), {1, 0; 0, 1; 0, 0}}.
* exponentiated linear predictors, where the academic baseline is exp(0) = 1.
compute lp_gen = exp(x * b_gen).
compute lp_voc = exp(x * b_voc).
compute lp_aca = {1; 1; 1}.
compute p_gen = lp_gen/(lp_aca + lp_gen + lp_voc).
compute p_voc = lp_voc/(lp_aca + lp_gen + lp_voc).
compute p_aca = lp_aca/(lp_aca + lp_gen + lp_voc).
compute p = {p_gen, p_aca, p_voc}.
print p /title 'Predicted Probabilities for Outcomes 1 2 3 for ses 1 2 3 at mean of write'.
End Matrix.

Run MATRIX procedure:

Predicted Probabilities for Outcomes 1 2 3 for ses 1 2 3 at mean of write
   .3581989665   .4396824687   .2021185647
   .2283388262   .4777491509   .2939120229
   .1784967500   .7009009604   .1206022896

------ END MATRIX -----
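
The printed values follow from the multinomial logit formula that the matrix code implements. As a sketch in the notation of the model equations, write \(\eta_{gen}\) and \(\eta_{voc}\) for the two linear predictors (these symbols are introduced here for illustration and do not appear in the SPSS output). Academic is the baseline, so its linear predictor is fixed at 0:

\[P(prog=general) = \frac{e^{\eta_{gen}}}{1 + e^{\eta_{gen}} + e^{\eta_{voc}}}, \qquad P(prog=vocation) = \frac{e^{\eta_{voc}}}{1 + e^{\eta_{gen}} + e^{\eta_{voc}}}, \qquad P(prog=academic) = \frac{1}{1 + e^{\eta_{gen}} + e^{\eta_{voc}}}\]

For the first row (ses = 1, write = 52.775):

\[\eta_{gen} = 1.689354 - 0.057928(52.775) + 1.162832 \approx -0.2050, \qquad \eta_{voc} = 4.235530 - 0.113603(52.775) + 0.982670 \approx -0.7772\]

\[P(prog=general) \approx \frac{0.8147}{1 + 0.8147 + 0.4597} \approx 0.358, \qquad P(prog=academic) \approx 0.440, \qquad P(prog=vocation) \approx 0.202\]

These agree with the first row of the printed matrix, whose columns are built from p_gen, p_aca and p_voc in that order.
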

Column 1 contains the predicted probabilities for prog = general, with ses equal to 1, 2 and 3 in rows 1 through 3, respectively. Columns 2 and 3 contain the same for prog = academic and prog = vocational, respectively. We can also calculate predicted probabilities as we vary write from 30 to 70, when ses = 1.

Matrix.
* coefficient order: intercept, write, ses=1, ses=2.
* these coefficients are taken from the output.
compute b_gen = {1.689354 ; -0.057928 ; 1.162832 ; 0.629541}.
compute b_voc = {4.235530 ; -0.113603 ; 0.982670 ; 1.274063}.
* design matrix, one row per value of write: intercept, write, ses=1 dummy set to 1, ses=2 dummy set to 0.
compute x = {make(5,1,1), {30; 40; 50; 60; 70}, make(5,1,1), make(5,1,0)}.
* exponentiated linear predictors, where the academic baseline is exp(0) = 1.
compute lp_gen = exp(x * b_gen).
compute lp_voc = exp(x * b_voc).
compute lp_aca = {1; 1; 1; 1; 1}.
compute p_gen = lp_gen/(lp_aca + lp_gen + lp_voc).
compute p_voc = lp_voc/(lp_aca + lp_gen + lp_voc).
compute p_aca = lp_aca/(lp_aca + lp_gen + lp_voc).
compute p = {p_gen, p_aca, p_voc}.
print p /title 'Predicted Probabilities for Outcomes 1 2 3 for write 30 40 50 60 70 at ses=1'.
End Matrix.

Run MATRIX procedure:

Predicted Probabilities for Outcomes 1 2 3 for write 30 40 50 60 70 at ses=1
   .2999966732   .0984378501   .6015654767
   .3656613530   .2141424912   .4201961559
   .3698577661   .3865775582   .2435646757
   .3083735022   .5752505689   .1163759289
   .2199925775   .7324300249   .0475773976

------ END MATRIX -----

Column 1 contains the predicted probabilities for prog = general, where write equals 30, 40, 50, 60 and 70 for rows 1 through 5, respectively. Columns 2 and 3 are the same for prog = academic and prog = vocational, respectively.
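
The matrix calculations above give predicted probabilities at chosen covariate values. If per-case predicted probabilities are wanted instead, one option is the save subcommand of nomreg. The sketch below was not part of the original analysis; it assumes the estprob and predcat keywords are available in your SPSS version, and the names of the new variables are assigned by SPSS.

```spss
* Sketch: refit the same model and save per-case predictions.
* estprob saves one estimated probability per outcome category.
* predcat saves the category with the highest estimated probability.
nomreg prog (base = 2) by ses with write
  /print = parameter summary
  /save = estprob predcat.
```

The saved probabilities are evaluated at each student's own ses and write, so they will generally differ from the values in the tables above, which fix write at 52.775 or ses at 1.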
