
SAS Data Analysis Examples
Multinomial Logistic Regression

Examples

Example 1. People's occupational choices might be influenced by their parents' occupations and their own education level. We can study the relationship between a person's occupational choice and his or her education level and father's occupation. The outcome variable is occupational choice, which consists of categories of occupations.

Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here is the type of food, and the predictor variables might be the length of the alligator and other environmental variables.

Example 3. Several brands of similar products are on the market, and you want to study brand choices based on gender and age. For example, a recent finding of a market research group claims that among digital camera choices, women prefer Kodak more than men and men prefer Canon more than women.

Description of the Data

For our data analysis example, we will expand our third example with a hypothetical data set. The data set contains information on 735 subjects who were asked which of three brands of some product (e.g., car or TV) they prefer. The data set also includes each subject's gender and age. You can download the data here.

proc contents data = "D:\mlogit";
run;
The CONTENTS Procedure

Data Set Name        D:\mlogit                              Observations          735
Member Type          DATA                                   Variables             3
Engine               V9                                     Indexes               0
Created              Wednesday, May 23, 2007 05:58:17 PM    Observation Length    9
Last Modified        Wednesday, May 23, 2007 05:58:17 PM    Deleted Observations  0
Protection                                                  Compressed            NO
Data Set Type                                               Sorted                NO
Label                Written by SAS

Data Representation  WINDOWS_32
Encoding             Default

             Engine/Host Dependent Information

Data Set Page Size          4096
Number of Data Set Pages    2
First Data Page             1
Max Obs per Page            446
Obs in First Data Page      308
Number of Data Set Repairs  0
File Name                   D:\mlogit.sas7bdat
Release Created             9.0000M0
Host Created                WIN

Alphabetic List of Variables and Attributes

#    Variable    Type    Len

3    AGE         Num       3
1    BRAND       Num       3
2    FEMALE      Num       3

The outcome variable is brand. The variable female is coded as 0 for male and 1 for female. Let's start with some descriptive statistics for the variables of interest.

proc freq data = "D:\mlogit";
tables brand;
run;
The FREQ Procedure

                                  Cumulative    Cumulative
BRAND    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    1         207       28.16           207        28.16
    2         307       41.77           514        69.93
    3         221       30.07           735       100.00
proc sort data = "D:\mlogit";
by brand;
run;

proc means data = "D:\mlogit";
by brand;
var age female;
run;
BRAND=1

The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
AGE         207      31.4879227       2.1083742      24.0000000      38.0000000
FEMALE      207       0.5555556       0.4981086               0       1.0000000
-------------------------------------------------------------------------------

BRAND=2

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
AGE         307      32.8436482       1.8243945      28.0000000      38.0000000
FEMALE      307       0.6775244       0.4681870               0       1.0000000
-------------------------------------------------------------------------------

BRAND=3

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
AGE         221      34.3031674       2.3478111      27.0000000      38.0000000
FEMALE      221       0.6470588       0.4789695               0       1.0000000
-------------------------------------------------------------------------------
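
The sort step above is needed only because of the BY statement. As a sketch of an alternative, the CLASS statement in PROC MEANS produces the same per-brand summaries without sorting the data first; the printed table is laid out somewhat differently from the BY-group output.

```sas
* Per-brand summaries without a prior PROC SORT;
proc means data = "D:\mlogit";
  class brand;
  var age female;
run;
```

Because female is coded 0/1, its mean within each brand is the proportion of women choosing that brand (about 56% for brand 1 in the output above).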

Some Strategies You Might Try

Using the Multinomial Logit Model

Now that we have looked at the data, we can build our model. Our goal is to relate brand choice to age and gender. We will assume a linear relationship between the log of the ratio of two choice probabilities and our predictor variables female and age. Since there are multiple categories, we will choose a base category as the comparison group. Here our choice is the first brand (brand=1).
proc logistic data = "D:\mlogit";
class brand (ref = "1");
model brand = female age / link = glogit;
run;
The LOGISTIC Procedure

        Type 3 Analysis of Effects

                        Wald
Effect      DF    Chi-Square    Pr > ChiSq

FEMALE       2        7.6704        0.0216
AGE          2      123.3880        <.0001

                  Analysis of Maximum Likelihood Estimates

                                        Standard          Wald
Parameter    BRAND    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept    2         1    -11.7746      1.7746       44.0239        <.0001
Intercept    3         1    -22.7214      2.0580      121.8897        <.0001
FEMALE       2         1      0.5238      0.1942        7.2719        0.0070
FEMALE       3         1      0.4659      0.2261        4.2472        0.0393
AGE          2         1      0.3682      0.0550       44.8133        <.0001
AGE          3         1      0.6859      0.0626      119.9541        <.0001

               Odds Ratio Estimates

                      Point          95% Wald
Effect    BRAND    Estimate      Confidence Limits

FEMALE    2           1.688       1.154       2.471
FEMALE    3           1.594       1.023       2.482
AGE       2           1.445       1.297       1.610
AGE       3           1.986       1.756       2.245

The output above has two parts, labeled with the categories of the outcome variable brand. They correspond to two equations:

log(P(brand=2)/P(brand=1)) = b_10 + b_11*female + b_12*age
log(P(brand=3)/P(brand=1)) = b_20 + b_21*female + b_22*age,

with b's being the raw regression coefficients from the output.
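
Because the three probabilities must sum to one, the two equations can be solved for the probabilities themselves: P(brand=1) = 1/(1 + exp(xb2) + exp(xb3)), P(brand=2) = exp(xb2)/(1 + exp(xb2) + exp(xb3)), and similarly for brand 3, where xb2 and xb3 are the right-hand sides of the two equations. The DATA step below is a minimal sketch of that calculation using the rounded estimates from the output; the data set name probs, the variable names, and the chosen ages are illustrative assumptions, not part of the original analysis.

```sas
* Predicted probabilities computed by hand from the rounded estimates;
* The data set name probs and the ages in the loop are for illustration only;
data probs;
  female = 1;
  do age = 24 to 38 by 2;
    xb2 = -11.7746 + 0.5238*female + 0.3682*age;  /* log(P(brand=2)/P(brand=1)) */
    xb3 = -22.7214 + 0.4659*female + 0.6859*age;  /* log(P(brand=3)/P(brand=1)) */
    denom = 1 + exp(xb2) + exp(xb3);
    p1 = 1 / denom;
    p2 = exp(xb2) / denom;
    p3 = exp(xb3) / denom;
    output;
  end;
run;

proc print data = probs noobs;
  var female age p1 p2 p3;
  format p1 p2 p3 6.3;
run;
```

For a 32-year-old woman, for instance, the rounded coefficients give probabilities of roughly 0.29, 0.50, and 0.21 for brands 1, 2, and 3.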

For example, we can say that for a one-unit increase in the variable age, the log of the ratio of the two probabilities, P(brand=2)/P(brand=1), increases by 0.368, and the log of the ratio of the two probabilities P(brand=3)/P(brand=1) increases by 0.686, holding female constant. Therefore, we can say that, in general, the older a person is, the more he/she will prefer brand 2 or brand 3 relative to brand 1.
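
The log ratios describe preferences relative to brand 1 only. To see how the probabilities themselves move with age, one option is to have PROC LOGISTIC save each subject's fitted probabilities. The sketch below makes a few assumptions: the output data set name pred is arbitrary, the reference category is written as a response option on the MODEL statement, and the saved probabilities are expected to be named IP_1, IP_2, and IP_3 (SAS builds these names from the response levels, so check the output data set if your levels are formatted differently).

```sas
* Sketch: save fitted probabilities for each subject (pred is an assumed name);
proc logistic data = "D:\mlogit";
  model brand (ref = "1") = female age / link = glogit;
  output out = pred predprobs = individual;
run;

* Average fitted probability of each brand at each age;
proc means data = pred mean maxdec = 3;
  class age;
  var IP_1 IP_2 IP_3;
run;
```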

The ratio of the probability of choosing one outcome category to the probability of choosing the reference category is often referred to as relative risk (and sometimes as odds). So another way of interpreting the regression results is in terms of relative risk. For a one-unit increase in the variable age, we expect the relative risk of choosing brand 2 over brand 1 to be multiplied by exp(0.3682) = 1.45, holding female constant. So we can say that the relative risk is higher for older people. For a dichotomous predictor variable such as female, the ratio of the relative risk of choosing brand 2 over brand 1 for females to that for males is exp(0.5238) = 1.69.
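
The exponentiated coefficients are what the Odds Ratio Estimates table reports. As a rough check, the DATA step below recomputes them, together with 95% Wald limits of the form exp(estimate ± z*standard error), from the rounded estimates and standard errors in the output. The data set and variable names are illustrative assumptions, and small differences from the table in the last digit are expected because the inputs are rounded.

```sas
* Recompute relative risk ratios and 95% Wald limits from the printed estimates;
* The names rrr, lower and upper are chosen for illustration;
data rrr;
  input parameter $ brand estimate stderr;
  z = probit(0.975);                    /* standard normal quantile, about 1.96 */
  rrr   = exp(estimate);
  lower = exp(estimate - z*stderr);
  upper = exp(estimate + z*stderr);
  datalines;
FEMALE 2 0.5238 0.1942
FEMALE 3 0.4659 0.2261
AGE 2 0.3682 0.0550
AGE 3 0.6859 0.0626
;
run;

proc print data = rrr noobs;
  var parameter brand rrr lower upper;
  format rrr lower upper 8.3;
run;
```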

Sample Write-up of the Analysis

Below is one way of describing the results.

Both female and age are statistically significant in both equations. Females are more likely than males to prefer brand 2 or brand 3 to brand 1. Also, the older a person is, the more likely he/she is to prefer brand 2 or brand 3 to brand 1.

Cautions, Flies in the Ointment

Additional Examples

See Also


These pages are Copyrighted (c) by UCLA Academic Technology Services

