Multinomial logistic regression is for modeling nominal outcome variables, in which the log odds of the outcomes are modeled as a linear combination of the predictor variables.
Please Note: The purpose of this page is to show how to use various data analysis commands. It does not cover all aspects of the research process which researchers are expected to do. In particular, it does not cover data cleaning and checking, verification of assumptions, model diagnostics and potential follow-up analyses.
Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences from young ones. The outcome variable here will be the type of food, and the predictor variables might be the length of the alligators and other environmental variables.
Example 3. Students entering high school choose among a general program, a vocational program, and an academic program. Their choice might be modeled using their writing score and their socioeconomic status.
For our data analysis example, we will expand the third example using the hsbdemo data set. You can download the data here.
proc contents data = "G:\hsbdemo";
run;

The CONTENTS Procedure

Data Set Name        C:\hsbdemo                             Observations          200
Member Type          DATA                                   Variables             13
Engine               V9                                     Indexes               0
Created              Wednesday, May 20, 2009 03:14:39 PM    Observation Length    40
Last Modified        Wednesday, May 20, 2009 03:14:39 PM    Deleted Observations  0
Protection                                                  Compressed            NO
Data Set Type                                               Sorted                NO
Label                Written by SAS
Data Representation  WINDOWS_32
Encoding             Default

Engine/Host Dependent Information

Data Set Page Size          4096
Number of Data Set Pages    3
First Data Page             1
Max Obs per Page            101
Obs in First Data Page      43
Number of Data Set Repairs  0
Filename                    C:\hsbdemo.sas7bdat
Release Created             9.0000M0
Host Created                WIN

Alphabetic List of Variables and Attributes

 #   Variable   Type   Len   Label
12   AWARDS     Num      3
13   CID        Num      3
 2   FEMALE     Num      3
11   HONORS     Num      3   honores eng
 1   ID         Num      4
 8   MATH       Num      3   math score
 5   PROG       Num      3   type of program
 6   READ       Num      3   reading score
 4   SCHTYP     Num      3   type of school
 9   SCIENCE    Num      3   science score
 3   SES        Num      3
10   SOCST      Num      3   social studies score
 7   WRITE      Num      3   writing score
The data set contains variables on 200 students. The outcome variable is prog, program type. The predictor variables are socioeconomic status, ses, a three-level categorical variable, and writing score, write, a continuous variable. Let's start by getting some descriptive statistics of the variables of interest.
proc freq data = "G:\hsbdemo";
tables prog*ses / chisq norow nocol nofreq;
run;

The FREQ Procedure

Table of PROG by SES

PROG(type of program)     SES

Percent    |       1|       2|       3|  Total
-----------+--------+--------+--------+
general    |   8.00 |  10.00 |   4.50 |  22.50
-----------+--------+--------+--------+
academic   |   9.50 |  22.00 |  21.00 |  52.50
-----------+--------+--------+--------+
vocational |   6.00 |  15.50 |   3.50 |  25.00
-----------+--------+--------+--------+
Total           47       95       58      200
             23.50    47.50    29.00   100.00

Statistics for Table of PROG by SES

Statistic                     DF      Value     Prob
------------------------------------------------------
Chi-Square                     4    16.6044   0.0023
Likelihood Ratio Chi-Square    4    16.7830   0.0021
Mantel-Haenszel Chi-Square     1     0.0598   0.8068
Phi Coefficient                      0.2881
Contingency Coefficient              0.2769
Cramer's V                           0.2037

Sample Size = 200

proc sort data = "G:\hsbdemo";
by prog;
run;

proc means data = "G:\hsbdemo";
var write;
by prog;
run;

------------------------------ type of program=general ------------------------------

The MEANS Procedure

Analysis Variable : WRITE writing score

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
 45      51.3333333       9.3977754      31.0000000      67.0000000
-------------------------------------------------------------------

----------------------------- type of program=academic ------------------------------

Analysis Variable : WRITE writing score

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
105      56.2571429       7.9433433      33.0000000      67.0000000
-------------------------------------------------------------------

---------------------------- type of program=vocational -----------------------------

Analysis Variable : WRITE writing score

  N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------
 50      46.7600000       9.3187544      31.0000000      67.0000000
-------------------------------------------------------------------
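As a cross-check on the PROC FREQ output, the Pearson chi-square and Cramer's V can be recomputed by hand. The sketch below is in Python rather than SAS, purely for illustration. The cell counts are an assumption: they are reconstructed by multiplying each cell percentage by the sample size of 200, which reproduces the row and column totals shown. The closed-form p-value used here is valid only for a chi-square with 4 degrees of freedom.

```python
import math

# Cell counts reconstructed from the PROC FREQ percentages (percent x 200).
# Rows: general, academic, vocational; columns: ses = 1, 2, 3.
counts = [
    [16, 20, 9],   # general    (8.00%, 10.00%, 4.50%)
    [19, 44, 42],  # academic   (9.50%, 22.00%, 21.00%)
    [12, 31, 7],   # vocational (6.00%, 15.50%, 3.50%)
]

def pearson_chi_square(table):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of prog and ses.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square with exactly 4 df (closed form)."""
    return math.exp(-x / 2) * (1 + x / 2)

chi2, df = pearson_chi_square(counts)
n = sum(map(sum, counts))
cramers_v = math.sqrt(chi2 / (n * (min(len(counts), len(counts[0])) - 1)))
print(f"chi-square = {chi2:.4f}, df = {df}, p = {chi2_sf_df4(chi2):.4f}")
print(f"Cramer's V = {cramers_v:.4f}")
```

This should agree with the Chi-Square row (16.6044, 4 df, p = 0.0023) and with Cramer's V (0.2037) in the table above.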
Below we use proc logistic to fit the multinomial logistic regression model. The link = glogit option requests the generalized logit model, ref = "academic" makes academic the reference outcome, and param = ref with ref = "1" uses reference (dummy) coding for ses with ses = 1 as the reference level.

proc logistic data = "G:\hsbdemo";
class prog (ref = "academic") ses (ref = "1") / param = ref;
model prog = ses write / link = glogit;
run;

The LOGISTIC Procedure

Model Information

Data Set                     G:\hsbdemo
Response Variable            PROG      type of program
Number of Response Levels    3
Model                        generalized logit
Optimization Technique       Newton-Raphson

Number of Observations Read    200
Number of Observations Used    200

Response Profile

Ordered                       Total
  Value   PROG            Frequency
      1   academic              105
      2   general                45
      3   vocational             50

Logits modeled use PROG='academic' as the reference category.

Class Level Information

                  Design
Class   Value   Variables
SES     1         0    0
        2         1    0
        3         0    1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

                            Intercept
              Intercept           and
Criterion          Only    Covariates
AIC             412.193       375.963
SC              418.790       402.350
-2 Log L        408.193       359.963

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        48.2299     6        <.0001
Score                   45.1588     6        <.0001
Wald                    37.2946     6        <.0001

Type 3 Analysis of Effects

                      Wald
Effect    DF    Chi-Square    Pr > ChiSq
SES        4       10.8162        0.0287
WRITE      2       26.4633        <.0001

Analysis of Maximum Likelihood Estimates

                                          Standard          Wald
Parameter   PROG         DF   Estimate       Error    Chi-Square    Pr > ChiSq
Intercept   general       1     2.8522      1.1664        5.9790        0.0145
Intercept   vocational    1     5.2182      1.1635       20.1128        <.0001
SES 2       general       1    -0.5333      0.4437        1.4444        0.2294
SES 2       vocational    1     0.2914      0.4764        0.3742        0.5407
SES 3       general       1    -1.1628      0.5142        5.1137        0.0237
SES 3       vocational    1    -0.9827      0.5956        2.7224        0.0989
WRITE       general       1    -0.0579      0.0214        7.3200        0.0068
WRITE       vocational    1    -0.1136      0.0222       26.1392        <.0001

Odds Ratio Estimates

                               Point          95% Wald
Effect        PROG          Estimate      Confidence Limits
SES 2 vs 1    general          0.587       0.246      1.400
SES 2 vs 1    vocational       1.338       0.526      3.404
SES 3 vs 1    general          0.313       0.114      0.856
SES 3 vs 1    vocational       0.374       0.116      1.203
WRITE         general          0.944       0.905      0.984
WRITE         vocational       0.893       0.855      0.932
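The Odds Ratio Estimates table is derived from the parameter estimates: each odds ratio is \(e^{b}\), the 95% Wald limits are \(e^{b \pm 1.96\,SE}\), and the Wald chi-square is \((b/SE)^2\). The Python sketch below reproduces these from the rounded estimates printed above, so the last digit can differ slightly from SAS; the helper name is illustrative, not part of SAS.

```python
import math

# (estimate, standard error) from "Analysis of Maximum Likelihood Estimates".
# Academic is the reference outcome; ses = 1 is the reference level.
estimates = {
    ("SES 2 vs 1", "general"): (-0.5333, 0.4437),
    ("SES 2 vs 1", "vocational"): (0.2914, 0.4764),
    ("SES 3 vs 1", "general"): (-1.1628, 0.5142),
    ("SES 3 vs 1", "vocational"): (-0.9827, 0.5956),
    ("WRITE", "general"): (-0.0579, 0.0214),
    ("WRITE", "vocational"): (-0.1136, 0.0222),
}

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(b, se):
    """Return the Wald chi-square, odds ratio, and 95% Wald confidence limits."""
    wald = (b / se) ** 2
    odds_ratio = math.exp(b)
    lower = math.exp(b - Z_975 * se)
    upper = math.exp(b + Z_975 * se)
    return wald, odds_ratio, lower, upper

for (effect, prog), (b, se) in estimates.items():
    wald, oratio, lo, hi = wald_summary(b, se)
    print(f"{effect:<11} {prog:<11} Wald={wald:8.4f} "
          f"OR={oratio:.3f} CI=({lo:.3f}, {hi:.3f})")
```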
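The model consists of two logit equations, each comparing one program with the reference category, academic:
\[\ln\left(\frac{P(\text{prog}=\text{general})}{P(\text{prog}=\text{academic})}\right) = b_{10} + b_{11}(\text{ses}=2) + b_{12}(\text{ses}=3) + b_{13}\,\text{write}\]
\[\ln\left(\frac{P(\text{prog}=\text{vocational})}{P(\text{prog}=\text{academic})}\right) = b_{20} + b_{21}(\text{ses}=2) + b_{22}(\text{ses}=3) + b_{23}\,\text{write}\]
where the \(b\)'s are the regression coefficients and \((\text{ses}=2)\) and \((\text{ses}=3)\) are indicator variables equal to 1 when the condition holds and 0 otherwise. For example, \(b_{13} = -0.0579\): a one-unit increase in write is associated with a 0.0579 decrease in the log odds of being in the general rather than the academic program, holding ses constant.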
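Inverting the generalized logits gives predicted probabilities. If \(\eta_g\) and \(\eta_v\) are the linear predictors for general vs. academic and vocational vs. academic, then \(P(\text{academic}) = 1/(1 + e^{\eta_g} + e^{\eta_v})\), \(P(\text{general}) = e^{\eta_g}/(1 + e^{\eta_g} + e^{\eta_v})\), and similarly for vocational. The Python sketch below applies this to the estimates above at the overall mean of write (52.775, rebuilt from the PROC MEANS group results); `predicted_probabilities` is a hypothetical helper, and because the coefficients are rounded the results match the lsmeans tables later on this page only to about three decimals.

```python
import math

# Coefficients from the reference-coded fit above: academic is the
# reference outcome and ses = 1 the reference level.
# Order: intercept, ses = 2, ses = 3, write.
COEF = {
    "general": (2.8522, -0.5333, -1.1628, -0.0579),
    "vocational": (5.2182, 0.2914, -0.9827, -0.1136),
}

# Overall mean of write, rebuilt from the PROC MEANS group sizes and means.
WRITE_MEAN = (45 * 51.3333333 + 105 * 56.2571429 + 50 * 46.76) / 200

def predicted_probabilities(ses, write):
    """Hypothetical helper: P(prog) for each program at given ses and write."""
    ses2 = 1.0 if ses == 2 else 0.0
    ses3 = 1.0 if ses == 3 else 0.0
    # Log odds of each non-reference program versus academic.
    eta = {
        prog: b0 + b1 * ses2 + b2 * ses3 + b3 * write
        for prog, (b0, b1, b2, b3) in COEF.items()
    }
    # The reference outcome contributes exp(0) = 1 to the denominator.
    denom = 1.0 + sum(math.exp(e) for e in eta.values())
    probs = {"academic": 1.0 / denom}
    for prog, e in eta.items():
        probs[prog] = math.exp(e) / denom
    return probs

for ses in (1, 2, 3):
    p = predicted_probabilities(ses, WRITE_MEAN)
    print(ses, {prog: round(v, 4) for prog, v in p.items()})
```

For ses = 3 this gives roughly 0.701 (academic), 0.179 (general), and 0.121 (vocational).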
Using the test statement, we can also test specific hypotheses within or even across logits, such as whether the effect of ses = 3 in predicting general vs. academic equals the effect of ses = 3 in predicting vocational vs. academic. The test statement refers to parameters by the unique names SAS assigns to them. The outest= option on the proc logistic statement writes an output data set containing these names and the parameter estimates. We print this data set transposed to make it more readable; the noobs option on the proc print statement suppresses observation numbers, which are meaningless for a parameter data set.
proc logistic data = "G:\hsbdemo" outest = mlogit_param;
class prog (ref = "academic") ses (ref = "1") / param = ref;
model prog = ses write / link = glogit;
run;

proc transpose data = mlogit_param;
run;

proc print noobs;
run;

_NAME_                  _LABEL_                               PROG
Intercept_general       Intercept: PROG=general              2.852
Intercept_vocational    Intercept: PROG=vocational           5.218
SES2_general            SES 2: PROG=general                 -0.533
SES2_vocational         SES 2: PROG=vocational               0.291
SES3_general            SES 3: PROG=general                 -1.163
SES3_vocational         SES 3: PROG=vocational              -0.983
WRITE_general           writing score: PROG=general         -0.058
WRITE_vocational        writing score: PROG=vocational      -0.114
_LNLIKE_                Model Log Likelihood              -179.982
Here we see the same parameters as in the output above, but with their unique SAS-assigned names. We want to test whether SES3_general equals SES3_vocational, which we can now do with the test statement; an expression with no equals sign, such as SES3_general - SES3_vocational, is tested against zero. The text preceding the colon in the test statement is a label that identifies the test in the output, and it must follow SAS naming rules (at most 32 characters, consisting of letters, numerals, and underscores, and beginning with a letter or underscore).
proc logistic data = "G:\hsbdemo" outest = mlogit_param;
class prog (ref = "academic") ses (ref = "1") / param = ref;
model prog = ses write / link = glogit;
SES3_general_vs_SES3_vocational: test SES3_general - SES3_vocational;
run;
***SOME OUTPUT OMITTED***
Linear Hypotheses Testing Results

                                         Wald
Label                              Chi-Square    DF    Pr > ChiSq
SES3_general_vs_SES3_vocational        0.0772     1        0.7811
The effect of ses = 3 for predicting general vs. academic is not significantly different from the effect of ses = 3 for predicting vocational vs. academic (Wald chi-square = 0.0772, df = 1, p = 0.7811).
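The test above is a 1-df Wald test of SES3_general - SES3_vocational = 0. Its statistic is \((b_1 - b_2)^2 / [\mathrm{Var}(b_1) + \mathrm{Var}(b_2) - 2\,\mathrm{Cov}(b_1, b_2)]\), where \(b_1\) and \(b_2\) are the two estimates. The covariance term is not in the output shown on this page (the covb option on the model statement would display the covariance matrix), so the Python sketch below only recomputes the p-value from the reported statistic and shows that setting the covariance to zero, which is done here purely for illustration, gives a different statistic. Both function names are hypothetical.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square with 1 df: P(Z**2 > x)."""
    return math.erfc(math.sqrt(x / 2))

def wald_difference(b1, b2, var1, var2, cov12):
    """1-df Wald chi-square for the hypothesis b1 - b2 = 0."""
    return (b1 - b2) ** 2 / (var1 + var2 - 2 * cov12)

# p-value for the statistic SAS reported for SES3_general_vs_SES3_vocational.
print(round(chi2_sf_df1(0.0772), 4))

# Illustration only: ignoring the covariance of the two estimates
# (cov12 = 0) does NOT reproduce the reported 0.0772.
naive = wald_difference(-1.1628, -0.9827, 0.5142 ** 2, 0.5956 ** 2, 0.0)
print(round(naive, 4))
```

The first line reproduces p = 0.7811; the second gives about 0.052, which is why the test statement, not a hand calculation from the standard errors alone, should be used.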
You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities using the lsmeans statement and the ilink option. For multinomial data, lsmeans requires glm coding rather than reference (dummy) coding, even though the two codings produce the same fitted model, so be sure to respecify the coding in the class statement. However, glm coding only allows the last category to be the reference group (prog = vocational and ses = 3) and ignores any other reference group specifications. Below we use lsmeans to calculate the predicted probability of choosing the academic or the general program at each level of ses, holding write at its mean. The e option displays the coefficients used to form each least squares mean, and the cl option adds confidence limits.
proc logistic data = "G:\hsbdemo" outest = mlogit_param;
class prog ses / param = glm;
model prog = ses write / link = glogit;
lsmeans ses / e ilink cl;
run;
***SOME OUTPUT OMITTED***
Coefficients for SES Least Squares Means

                 type of
Parameter        program    SES    Row1     Row2     Row3     Row4     Row5     Row6
Intercept        academic            1        1        1
Intercept        general                                        1        1        1
SES 1            academic     1      1
SES 1            general      1                                 1
SES 2            academic     2               1
SES 2            general      2                                          1
SES 3            academic     3                        1
SES 3            general      3                                                   1
writing score    academic          52.775   52.775   52.775
writing score    general                                      52.775   52.775   52.775
***SOME OUTPUT OMITTED***
SES Least Squares Means

                              Standard
type of                       Error of       Lower       Upper
program      SES      Mean        Mean        Mean        Mean
academic       1    0.4397     0.07799      0.2868      0.5925
academic       2    0.4777     0.05526      0.3694      0.5861
academic       3    0.7009     0.06630      0.5709      0.8309
general        1    0.3582     0.07264      0.2158      0.5006
general        2    0.2283     0.04512      0.1399      0.3168
general        3    0.1785     0.05405     0.07256      0.2844
The predicted probabilities are in the "Mean" column. Thus, for ses = 3 and write = 52.775 (the mean of write), the probability of being in the academic program is 0.7009 and the probability of being in the general program is 0.1785.
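Changing the reference category, as glm coding does for prog (vocational instead of academic) and as the descending option does, reparameterizes the model without changing its fitted probabilities. The Python sketch below illustrates this for the response only: it re-expresses the reference-coded coefficients relative to vocational by differencing, then checks that the probabilities at ses = 3 and write = 52.775 are unchanged. The helpers `change_reference` and `probabilities` are hypothetical; the same kind of differencing would apply to the ses columns when the ses reference level changes.

```python
import math

# Reference-coded coefficients from the first model (academic = reference).
# Order: intercept, ses = 2, ses = 3, write.
VS_ACADEMIC = {
    "general": (2.8522, -0.5333, -1.1628, -0.0579),
    "vocational": (5.2182, 0.2914, -0.9827, -0.1136),
}

def change_reference(coefs, old_ref, new_ref):
    """Hypothetical helper: re-express generalized-logit coefficients.

    log(P(j)/P(new)) = log(P(j)/P(old)) - log(P(new)/P(old)), so each
    outcome's coefficients are differenced against those of new_ref, and
    the old reference outcome receives the negated new_ref coefficients.
    """
    base = coefs[new_ref]
    out = {
        prog: tuple(b - r for b, r in zip(c, base))
        for prog, c in coefs.items()
        if prog != new_ref
    }
    out[old_ref] = tuple(-r for r in base)
    return out

def probabilities(coefs, ref, x):
    """Softmax over outcomes; the reference outcome has log odds 0."""
    eta = {prog: sum(b * xi for b, xi in zip(c, x)) for prog, c in coefs.items()}
    eta[ref] = 0.0
    denom = sum(math.exp(e) for e in eta.values())
    return {prog: math.exp(e) / denom for prog, e in eta.items()}

VS_VOCATIONAL = change_reference(VS_ACADEMIC, "academic", "vocational")
x = (1.0, 0.0, 1.0, 52.775)  # intercept, ses = 2, ses = 3, write
p_academic_ref = probabilities(VS_ACADEMIC, "academic", x)
p_vocational_ref = probabilities(VS_VOCATIONAL, "vocational", x)
for prog in sorted(p_academic_ref):
    print(prog, round(p_academic_ref[prog], 4), round(p_vocational_ref[prog], 4))
```

Both columns of the printout are identical, which is why the general-program probabilities are the same in the two lsmeans runs on this page.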
To obtain predicted probabilities for the vocational program, we can reverse the ordering of the response categories using the descending option in the proc logistic statement. This makes academic the reference group for prog; ses = 3 remains the reference group for ses, because the descending option affects only the ordering of the response variable.
proc logistic data = "G:\hsbdemo" outest = mlogit_param descending;
class prog ses / param = glm;
model prog = ses write / link = glogit;
lsmeans ses / e ilink cl;
run;
***SOME OUTPUT OMITTED***
Coefficients for SES Least Squares Means

                 type of
Parameter        program      SES    Row1     Row2     Row3     Row4     Row5     Row6
Intercept        vocational            1        1        1
Intercept        general                                          1        1        1
SES 1            vocational     1      1
SES 1            general        1                                 1
SES 2            vocational     2               1
SES 2            general        2                                          1
SES 3            vocational     3                        1
SES 3            general        3                                                   1
writing score    vocational          52.775   52.775   52.775
writing score    general                                        52.775   52.775   52.775
***SOME OUTPUT OMITTED***
SES Least Squares Means

                                Standard
type of                         Error of       Lower       Upper
program        SES      Mean        Mean        Mean        Mean
vocational       1    0.2021     0.05996     0.08459      0.3197
vocational       2    0.2939     0.05036      0.1952      0.3926
vocational       3    0.1206     0.04643     0.02960      0.2116
general          1    0.3582     0.07264      0.2158      0.5006
general          2    0.2283     0.04512      0.1399      0.3168
general          3    0.1785     0.05405     0.07256      0.2844
Here we see that the probability of being in the vocational program when ses = 3 and write = 52.775 is 0.1206, which is what we would expect, since 1 - 0.7009 - 0.1785 = 0.1206, where 0.7009 and 0.1785 are the probabilities of being in the academic and general programs under the same conditions.
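The same sum-to-one check can be applied at every level of ses by combining the two LS-means tables. The Python sketch below uses the printed values (write held at 52.775); because each probability is rounded to four decimals, a total can be off by 0.0001, as it is for ses = 2.

```python
# Predicted probabilities copied from the two SES Least Squares Means tables.
# The general rows are identical in both runs.
lsmeans = {
    "academic": {1: 0.4397, 2: 0.4777, 3: 0.7009},
    "general": {1: 0.3582, 2: 0.2283, 3: 0.1785},
    "vocational": {1: 0.2021, 2: 0.2939, 3: 0.1206},
}

for ses in (1, 2, 3):
    # Probabilities of the three programs must sum to one at each ses level.
    total = sum(table[ses] for table in lsmeans.values())
    print(f"ses = {ses}: total = {total:.4f}")
```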
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.