
SAS Annotated Output
Multinomial Logistic Regression

This page shows an example of a multinomial logistic regression analysis with footnotes explaining the output. The data were collected on 200 high school students and are scores on various tests, including a video game and a puzzle. The outcome measure in this analysis is the preferred flavor of ice cream (vanilla, chocolate or strawberry), and we want to see what relationships exist between it and video game scores (video), puzzle scores (puzzle) and gender (female). Our response variable, ice_cream, is treated as categorical under the assumption that its levels have no natural ordering, and we allow SAS to choose the referent group. By default, SAS sorts the values of the outcome variable alphabetically or numerically and selects the last group as the referent group. In our example, this will be strawberry, which is coded 3. The variable ice_cream is a numeric variable in SAS, so we will add value labels using proc format.

data mlogit; 
  set in.mlogit; 
run;

proc format;
value ice_cream_l 
  1="chocolate"
  2="vanilla" 
  3="strawberry";
run;

Before running the multinomial logistic regression, obtaining a frequency of the ice cream flavors in the data can inform the selection of a reference group.

proc freq data = mlogit;
  format ice_cream ice_cream_l.;
  table ice_cream;
run;

The FREQ Procedure

                  favorite flavor of ice cream

                                       Cumulative    Cumulative
 ICE_CREAM    Frequency     Percent     Frequency      Percent
chocolate           47       23.50            47        23.50
vanilla             95       47.50           142        71.00
strawberry          58       29.00           200       100.00

Unlike ordered logistic regression in SAS, for which proc logistic is used, we will use proc catmod for the multinomial logistic regression. proc catmod is designed for categorical modeling, and multinomial logistic regression is an example of such a model. The statements we use within proc catmod specify that our model is a multinomial logistic regression. On the direct statement, we list the predictor variables that are to be treated as quantitative: the continuous scores video and puzzle and the 0/1 indicator female. On the response statement, we specify that the response functions are generalized logits. This is what makes our model a logistic model. Finally, on the model statement, we indicate our outcome variable ice_cream and the predictor variables to be included in the model.

proc catmod data = mlogit;
  direct video puzzle female;
  response logits;
  model ice_cream = video puzzle female;
run;

                    Data Summary

Response           ICE_CREAM     Response Levels    3
Weight Variable    None          Populations      143
Data Set           D1            Total Frequency  200
Frequency Missing  0             Observations     200

               Population Profiles

Sample    VIDEO    PUZZLE    FEMALE    Sample Size
    1     26       42        0                   1
    2     29       26        1                   1
    3     31       39        1                   1
    4     31       56        0                   1
    5     33       32        0                   1
    6     33       41        1                   1
    7     34       31        0                   1
    8     34       41        1                   1
    9     34       46        0                   1
   10     34       46        1                   1
   11     34       51        1                   1
   12     35       51        1                   1
   13     36       36        0                   2
   14     36       46        1                   1
   15     36       61        0                   1
   16     39       31        0                   1
   17     39       36        0                   1
   18     39       41        1                   1
   19     39       46        0                   3
   20     39       51        0                   2
   21     39       51        1                   4
   22     39       56        1                   1
   23     40       31        1                   1
   24     40       41        1                   1
   25     42       26        0                   1
   26     42       36        1                   1
   27     42       41        1                   4
   28     42       46        1                   1
   29     42       51        0                   1
   30     42       56        0                   2
   31     42       56        1                   2
   32     44       36        0                   1
   33     44       41        0                   1
   34     44       41        1                   1
   35     44       48        1                   1
   36     44       51        0                   1
   37     44       51        1                   2
   38     44       56        1                   2
   39     44       61        1                   1
   40     44       66        1                   1

Only the first 40 populations are displayed.

  Response Profiles

Response    ICE_CREAM
    1       1
    2       2
    3       3

                   Maximum Likelihood Analysis

           Maximum likelihood computations converged.

     Maximum Likelihood Analysis of Variance

Source               DF   Chi-Square    Pr > ChiSq

Intercept             2        17.89        0.0001
VIDEO                 2         3.43        0.1799
PUZZLE                2        11.82        0.0027
FEMALE                2         4.84        0.0891

Likelihood Ratio    278       287.61        0.3331

             Analysis of Maximum Likelihood Estimates

          Function               Standard        Chi-
Parameter  Number     Estimate      Error      Square    Pr > ChiSq
Intercept    1          5.9696     1.4375       17.24        <.0001
             2          4.0573     1.2229       11.01        0.0009
VIDEO        1         -0.0465     0.0251        3.43        0.0640
             2         -0.0229     0.0209        1.21        0.2721
PUZZLE       1         -0.0819     0.0238       11.82        0.0006
             2         -0.0430     0.0199        4.67        0.0306
FEMALE       1          0.8495     0.4482        3.59        0.0581
             2          0.0329     0.3500        0.01        0.9252

Data Summary

                    Data Summary

Response [a]       ICE_CREAM     Response Levels [b]    3
Weight Variable    None          Populations [c]      143
Data Set           D1            Total Frequency [d]  200
Frequency Missing  0             Observations [e]     200

a. Response - This is the response variable in the model. For this example, the response variable is ice_cream.

b. Response Levels - This indicates how many levels exist within the response variable. The number of models fitted in the multinomial regression is one less than this number. In our dataset, there are three possible values for ice_cream (chocolate, vanilla and strawberry), so there are three levels to our response variable. In a multinomial regression, one level of the response variable is treated as the referent group, and then a model is fit for each of the remaining levels compared to the referent group. Since we have three levels, one will be the referent level (strawberry) and we will fit two models: 1) chocolate relative to strawberry and 2) vanilla relative to strawberry.

c. Populations - This indicates how many unique combinations of predictors appear in the dataset. This dataset of 200 records contained 143 populations, or 143 unique combinations of video, puzzle and female.

d. Total Frequency - This is the sum of the weights given to all valid observations. In this example, we do not specify a weight variable, so each observation in our dataset is given the default weight of 1. Since our dataset does not contain any missing values, the total frequency is simply the sum of these weights over all 200 observations, which is 200.

e. Observations - This is the number of observations in the dataset with valid data in all of the variables needed for the specified model. In this example, our dataset does not contain any missing values, so the number of observations in our model is equal to the number of observations in our dataset.
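
One way to confirm that no records will be dropped for missing data is to count the missing values in each model variable before fitting. The sketch below assumes the mlogit dataset created earlier on this page; using proc means for this check is our choice and is not part of the original analysis.

```sas
/* Count non-missing (N) and missing (NMISS) values for each model variable. */
/* If NMISS is 0 for every variable, Observations should equal the number    */
/* of records in the dataset.                                                */
proc means data = mlogit n nmiss;
  var ice_cream video puzzle female;
run;
```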


Population Profiles

               Population Profiles

Sample [f]   VIDEO    PUZZLE    FEMALE    Sample Size [g]
    1     26       42        0                   1
    2     29       26        1                   1
    3     31       39        1                   1
    4     31       56        0                   1
    5     33       32        0                   1
    6     33       41        1                   1
    7     34       31        0                   1
    8     34       41        1                   1
    9     34       46        0                   1
   10     34       46        1                   1
   11     34       51        1                   1
   12     35       51        1                   1
   13     36       36        0                   2
   14     36       46        1                   1
   15     36       61        0                   1
   16     39       31        0                   1
   17     39       36        0                   1
   18     39       41        1                   1
   19     39       46        0                   3
   20     39       51        0                   2
   21     39       51        1                   4
   22     39       56        1                   1
   23     40       31        1                   1
   24     40       41        1                   1
   25     42       26        0                   1
   26     42       36        1                   1
   27     42       41        1                   4
   28     42       46        1                   1
   29     42       51        0                   1
   30     42       56        0                   2
   31     42       56        1                   2
   32     44       36        0                   1
   33     44       41        0                   1
   34     44       41        1                   1
   35     44       48        1                   1
   36     44       51        0                   1
   37     44       51        1                   2
   38     44       56        1                   2
   39     44       61        1                   1
   40     44       66        1                   1

Only the first 40 populations are displayed.

f. Sample - A sample is defined as a unique combination of predictors that appears in the dataset. The total number of samples is equal to the Populations (see note c).

g. Sample Size - The size of a sample is the number of records in the dataset with the given combination of predictor values. For example, if we look at sample 20, we see that there are two records in the dataset where video = 39, puzzle = 51 and female = 0.
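
The populations and their sample sizes can also be tabulated directly from the data. The following is a minimal sketch that assumes the mlogit dataset from above; the table name profiles and the column names sample_size and n_populations are hypothetical names chosen for illustration.

```sas
proc sql;
  /* One row per unique combination of predictors, with its record count. */
  create table profiles as
  select video, puzzle, female, count(*) as sample_size
  from mlogit
  group by video, puzzle, female
  order by video, puzzle, female;

  /* The number of rows in profiles corresponds to Populations. */
  select count(*) as n_populations
  from profiles;
quit;
```

If the data match those used here, n_populations should agree with the Populations value in the Data Summary, and the rows of profiles should follow the same order as the Population Profiles table.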


Response Profiles

  Response Profiles [h]

Response    ICE_CREAM
    1       1
    2       2
    3       3

h. Response Profiles - This outlines the order in which the values of our outcome variable ice_cream are considered. By default in SAS, the last value is the referent group in the multinomial logistic regression model. In this case, the last value corresponds to ice_cream = 3, which is strawberry. Additionally, the numbers assigned to the other values of the outcome variable are useful in interpreting other portions of the multinomial regression output.
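
If a referent group other than the last level is wanted, one alternative is proc logistic with the generalized logit link, which accepts a ref= option on the response variable. This is a sketch of that alternative, not the procedure used on this page, and its output is laid out differently from the proc catmod output annotated here.

```sas
/* Generalized logit model with an explicitly chosen referent level.    */
/* ref = '3' names the unformatted value 3 (strawberry). If a format is */
/* attached to ice_cream, the formatted value would be given instead.   */
proc logistic data = mlogit;
  model ice_cream(ref = '3') = video puzzle female / link = glogit;
run;
```

Changing ref = '3' to another level changes which comparisons are estimated, but the fitted probabilities for each flavor should be the same.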

Maximum Likelihood Analysis

                   Maximum Likelihood Analysis

           Maximum likelihood computations converged.

     Maximum Likelihood Analysis of Variance

Source [i]           DF [j]   Chi-Square [k]    Pr > ChiSq [l]

Intercept             2        17.89        0.0001
VIDEO                 2         3.43        0.1799
PUZZLE                2        11.82        0.0027
FEMALE                2         4.84        0.0891

Likelihood Ratio [m]  278       287.61        0.3331

i. Source - This refers to the model predictor variables and the intercept as sources of variation in the outcome variable.

j. DF - These are the degrees of freedom for each test. Each source has one parameter in each of the two fitted models, and the test considers both parameters jointly, so DF=2 for the intercept and for every predictor.

k. Chi-Square - This is the Wald Chi-Square statistic for the joint test that the parameters for the given source are zero in both models.

l. Pr > ChiSq - This is the p-value associated with the Chi-Square statistic. Here, the null hypothesis is that there is no relationship between the predictor variable and the outcome, ice_cream (i.e., the coefficients of the predictor in both of the fitted models are zero). If the p-value is less than the specified alpha (usually .05 or .01), then this null hypothesis can be rejected. For example, at alpha = .05 the null hypothesis is rejected for puzzle (p = 0.0027) but not for video (p = 0.1799) or female (p = 0.0891).
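
The p-values in this table can be reproduced, up to rounding, as upper-tail probabilities of a Chi-Square distribution with 2 degrees of freedom. The sketch below types in the Chi-Square values from the table above; the dataset name anova_p and its variable names are hypothetical.

```sas
/* Upper-tail Chi-Square probability for each source, using DF = 2. */
data anova_p;
  length source $ 9;
  input source $ chisq;
  df = 2;
  p  = 1 - probchi(chisq, df);
  datalines;
Intercept 17.89
VIDEO 3.43
PUZZLE 11.82
FEMALE 4.84
;
run;

proc print data = anova_p noobs;
  format p 6.4;
run;
```

Because the Chi-Square values are entered as rounded to two decimals, the computed p-values may differ slightly from the printed ones in the last digit.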

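m. Likelihood Ratio - This Chi-Square statistic compares the fitted model to the saturated model, which has a separate set of response probabilities for every population. Its degrees of freedom are the number of response functions (143 populations x 2 logits = 286) minus the 8 estimated parameters, which gives 278. Because this example model includes continuous predictors (video and puzzle), most populations contain only one or two observations, so the Chi-Square approximation is not appropriate and this test can be disregarded. However, if all of the predictors in a model are categorical and the populations are reasonably large, then this can be interpreted as a goodness-of-fit test for the model as a whole, where a small p-value indicates lack of fit.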


Analysis of Maximum Likelihood Estimates

             Analysis of Maximum Likelihood Estimates

          Function                   Standard      Chi-
Parameter  Number [n]  Estimate [o]  Error [p]     Square [q]  Pr > ChiSq [r]

Intercept    1          5.9696     1.4375       17.24        <.0001
             2          4.0573     1.2229       11.01        0.0009
VIDEO        1         -0.0465     0.0251        3.43        0.0640
             2         -0.0229     0.0209        1.21        0.2721
PUZZLE       1         -0.0819     0.0238       11.82        0.0006
             2         -0.0430     0.0199        4.67        0.0306
FEMALE       1          0.8495     0.4482        3.59        0.0581
             2          0.0329     0.3500        0.01        0.9252

n. Function Number - Two models were defined in this multinomial regression: one relating chocolate to the referent category, strawberry, and another relating vanilla to strawberry. The function number indicates to which model an estimate, standard error, chi-square, and p-value refer. We can refer to the response profiles to determine which response corresponds to which model. Our ice_cream category 1 is chocolate, so function 1 corresponds to the chocolate relative to strawberry model; category 2 is vanilla, so function 2 corresponds to the vanilla relative to strawberry model.

o. Estimate - These are the estimated multinomial logistic regression coefficients for the models. An important feature of the multinomial logit model is that it estimates k-1 models, where k is the number of levels of the outcome variable. SAS treats strawberry as the referent group and estimates a model for chocolate relative to strawberry and a model for vanilla relative to strawberry. Therefore, each estimate listed in this column must be considered in terms of both the parameter it corresponds to and the model to which it belongs. The standard interpretation of the multinomial logit is that for a unit change in the predictor variable, the logit of outcome m relative to the referent group is expected to change by its respective parameter estimate (which is in log-odds units), given that the other variables in the model are held constant.

Model 1: chocolate relative to strawberry

Intercept - This is the multinomial logit estimate for chocolate relative to strawberry when the predictor variables in the model are evaluated at zero. For males (the variable female evaluated at zero) with zero video and puzzle scores, the logit for preferring chocolate to strawberry is 5.9696. Note that evaluating video and puzzle at zero is out of the range of plausible scores. If the scores were mean-centered, the intercept would have a natural interpretation: log odds of preferring chocolate to strawberry for a male with average video and puzzle scores.

video - This is the multinomial logit estimate for a one unit increase in video score for chocolate relative to strawberry, given the other variables in the model are held constant. If a subject were to increase his video score by one point, the multinomial log-odds for preferring chocolate to strawberry would be expected to decrease by 0.0465 unit while holding all other variables in the model constant.

puzzle - This is the multinomial logit estimate for a one unit increase in puzzle score for chocolate relative to strawberry, given the other variables in the model are held constant. If a subject were to increase his puzzle score by one point, the multinomial log-odds for preferring chocolate to strawberry would be expected to decrease by 0.0819 unit while holding all other variables in the model constant.

female - This is the multinomial logit estimate comparing females to males for chocolate relative to strawberry, given the other variables in the model are held constant. The multinomial logit for females relative to males is 0.8495 unit higher for preferring chocolate to strawberry, given all other predictor variables in the model are held constant. In other words, females are more likely than males to prefer chocolate to strawberry.
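
The mean-centering mentioned for the intercept can be sketched as follows. This assumes the mlogit dataset from above; the output dataset name mlogit_c is hypothetical. With mean = 0 and no std= option, proc standard shifts each variable to have mean zero without rescaling it.

```sas
/* Center video and puzzle at their sample means (scale is unchanged). */
proc standard data = mlogit mean = 0 out = mlogit_c;
  var video puzzle;
run;

/* Refit the same model on the centered scores. */
proc catmod data = mlogit_c;
  direct video puzzle female;
  response logits;
  model ice_cream = video puzzle female;
run;
```

Centering changes only the intercepts. The estimates for video, puzzle and female, and their standard errors, should be the same as in the uncentered model.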

Model 2: vanilla relative to strawberry

Intercept - This is the multinomial logit estimate for vanilla relative to strawberry when the other predictor variables in the model are evaluated at zero. For males (the variable female evaluated at zero) with zero video and puzzle scores, the logit for preferring vanilla to strawberry is 4.0573.

video - This is the multinomial logit estimate for a one unit increase in video score for vanilla relative to strawberry, given the other variables in the model are held constant. If a subject were to increase his video score by one point, the multinomial log-odds for preferring vanilla to strawberry would be expected to decrease by 0.0229 unit while holding all other variables in the model constant.

puzzle - This is the multinomial logit estimate for a one unit increase in puzzle score for vanilla relative to strawberry, given the other variables in the model are held constant. If a subject were to increase his puzzle score by one point, the multinomial log-odds for preferring vanilla to strawberry would be expected to decrease by 0.0430 unit while holding all other variables in the model constant.

female - This is the multinomial logit estimate comparing females to males for vanilla relative to strawberry, given the other variables in the model are held constant. The multinomial logit for females relative to males is 0.0329 unit higher for preferring vanilla to strawberry, given all other predictor variables in the model are held constant. In other words, females are slightly more likely than males to prefer vanilla to strawberry, although the estimate is small relative to its standard error (0.3500).
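
The two sets of estimates can be combined to give predicted probabilities for all three flavors. The sketch below plugs the estimates from the table above into the generalized logit formulas for one hypothetical student (video = 40, puzzle = 50, female = 1). The profile, the dataset name predicted and the variable names are illustrative assumptions, not part of the original analysis.

```sas
/* Predicted probabilities for one hypothetical student. */
data predicted;
  video = 40; puzzle = 50; female = 1;   /* hypothetical profile */

  /* Generalized logits relative to strawberry, from the estimates above. */
  logit_choc = 5.9696 - 0.0465*video - 0.0819*puzzle + 0.8495*female;
  logit_van  = 4.0573 - 0.0229*video - 0.0430*puzzle + 0.0329*female;

  /* The referent category has logit 0, hence the 1 in the denominator. */
  denom        = 1 + exp(logit_choc) + exp(logit_van);
  p_chocolate  = exp(logit_choc) / denom;
  p_vanilla    = exp(logit_van)  / denom;
  p_strawberry = 1 / denom;
run;

proc print data = predicted noobs;
run;
```

The three probabilities sum to 1 by construction, which is a quick check that the referent category has been handled correctly.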

p. Standard Error - These are the standard errors of the individual regression coefficients for the two respective models estimated.

q. Chi-Square - This column lists the Chi-Square test statistic of the given parameter and model.
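
The values in the Chi-Square and Pr > ChiSq columns agree, up to rounding, with Wald statistics computed as the squared ratio of each estimate to its standard error, on 1 degree of freedom. The sketch below reproduces them from the printed estimates and standard errors and also adds 95% Wald confidence limits on the log-odds scale, which this output does not report. The dataset name wald and all variable names are hypothetical.

```sas
data wald;
  length parameter $ 9;
  input parameter $ func_num estimate stderr;

  /* Wald Chi-Square on 1 DF and its upper-tail p-value. */
  chisq = (estimate / stderr)**2;
  p     = 1 - probchi(chisq, 1);

  /* 95% Wald confidence limits for the coefficient (log-odds units). */
  z     = probit(0.975);
  lower = estimate - z*stderr;
  upper = estimate + z*stderr;
  datalines;
Intercept 1 5.9696 1.4375
Intercept 2 4.0573 1.2229
VIDEO 1 -0.0465 0.0251
VIDEO 2 -0.0229 0.0209
PUZZLE 1 -0.0819 0.0238
PUZZLE 2 -0.0430 0.0199
FEMALE 1 0.8495 0.4482
FEMALE 2 0.0329 0.3500
;
run;

proc print data = wald noobs;
  format chisq 8.2 p 6.4 lower upper 8.4;
run;
```

Since the inputs are rounded to four decimals, the recomputed Chi-Square values may differ from the printed ones in the second decimal place.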

r. Pr > ChiSq - This is the p-value used to determine whether or not the null hypothesis that a particular predictor's regression coefficient is zero, given that the rest of the predictors are in the model, can be rejected. If the p-value is less than alpha, then the null hypothesis can be rejected and the parameter estimate is considered to be statistically significant at that alpha level. Under the null hypothesis, the Chi-Square test statistic follows a Chi-Square distribution, which is used to test against the alternative hypothesis that the estimate is not equal to zero. In multinomial logistic regression, the interpretation of a parameter estimate's significance is limited to the model in which the parameter estimate was calculated. For example, the significance of a parameter estimate in the chocolate relative to strawberry model cannot be assumed to hold in the vanilla relative to strawberry model.

Model 1: chocolate relative to strawberry

For chocolate relative to strawberry, the Chi-Square test statistic for the intercept is 17.24 with an associated p-value of <.0001. With an alpha level of 0.05, we would reject the null hypothesis and conclude that the multinomial logit of chocolate relative to strawberry for males (the variable female evaluated at zero) with zero video and puzzle scores is statistically different from zero.

For chocolate relative to strawberry, the Chi-Square test statistic for the predictor video is 3.43 with an associated p-value of 0.0640. If we set our alpha level to 0.05, we would fail to reject the null hypothesis and conclude that for chocolate relative to strawberry, the regression coefficient for video has not been found to be statistically different from zero given puzzle and female are in the model.

For chocolate relative to strawberry, the Chi-Square test statistic for the predictor puzzle is 11.82 with an associated p-value of 0.0006. If we again set our alpha level to 0.05, we would reject the null hypothesis and conclude that the regression coefficient for puzzle has been found to be statistically different from zero for chocolate relative to strawberry given that video and female are in the model.

For chocolate relative to strawberry, the Chi-Square test statistic for the predictor female is 3.59 with an associated p-value of 0.0581. If we again set our alpha level to 0.05, we would fail to reject the null hypothesis and conclude that the difference between males and females has not been found to be statistically significant for chocolate relative to strawberry given that video and puzzle are in the model.

Model 2: vanilla relative to strawberry

For vanilla relative to strawberry, the Chi-Square test statistic for the intercept is 11.01 with an associated p-value of 0.0009. With an alpha level of 0.05, we would reject the null hypothesis and conclude that a) the multinomial logit of vanilla relative to strawberry for males (the variable female evaluated at zero) with zero video and puzzle scores is statistically different from zero; or b) for males with zero video and puzzle scores, there is a statistically significant difference between the likelihood of being classified as preferring vanilla and that of preferring strawberry. Because the intercept is positive, such a male would be more likely to be classified as preferring vanilla than strawberry. We can make the second interpretation when we view the intercept as a specific covariate profile (males with zero video and puzzle scores). Based on the direction and significance of the coefficient, the intercept indicates whether the profile would have a greater propensity to be classified in one level of the outcome variable than the other level.

For vanilla relative to strawberry, the Chi-Square test statistic for the predictor video is 1.21 with an associated p-value of 0.2721. If we set our alpha level to 0.05, we would fail to reject the null hypothesis and conclude that for vanilla relative to strawberry, the regression coefficient for video has not been found to be statistically different from zero given puzzle and female are in the model.

For vanilla relative to strawberry, the Chi-Square test statistic for the predictor puzzle is 4.67 with an associated p-value of 0.0306. If we again set our alpha level to 0.05, we would reject the null hypothesis and conclude that the regression coefficient for puzzle has been found to be statistically different from zero for vanilla relative to strawberry given that video and female are in the model.

For vanilla relative to strawberry, the Chi-Square test statistic for the predictor female is 0.01 with an associated p-value of 0.9252. If we again set our alpha level to 0.05, we would fail to reject the null hypothesis and conclude that for vanilla relative to strawberry, the regression coefficient for female has not been found to be statistically different from zero given puzzle and video are in the model.