This article was originally published in Perspective, Volume 18, Number 2, 1995, pp. 25-29.
by Linda R. Ferguson, Office of Academic Computing
Mel Widawski, UCLA Drug Abuse Research Center
Dawn M. Upchurch, UCLA School of Public Health
Editor's Note:
This article was originally presented at the Western Users of SAS Software Conference in Santa Monica, California in October 1993 and was recognized as the "Best Contributed Paper in the Statistics Section." OAC is pleased to reprint the article here since many of our readers will potentially benefit from the statistical capabilities described in this article. The DIFFPARM macro is written in version 6.08 of the SAS System for MVS. This macro is available for use at OAC by including the %DIFFPARM statement in your SAS job as documented in this article.
A SAS macro, DIFFPARM, is described that extends the capabilities of the CATMOD procedure for analyses of multinomial logistic regression models. The DIFFPARM macro uses the results of PROC CATMOD to calculate significance tests for differences in parameter estimates across categories of the outcome variable. These significance tests enable more elaborate conclusions to be drawn from the results of multinomial logistic regression models because they enable researchers to determine whether there is a significant difference in the effects of a given predictor variable on two different response categories of the outcome variable. Such comparisons are a unique feature of multinomial logistic regression models, as compared to a series of binary logistic regression models, and are not currently available within the SAS System.
Multinomial logistic regression, sometimes referred to as polytomous logistic regression, is used to analyze the relationship between an outcome (dependent) variable having two or more categories and a set of predictor (independent) variables. Although this type of regression model can handle both nominal and ordinal scaled outcome variables, this paper is limited to outcome variables having unordered categories. The advantages of this type of model include: more than two response categories can be considered simultaneously; all response categories can be regressed on one set of predictor variables; and differences in the effects of the same predictor variable across response categories can be assessed statistically.
Binary logistic regression analysis, by comparison, can handle only two response categories at a time. If the outcome variable has more than two categories, then either (1) categories are combined or dropped, and one model is estimated; or (2) a series of independent models is estimated, one for each binary outcome variable that is formed from the original multi-category outcome variable. In either situation, statistical tests cannot be made to assess the relative effects of a given predictor variable on different categories of the outcome variable.
Although multinomial logistic regression models can be estimated using PROC CATMOD, this procedure does not provide significance tests for the difference between two parameter estimates. A SAS macro, DIFFPARM, has been written to calculate standardized z-tests for DIFFerences in PARaMeter estimates across categories of the outcome variable. The significance tests produced by the DIFFPARM macro extend the analyses of multinomial logistic regression models that can be performed using PROC CATMOD.
The following section describes the equation that is used to calculate the significance of the difference between two parameter estimates from the same regression model. The DIFFPARM macro computes these standardized z-tests based on the parameter estimates and their estimated variances and covariances from PROC CATMOD. An example is provided to demonstrate how these significance tests can be used to elaborate the substantive conclusions drawn from a multinomial logistic regression analysis.
The significance test for the difference between two parameter estimates from the same regression model has two components. The first component is the difference of the parameter estimates, bi - bj, where i ≠ j. The second component is the estimate of the variance of the difference, VAR(bi - bj). The variance of the difference between bi and bj is the sum of their variances minus twice their covariance:
VAR(bi - bj) = VAR(bi) + VAR(bj) - 2 COV(bi, bj)
where VAR(bi) and VAR(bj) are the variance estimates for bi and bj, respectively; and COV(bi,bj) is the estimated covariance of the parameter estimates. The estimated standard error for the difference between bi and bj is simply the square root of VAR(bi - bj):
SE(bi - bj) = SQRT[VAR(bi - bj)]
To test for the significance of the difference between parameter estimates bi and bj a standardized z-test is computed as:
Z(bi, bj) = (bi - bj) / SE(bi - bj)
The null hypothesis for this test is that the difference between parameter estimates is zero. That is, the effects of a given predictor variable on some category of the outcome variable are the same as its effects on a different category of the outcome variable.
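The calculation above can be sketched outside SAS as a minimal illustration. The snippet below is Python, not the DIFFPARM macro's source; the function name `diff_z_test` and the input values are hypothetical, and the two-tailed probability is computed with the complementary error function, which is assumed here to stand in for a standard normal probability lookup.

```python
import math

def diff_z_test(b_i, b_j, var_i, var_j, cov_ij):
    """Return (z, two-tailed p) for the null hypothesis b_i - b_j = 0."""
    # VAR(bi - bj) = VAR(bi) + VAR(bj) - 2 COV(bi, bj)
    var_diff = var_i + var_j - 2.0 * cov_ij
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    se_diff = math.sqrt(var_diff)
    z = (b_i - b_j) / se_diff
    # Two-tailed standard normal probability: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Made-up illustrative values, not taken from the article's tables
z, p = diff_z_test(b_i=1.0, b_j=0.4, var_i=0.05, var_j=0.04, cov_ij=0.0)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Swapping the two estimates changes only the sign of z; the two-tailed probability is unchanged.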
In order to understand how this significance test is used with PROC CATMOD, it is useful to briefly describe how a multinomial logistic regression model is specified within this procedure. If an outcome variable has k response categories and the generalized logits function is specified on the RESPONSE statement, then each response category will be contrasted with the last (reference) category yielding k-1 contrast functions: one function contrasts the first response category to the last response category; the next function contrasts the second category to the last; and so on. For each response function there are p parameter estimates: one parameter estimate for each predictor variable plus the intercept.
The parameter estimates, variances and covariances needed to calculate the significance test described above are saved in the OUTEST data set by specifying that option on the RESPONSE statement of PROC CATMOD. The DIFFPARM macro uses the OUTEST data set as input to test, for a given predictor variable, the difference between the log-odds estimated in one response function and the log-odds estimated in a different response function. When there are k-1 response functions, a total of (k-1)! / [(k-1-2)! 2!] = (k-1)(k-2)/2 comparisons can be made for a given predictor variable between two response functions at a time.
Suppose an outcome variable has five categories. Four response functions are estimated. The intercepts for these response functions are labeled in the OUTEST data set as B1, B2, B3 and B4, respectively. The parameter estimates for the first predictor variable are labeled as B5, B6, B7 and B8; the parameter estimates for the second predictor variable are labeled as B9, B10, B11 and B12; etc. The DIFFPARM macro computes significance tests among pairs of parameter estimates associated with the same predictor variable, e.g., B5 - B6; B5 - B7; B5 - B8; B6 - B7; B6 - B8; and B7 - B8.
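The labeling scheme just described can be sketched as a small index calculation. This is a Python illustration under the ordering stated in the text (all intercepts first, then each predictor's estimates across the response functions); the helper names `parameter_label` and `pairs_for_effect` are hypothetical and are not part of the DIFFPARM macro.

```python
from itertools import combinations

def parameter_label(effect, function, num_functions):
    """Label for an effect (0 = intercept, 1 = first predictor, ...)
    in a response function numbered from 1."""
    return f"B{effect * num_functions + function}"

def pairs_for_effect(effect, numcat):
    """All pairwise comparisons for one effect across the numcat - 1 response functions."""
    num_functions = numcat - 1
    labels = [parameter_label(effect, f, num_functions)
              for f in range(1, num_functions + 1)]
    return list(combinations(labels, 2))

# First predictor variable, five outcome categories
for left, right in pairs_for_effect(effect=1, numcat=5):
    print(f"{left} - {right}")
```

Under the same ordering, the third predictor variable in the second and fourth of four response functions would map to B14 and B16.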
The DIFFPARM macro writes the significance tests and their probability values to a table. Under the null hypothesis the test statistic is treated as a standard normal deviate, and the probability value for a two-tailed test is obtained using the SAS PROBNORM function. Each row of the table corresponds to a pairwise comparison of parameter estimates that are obtained from different response functions. There are p × (k-1)(k-2)/2 rows in the table, where p counts the intercept as well as the predictor variables; for example, p = 7 and k = 5 give 7 × 6 = 42 rows.
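As a rough sketch of how such a table could be assembled, the Python below loops over every effect and every pair of response functions. It is not the macro's code: the function name `pairwise_difference_table`, the flat list of estimates, the list-of-lists covariance matrix, and the example numbers are all assumptions made for illustration, and the layout of the actual OUTEST data set is not modeled.

```python
import math
from itertools import combinations

def pairwise_difference_table(estimates, cov, numcat, numparm):
    """Rows of (label_i, label_j, difference, z, p) for every within-effect pair.

    estimates: flat list ordered B1, B2, ... (intercepts first, then each predictor)
    cov:       square covariance matrix (list of lists) in the same order
    """
    num_functions = numcat - 1
    if len(estimates) != numparm * num_functions:
        raise ValueError("expected numparm * (numcat - 1) estimates")
    rows = []
    for effect in range(numparm):
        start = effect * num_functions
        for i, j in combinations(range(start, start + num_functions), 2):
            diff = estimates[i] - estimates[j]
            var_diff = cov[i][i] + cov[j][j] - 2.0 * cov[i][j]
            z = diff / math.sqrt(var_diff)
            p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal probability
            rows.append((f"B{i + 1}", f"B{j + 1}", diff, z, p))
    return rows

# Made-up example: three outcome categories, intercept plus one predictor
est = [0.2, -0.1, 1.0, 0.4]
cov = [
    [0.04, 0.01, 0.00, 0.00],
    [0.01, 0.04, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.00],
    [0.00, 0.00, 0.00, 0.04],
]
for row in pairwise_difference_table(est, cov, numcat=3, numparm=2):
    print("%s - %s: diff=%.2f z=%.3f p=%.4f" % row)
```

With three outcome categories there is one comparison per effect, so this toy example yields two rows; with five categories and seven parameters per response function the same loop yields 42 rows.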
The following setup is used to estimate a multinomial logistic regression model to examine how life-course transitions of female teenagers are associated with demographic characteristics, socio-economic status and early teenage experiences. Five mutually exclusive events are considered for the outcome variable, involving transitions among parenthood, marriage and full-time employment (e.g., unwed mothers, working mothers, etc.). The substantive issues in this example have been simplified in order to illustrate how the significance test described in this paper can be applied to research questions in a logistic regression framework.
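PROC CATMOD;
   /* Generalized logits; save estimates and covariances in LOGITEST */
   RESPONSE LOGIT / OUTEST=LOGITEST;
   /* Treat the predictors as quantitative rather than classification variables */
   DIRECT BLACK HISPANIC NOHS HSONLY
          NUMSIBS YOUNGSEX;
   /* GROUP is the five-category outcome variable */
   MODEL GROUP = BLACK HISPANIC NOHS
                 HSONLY NUMSIBS YOUNGSEX
                 / NOITER NOPROFILE;
   /* Five outcome categories; intercept plus six predictors per response function */
   %DIFFPARM(NUMCAT=5,NUMPARM=7);
RUN;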
Since a generalized logits function is specified on the RESPONSE statement of PROC CATMOD, the fifth response category (not a teenage mother) becomes the reference category to which each of the other categories is compared in their respective response functions. Two parameters are specified when the DIFFPARM macro is called: numcat is the number of categories in the outcome variable (k); and numparm is the number of parameters estimated per response function (p).
The parameter estimates obtained from PROC CATMOD are shown in Table 1 and the significance tests obtained from the DIFFPARM macro are shown in Table 2.
For example, the predictor variable NOHS represents whether respondents completed high school or not. The reference category is respondents who have more than a high school degree. The estimated log-odds for NOHS is 1.24 for the second outcome (working mother) versus the fifth (not a teenage mother) and is 2.72 for the fourth outcome (unwed mother) compared to the fifth, labeled as parameters 14 and 16, respectively. This indicates that the odds are 3.5 times higher that a high school dropout, compared to a respondent with some post-secondary schooling, will be a working mother rather than postpone parenthood until after the teen years; and that the odds are 15.2 times higher that a dropout will be an unwed mother rather than postpone parenthood.
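The odds quoted above follow from exponentiating the log-odds. The short Python check below uses only the two estimates reported in the text; the variable names are hypothetical. The final line is an added observation rather than a figure from the article: because both logits share the same reference category, the exponentiated difference can be read as the odds of the fourth outcome relative to the second.

```python
import math

# Log-odds for NOHS reported in the text
logodds_working = 1.24  # working mother vs. not a teenage mother
logodds_unwed = 2.72    # unwed mother vs. not a teenage mother

odds_working = math.exp(logodds_working)
odds_unwed = math.exp(logodds_unwed)
print(f"working mother: {odds_working:.1f}")
print(f"unwed mother:   {odds_unwed:.1f}")

# Exponentiated difference of the two log-odds (same reference category)
odds_unwed_vs_working = math.exp(logodds_unwed - logodds_working)
print(f"unwed vs. working mother: {odds_unwed_vs_working:.1f}")
```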
The z-test for parameters 14 and 16 in Table 2 shows that the log-odds for NOHS in the fourth response function are significantly larger than the log-odds for NOHS in the second response function. Compared to a respondent with post-secondary schooling, a high school dropout not only has a higher log-odds of being either a working mother or an unwed mother, relative to not being a teenage mother, but the log-odds of a dropout being an unwed mother versus a non-mother are significantly larger than the log-odds of a dropout being a working mother versus a non-mother.
The DIFFPARM macro presented in this paper extends the capabilities of PROC CATMOD for analyses of multinomial logistic regression models and capitalizes on a key advantage of this methodology as compared to a series of separate binary logistic regression models to analyze multinomial outcomes. The DIFFPARM macro uses the results of PROC CATMOD to test the significance of the difference between two parameter estimates obtained from different response functions. The significance tests enable researchers to compare the estimated log-odds of a given predictor variable on a category of the outcome variable relative to its estimated log-odds on a different category of the outcome variable. These comparisons potentially enrich the findings of a multinomial logistic regression analysis, since significant differences may be found for certain parameter estimates when compared across categories of the outcome variable. These results could not be obtained from independent binary logistic regression models or from PROC CATMOD without the DIFFPARM macro.

*OAC/CS 21 Jun 95; Rev. 19 Dec 95