Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here will be the type of food, and the predictor variables might be the length of the alligator and other environmental variables.
Example 3. Several brands of similar products are on the market, and you want to study brand choices based on gender and age. For example, a recent finding of a market research group claims that among digital camera choices, women prefer Kodak more than men and men prefer Canon more than women.
For our data analysis example, we will expand our third example with a hypothetical data set. The data set contains information on 735 subjects who were asked their preference among three brands of some product (e.g., car or TV). The data set also includes each subject's gender and age. You can get access to the data set in Stata by typing:

use http://www.ats.ucla.edu/stat/stata/dae/mlogit, clear
describe

Contains data from mlogit.dta
  obs:           735                          brand choices
 vars:             3                          19 Jan 2006 09:43
 size:         5,145 (99.4% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
brand           byte   %9.0g
female          byte   %8.0g
age             byte   %9.0g
-------------------------------------------------------------------------------
Sorted by:  age
The outcome variable is brand. The variable female is coded as 0 for male and 1 for female. Let's start with some descriptive statistics for the variables of interest.
tab brand

      brand |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        207       28.16       28.16
          2 |        307       41.77       69.93
          3 |        221       30.07      100.00
------------+-----------------------------------
      Total |        735      100.00

bysort brand: sum age female

-------------------------------------------------------------------------
-> brand = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       207    31.48792    2.108374         24         38
      female |       207    .5555556    .4981086          0          1

-------------------------------------------------------------------------
-> brand = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       307    32.84365    1.824395         28         38
      female |       307    .6775244     .468187          0          1

-------------------------------------------------------------------------
-> brand = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |       221    34.30317    2.347811         27         38
      female |       221    .6470588    .4789695          0          1

mlogit brand female age, basecat(1)

Iteration 0:   log likelihood = -795.89581
Iteration 1:   log likelihood = -709.10396
Iteration 2:   log likelihood = -703.08391
Iteration 3:   log likelihood = -702.97081
Iteration 4:   log likelihood =  -702.9707

Multinomial logistic regression                   Number of obs   =        735
                                                  LR chi2(4)      =     185.85
                                                  Prob > chi2     =     0.0000
Log likelihood =  -702.9707                       Pseudo R2       =     0.1168

------------------------------------------------------------------------------
       brand |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
2            |
      female |   .5238143   .1942466     2.70   0.007      .143098    .9045307
         age |   .3682065   .0550031     6.69   0.000     .2604024    .4760106
       _cons |  -11.77466    1.77461    -6.64   0.000    -15.25283   -8.296483
-------------+----------------------------------------------------------------
3            |
      female |   .4659414   .2260895     2.06   0.039      .022814    .9090688
         age |   .6859082   .0626265    10.95   0.000     .5631626    .8086539
       _cons |   -22.7214   2.058027   -11.04   0.000    -26.75505   -18.68774
------------------------------------------------------------------------------
(brand==1 is the base outcome)
The output above has two parts, labeled with the categories of the outcome variable brand. They correspond to two equations:
log(P(brand=2)/P(brand=1)) = b_10 + b_11*female + b_12*age
log(P(brand=3)/P(brand=1)) = b_20 + b_21*female + b_22*age,
with b's being the raw regression coefficients from the output.
For example, we can say that for a one-unit increase in the variable age, the log of the ratio of the two probabilities, P(brand=2)/P(brand=1), increases by 0.368, and the log of the ratio of the two probabilities P(brand=3)/P(brand=1) increases by 0.686. Therefore, we can say that, in general, the older a person is, the more likely he or she is to prefer brand 2 or 3 over brand 1.
The ratio of the probability of choosing one outcome category to the probability of choosing the reference category is often referred to as relative risk (and sometimes, loosely, as odds). So another way of interpreting the regression results is in terms of relative risk. For a one-unit increase in the variable age, we expect the relative risk of choosing brand 2 over 1 to be multiplied by exp(.3682) = 1.45, so the relative risk is higher for older people. For a dichotomous predictor variable such as female, the ratio of the relative risk of choosing brand 2 over 1 for females to that for males is exp(.5238) = 1.69. We can use the rrr option of the mlogit command to display the regression results as relative risk ratios.
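As a quick arithmetic cross-check outside Stata, the exponentiation described above can be sketched in Python. The coefficients and standard errors are copied from the brand 2 versus brand 1 equation of the mlogit output; the helper name relative_risk_ratio and the delta-method standard error (RRR times the coefficient's standard error) are assumptions made for this illustration, although the resulting values agree with the RRR table that mlogit, rrr prints.

```python
import math

# Raw coefficients and standard errors copied from the mlogit output
# (equation for brand 2 versus the base outcome, brand 1).
COEF = {"female": 0.5238143, "age": 0.3682065}
SE = {"female": 0.1942466, "age": 0.0550031}

Z_95 = 1.959964  # standard normal quantile for a 95% interval


def relative_risk_ratio(b):
    """Exponentiate a raw multinomial logit coefficient."""
    return math.exp(b)


def rrr_summary(name):
    """Return (RRR, approximate std. err., CI lower, CI upper) for one predictor."""
    b, se = COEF[name], SE[name]
    rrr = relative_risk_ratio(b)
    rrr_se = rrr * se                  # delta-method approximation (assumed here)
    lower = math.exp(b - Z_95 * se)    # exponentiate the interval endpoints
    upper = math.exp(b + Z_95 * se)
    return rrr, rrr_se, lower, upper


for name in COEF:
    rrr, rrr_se, lower, upper = rrr_summary(name)
    print(f"{name:>6}: RRR = {rrr:.4f}  SE = {rrr_se:.4f}  95% CI = [{lower:.4f}, {upper:.4f}]")
```

An RRR above 1 means the predictor raises the relative risk of the listed brand over the base brand; a value below 1 would mean it lowers it.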
mlogit, rrr

Multinomial logistic regression                   Number of obs   =        735
                                                  LR chi2(4)      =     185.85
                                                  Prob > chi2     =     0.0000
Log likelihood =  -702.9707                       Pseudo R2       =     0.1168

------------------------------------------------------------------------------
       brand |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
2            |
      female |   1.688456   .3279768     2.70   0.007     1.153843    2.470772
         age |    1.44514   .0794872     6.69   0.000     1.297452     1.60964
-------------+----------------------------------------------------------------
3            |
      female |   1.593514   .3602768     2.06   0.039     1.023076     2.48201
         age |   1.985574   .1243495    10.95   0.000     1.756218    2.244884
------------------------------------------------------------------------------
(brand==1 is the base outcome)

We can also present the regression results in terms of probabilities. For example, we can fix age at its mean and calculate the probabilities of choosing each brand for both females and males. There is a suite of Stata commands called spost, written by J. Scott Long and Jeremy Freese, for the post-estimation interpretation of regression models, including multinomial regression models. It can be downloaded by following the link presented by the Stata command findit spost9_ado (see How can I use the findit command to search for programs and get additional help? for more information about using findit). We will use the prtab command to get the predicted probabilities.

prtab female

mlogit: Predicted probabilities for brand

Predicted probability of outcome 2

----------------------
   female | Prediction
----------+-----------
        0 |     0.4306
        1 |     0.5006
----------------------

Predicted probability of outcome 3

----------------------
   female | Prediction
----------+-----------
        0 |     0.2627
        1 |     0.2883
----------------------

Predicted probability of outcome 1

----------------------
   female | Prediction
----------+-----------
        0 |     0.3066
        1 |     0.2111
----------------------

       female        age
x=  .63401361   32.90068
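To see where these predicted probabilities come from, the calculation can be reproduced by hand from the raw coefficients. The Python sketch below is only an illustration, not part of spost: the function name predicted_probabilities is hypothetical, the coefficients are copied from the mlogit output, and age is fixed at the mean reported by prtab (32.90068). Each non-base outcome gets exp(linear predictor), the base outcome gets exp(0) = 1, and the three values are rescaled to sum to one.

```python
import math

# Raw coefficients copied from the mlogit output; brand 1 is the base outcome,
# so its linear predictor is fixed at zero.
COEF = {
    2: {"_cons": -11.77466, "female": 0.5238143, "age": 0.3682065},
    3: {"_cons": -22.7214, "female": 0.4659414, "age": 0.6859082},
}

MEAN_AGE = 32.90068  # mean of age as reported at the bottom of the prtab output


def predicted_probabilities(female, age):
    """Return {brand: probability} for one combination of female and age."""
    exp_eta = {1: 1.0}  # base outcome: exp(0)
    for brand, b in COEF.items():
        eta = b["_cons"] + b["female"] * female + b["age"] * age
        exp_eta[brand] = math.exp(eta)
    total = sum(exp_eta.values())
    return {brand: value / total for brand, value in exp_eta.items()}


for female in (0, 1):
    p = predicted_probabilities(female, MEAN_AGE)
    print(f"female={female}: p1={p[1]:.4f}  p2={p[2]:.4f}  p3={p[3]:.4f}")
```

Up to rounding of the copied coefficients, the printed values should agree with the prtab tables.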
We can also present the regression results graphically. For example, we can create three variables p1, p2 and p3 for the predicted probabilities and plot them against a predictor variable. In the example below, we plot p1 against age separately for each value of the variable female.

predict p1 p2 p3
sort age
line p1 age if female==0 || line p1 age if female==1, legend(order(1 "male" 2 "female"))
We will use the estout command to create a table of the results that might be more appropriate for publication. This command is user-written, so type findit estout to download it (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
estout, eform drop(_cons) cells(b(star fmt(%8.4f)) se(par)) stats(r2_p chi2 p, labels("Pseudo R-Square")) ///
    unstack varwidth(20) modelwidth(10) collabels(, none) mlabels("Multinomial Model")

                     Multinomial Model
                              2            3
female                   1.6885**     1.5935*
                       (0.3280)     (0.3603)
age                      1.4451***    1.9856***
                       (0.0795)     (0.1243)
Pseudo R-Square          0.1168
chi2                   185.8502
p                        0.0000

Presenting multinomial regression results can be somewhat tricky, since there are multiple equations and multiple comparisons to present. For example, the table above only shows the relative risk ratios for 2 versus 1 and 3 versus 1. How about 2 versus 3? As the number of outcome categories increases, the number of possible pairwise comparisons grows much faster. A very useful command here is listcoef, also written by Long and Freese, which reports every comparison for us. For example:

listcoef female, help

mlogit (N=735): Factor Change in the Odds of brand

Variable: female (sd= .48)

Odds comparing    |
Group 1 vs Group 2|      b         z     P>|z|     e^b   e^bStdX
------------------+---------------------------------------------
2       -3        |   0.05787    0.295   0.768   1.0596   1.0283
2       -1        |   0.52381    2.697   0.007   1.6885   1.2872
3       -2        |  -0.05787   -0.295   0.768   0.9438   0.9725
3       -1        |   0.46594    2.061   0.039   1.5935   1.2518
1       -2        |  -0.52381   -2.697   0.007   0.5923   0.7769
1       -3        |  -0.46594   -2.061   0.039   0.6275   0.7988
----------------------------------------------------------------
       b = raw coefficient
       z = z-score for test of b=0
   P>|z| = p-value for z-test
     e^b = exp(b) = factor change in odds for unit increase in X
 e^bStdX = exp(b*SD of X) = change in odds for SD increase in X
With the help option above, we get an explanation of each column of the output. We might present this table as part of the write-up as well. In more detail, we can say that, holding all other variables constant, the relative risk (or, rather loosely, the odds) of choosing brand 2 over 3 is multiplied by 1.0596 for females compared with males, meaning that it is about 6 percent higher for females. Note that this particular difference is not statistically significant (p = 0.768).
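The b and e^b columns for comparisons that do not involve the base outcome can be recovered from the original mlogit coefficients, because the coefficient for "a versus b" is simply the difference between the "a versus 1" and "b versus 1" coefficients. The Python sketch below illustrates this for female; the name contrast is hypothetical, and the z and P>|z| columns are deliberately left out because they also need the covariance between the two coefficients, which is not shown in the output above.

```python
import math
from itertools import permutations

# Coefficients of female relative to the base outcome (brand 1), copied from
# the mlogit output. The base outcome's own coefficient is zero by definition.
B_FEMALE = {1: 0.0, 2: 0.5238143, 3: 0.4659414}


def contrast(a, b):
    """Raw coefficient of female for outcome a versus outcome b."""
    return B_FEMALE[a] - B_FEMALE[b]


# Every ordered pair of outcomes: J * (J - 1) rows for J outcome categories.
for a, b in permutations(B_FEMALE, 2):
    d = contrast(a, b)
    print(f"{a} vs {b}: b = {d:8.5f}   e^b = {math.exp(d):.4f}")
```

With three outcomes this prints six rows; with J outcomes it would print J(J-1) rows, which is why a tool that tabulates them automatically is convenient.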
Sometimes, it might be desirable to present the results in terms of probabilities. But because the multinomial logistic model is not a linear model, it is not sufficient to present the regression results for just one set of values. For example, we can create a table of probabilities by gender for a subject of average age, as shown below.

preserve
sum age
replace age=r(mean)
drop p1 p2 p3
predict p1 p2 p3
tabstat p1 p2 p3, by(female)
restore

  female |        p1        p2        p3
---------+------------------------------
       0 |  .3066382  .4306335  .2627283
       1 |  .2111244  .5006219  .2882537
---------+------------------------------
   Total |  .2460812  .4750071  .2789118
----------------------------------------
We can say that, overall, a person of average age is most likely to choose brand 2 regardless of gender, with the preference somewhat stronger among females (0.50) than among males (0.43). All seems good. But what if we change age from its mean to two units below the mean?
preserve
sum age
replace age=r(mean)-2
drop p1 p2 p3
predict p1 p2 p3
tabstat p1 p2 p3, by(female)
restore

  female |        p1        p2        p3
---------+------------------------------
       0 |  .5291631  .3558369      .115
       1 |  .4029471  .4575086  .1395443
---------+------------------------------
   Total |  .4491404  .4202981  .1305614
----------------------------------------
Now we can see that we will have to change our description! In general, the effect of the variable female depends on the values of other predictor variables in the model. In our case, it depends on the value of the variable age. For this reason, it is more informative to present the predicted probability plots (as shown before).
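The dependence of the gender effect on age can also be checked numerically. The Python sketch below recomputes the predicted probabilities from the raw mlogit coefficients and reports the female-minus-male difference at several ages; the function names predicted_probabilities and gender_gap and the choice of ages are assumptions made for this illustration only.

```python
import math

# Raw coefficients copied from the mlogit output (brand 1 is the base outcome).
COEF = {
    2: {"_cons": -11.77466, "female": 0.5238143, "age": 0.3682065},
    3: {"_cons": -22.7214, "female": 0.4659414, "age": 0.6859082},
}

MEAN_AGE = 32.90068  # sample mean of age


def predicted_probabilities(female, age):
    """Return {brand: probability} for one combination of female and age."""
    exp_eta = {1: 1.0}  # base outcome: exp(0)
    for brand, b in COEF.items():
        exp_eta[brand] = math.exp(b["_cons"] + b["female"] * female + b["age"] * age)
    total = sum(exp_eta.values())
    return {brand: value / total for brand, value in exp_eta.items()}


def gender_gap(age):
    """Female-minus-male difference in predicted probability, per brand."""
    p_male = predicted_probabilities(0, age)
    p_female = predicted_probabilities(1, age)
    return {brand: p_female[brand] - p_male[brand] for brand in p_male}


# The coefficient of female is constant, but its effect on the probabilities is not.
for age in (MEAN_AGE - 2, MEAN_AGE, MEAN_AGE + 2):
    gap = gender_gap(age)
    print(f"age={age:6.2f}: " + "  ".join(f"d_p{k}={v:+.4f}" for k, v in gap.items()))
```

The first two rows correspond to the two tabstat tables above; the third row (two units above the mean) is an extra point added here to show the same pattern on the other side of the mean.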
These pages are Copyrighted (c) by UCLA Academic Technology Services