Example 2. A biologist may be interested in the food choices that alligators make. Adult alligators might have different preferences than young ones. The outcome variable here is the type of food, and the predictor variables might be the length of the alligator and other environmental variables.
Example 3. Several brands of similar products are on the market, and you want to study brand choices based on gender and age. For example, a recent finding of a market research group claims that among digital camera choices, women prefer Kodak more than men and men prefer Canon more than women.
For our data analysis example, we will expand on the third example using a hypothetical data set. The data set contains information on 735 subjects who were asked which of three brands of some product (e.g., car or TV) they prefer. The data set also includes each subject's gender and age. You can download the data here.
    proc contents data = "D:\mlogit";
    run;

    The CONTENTS Procedure

    Data Set Name        D:\mlogit                              Observations          735
    Member Type          DATA                                   Variables             3
    Engine               V9                                     Indexes               0
    Created              Wednesday, May 23, 2007 05:58:17 PM    Observation Length    9
    Last Modified        Wednesday, May 23, 2007 05:58:17 PM    Deleted Observations  0
    Protection                                                  Compressed            NO
    Data Set Type                                               Sorted                NO
    Label                Written by SAS
    Data Representation  WINDOWS_32
    Encoding             Default

    Engine/Host Dependent Information

    Data Set Page Size          4096
    Number of Data Set Pages    2
    First Data Page             1
    Max Obs per Page            446
    Obs in First Data Page      308
    Number of Data Set Repairs  0
    File Name                   D:\mlogit.sas7bdat
    Release Created             9.0000M0
    Host Created                WIN

    Alphabetic List of Variables and Attributes

    #    Variable    Type    Len
    3    AGE         Num       3
    1    BRAND       Num       3
    2    FEMALE      Num       3
The outcome variable is brand. The variable female is coded 0 for male and 1 for female. Let's start with some descriptive statistics for the variables of interest.
    proc freq data = "D:\mlogit";
      tables brand;
    run;

    The FREQ Procedure

                                       Cumulative    Cumulative
    BRAND    Frequency     Percent      Frequency       Percent
    -----------------------------------------------------------
        1         207       28.16            207         28.16
        2         307       41.77            514         69.93
        3         221       30.07            735        100.00

    proc sort data = "D:\mlogit";
      by brand;
    run;

    proc means data = "D:\mlogit";
      by brand;
      var age female;
    run;

    BRAND=1

    The MEANS Procedure

    Variable      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------------------
    AGE         207      31.4879227       2.1083742      24.0000000      38.0000000
    FEMALE      207       0.5555556       0.4981086               0       1.0000000
    -------------------------------------------------------------------------------

    BRAND=2

    Variable      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------------------
    AGE         307      32.8436482       1.8243945      28.0000000      38.0000000
    FEMALE      307       0.6775244       0.4681870               0       1.0000000
    -------------------------------------------------------------------------------

    BRAND=3

    Variable      N            Mean         Std Dev         Minimum         Maximum
    -------------------------------------------------------------------------------
    AGE         221      34.3031674       2.3478111      27.0000000      38.0000000
    FEMALE      221       0.6470588       0.4789695               0       1.0000000
    -------------------------------------------------------------------------------
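The percentages in the PROC FREQ table follow directly from the frequencies. As a quick arithmetic check outside SAS, here is a minimal Python sketch (Python is used only for illustration; the name `counts` is an assumption introduced here, and the frequencies are copied from the table above):

```python
# Frequencies of each brand, copied from the PROC FREQ output above.
counts = {1: 207, 2: 307, 3: 221}

total = sum(counts.values())

# Percent of subjects choosing each brand, rounded as in the SAS listing.
percents = {brand: round(100 * n / total, 2) for brand, n in counts.items()}

for brand, pct in percents.items():
    print(f"brand {brand}: {counts[brand]} of {total} subjects ({pct}%)")
```

Brand 2 is the most common choice, which is worth keeping in mind when reading the model results: brand 1, the least common, serves as the reference category.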
    proc logistic data = "D:\mlogit";
      class brand (ref = "1");
      model brand = female age / link = glogit;
    run;

    The LOGISTIC Procedure

    Type 3 Analysis of Effects

                            Wald
    Effect      DF    Chi-Square    Pr > ChiSq
    FEMALE       2        7.6704        0.0216
    AGE          2      123.3880        <.0001

    Analysis of Maximum Likelihood Estimates

                                          Standard          Wald
    Parameter    BRAND    DF    Estimate     Error    Chi-Square    Pr > ChiSq
    Intercept    2         1    -11.7746    1.7746       44.0239        <.0001
    Intercept    3         1    -22.7214    2.0580      121.8897        <.0001
    FEMALE       2         1      0.5238    0.1942        7.2719        0.0070
    FEMALE       3         1      0.4659    0.2261        4.2472        0.0393
    AGE          2         1      0.3682    0.0550       44.8133        <.0001
    AGE          3         1      0.6859    0.0626      119.9541        <.0001

    Odds Ratio Estimates

                           Point          95% Wald
    Effect    BRAND     Estimate      Confidence Limits
    FEMALE    2            1.688       1.154       2.471
    FEMALE    3            1.594       1.023       2.482
    AGE       2            1.445       1.297       1.610
    AGE       3            1.986       1.756       2.245
The output above has two parts, labeled with the categories of the outcome variable brand. They correspond to two equations:
log(P(brand=2)/P(brand=1)) = b_10 + b_11*female + b_12*age
log(P(brand=3)/P(brand=1)) = b_20 + b_21*female + b_22*age,
with b's being the raw regression coefficients from the output.
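Because the three probabilities must sum to one, the two equations can be solved for the probabilities themselves. The shorthand symbols \eta_2 and \eta_3 for the two linear predictors are introduced here for convenience and do not appear in the SAS output:

```latex
\begin{aligned}
\eta_2 &= b_{10} + b_{11}\,\mathrm{female} + b_{12}\,\mathrm{age}, \qquad
\eta_3 = b_{20} + b_{21}\,\mathrm{female} + b_{22}\,\mathrm{age}, \\
P(\mathrm{brand}=1) &= \frac{1}{1 + e^{\eta_2} + e^{\eta_3}}, \\
P(\mathrm{brand}=2) &= \frac{e^{\eta_2}}{1 + e^{\eta_2} + e^{\eta_3}}, \\
P(\mathrm{brand}=3) &= \frac{e^{\eta_3}}{1 + e^{\eta_2} + e^{\eta_3}}.
\end{aligned}
```

Dividing the second or third line by the first and taking logs recovers the two equations above.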
For example, for a one-unit increase in the variable age, holding female constant, the log of the ratio of the two probabilities P(brand=2)/P(brand=1) increases by 0.368, and the log of the ratio P(brand=3)/P(brand=1) increases by 0.686. Therefore, we can say that, in general, the older a person is, the more he or she will prefer brand 2 or brand 3 over brand 1.
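To see what these log-ratio changes mean on the probability scale, the fitted equations can be turned into predicted probabilities, using the fact that the three probabilities sum to one. The following is a minimal Python sketch rather than SAS output; the function name `predict_probs` and the dictionary `COEFS` are assumptions introduced here, and the coefficients are copied from the estimates table:

```python
import math

# Estimates copied from "Analysis of Maximum Likelihood Estimates".
# Brand 1 is the reference category, so it has no coefficients.
COEFS = {
    2: {"intercept": -11.7746, "female": 0.5238, "age": 0.3682},
    3: {"intercept": -22.7214, "female": 0.4659, "age": 0.6859},
}

def predict_probs(female, age):
    """Return predicted P(brand = 1, 2, 3) for one subject (hypothetical helper)."""
    # Linear predictor = log(P(brand=k) / P(brand=1)) for k = 2, 3.
    etas = {
        brand: c["intercept"] + c["female"] * female + c["age"] * age
        for brand, c in COEFS.items()
    }
    denom = 1.0 + sum(math.exp(eta) for eta in etas.values())
    probs = {1: 1.0 / denom}
    for brand, eta in etas.items():
        probs[brand] = math.exp(eta) / denom
    return probs

# Predicted probabilities for women at a few ages within the observed range.
for age in (28, 32, 36):
    probs = predict_probs(female=1, age=age)
    print(age, {brand: round(p, 3) for brand, p in sorted(probs.items())})
```

Under these estimates, the predicted probability of brand 1 falls and that of brand 3 rises as age increases, which is the pattern described above.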
The ratio of the probability of choosing one outcome category to the probability of choosing the reference category is often referred to as relative risk (and sometimes as odds). So another way of interpreting the regression results is in terms of relative risk. For a one-unit increase in the variable age, we expect the relative risk of choosing brand 2 over brand 1 to be multiplied by exp(0.3682) = 1.45, so the relative risk is higher for older people. For a dichotomous predictor variable such as female, the ratio of the relative risk of choosing brand 2 over brand 1 for females to that for males is exp(0.5238) = 1.69. These exponentiated coefficients are the values reported in the Odds Ratio Estimates table.
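The exponentiated coefficients and their 95% Wald limits can be reproduced by hand from the estimates and standard errors. The sketch below is a Python illustration, not SAS code; the helper `odds_ratio_with_ci` is a hypothetical name, and it assumes the limits are computed as exp(estimate ± 1.96 × standard error). Because the inputs are the rounded values printed in the output, the last digit may differ slightly from the SAS table:

```python
import math

# (estimate, standard error) pairs copied from the estimates table,
# keyed by (effect, brand compared with reference brand 1).
coefs = {
    ("female", 2): (0.5238, 0.1942),
    ("female", 3): (0.4659, 0.2261),
    ("age", 2): (0.3682, 0.0550),
    ("age", 3): (0.6859, 0.0626),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_with_ci(estimate, std_error, z=Z_95):
    """Return (point estimate, lower limit, upper limit) on the exp scale."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

odds_ratios = {key: odds_ratio_with_ci(*value) for key, value in coefs.items()}

for (effect, brand), (point, lower, upper) in odds_ratios.items():
    print(f"{effect:<6} brand {brand} vs 1: {point:.3f} ({lower:.3f}, {upper:.3f})")
```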
Below is one way of describing the results.
Both female and age are statistically significant in both equations. Females are more likely than males to prefer brand 2 or brand 3 over brand 1. Likewise, the older a person is, the more likely he or she is to prefer brand 2 or brand 3 over brand 1.
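The significance statements rest on the Wald chi-square column of the estimates table. Each one-degree-of-freedom Wald statistic is the squared ratio of an estimate to its standard error. The sketch below recomputes it in Python from the rounded values in the output, so results agree with SAS only approximately; `wald_test` is a hypothetical helper name:

```python
import math

def wald_test(estimate, std_error):
    """Return (Wald chi-square, p-value) for a single coefficient (1 df)."""
    chi_sq = (estimate / std_error) ** 2
    # For 1 df, P(chi-square > c) equals P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value

# (estimate, standard error) pairs copied from the estimates table.
rows = {
    ("female", 2): (0.5238, 0.1942),
    ("female", 3): (0.4659, 0.2261),
    ("age", 2): (0.3682, 0.0550),
    ("age", 3): (0.6859, 0.0626),
}

results = {key: wald_test(*value) for key, value in rows.items()}

for (effect, brand), (chi_sq, p_value) in results.items():
    print(f"{effect:<6} brand {brand} vs 1: chi-square = {chi_sq:.2f}, p = {p_value:.4f}")
```

All four p-values fall below 0.05, consistent with the description above, although the female effect for brand 3 versus brand 1 is the closest to that threshold.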
These pages are Copyrighted (c) by UCLA Academic Technology Services