Stata FAQ
How can I compare regression coefficients between 2 groups?
Sometimes your research may predict that the size of a
regression coefficient should be bigger for one group than for another. For example, you
might believe that the regression coefficient of height predicting weight
would be higher for men than for women. Below, we have a data file with 10 fictional
females and 10 fictional males, along with their height in inches and
their weight in pounds.
id gender height weight
1 F 56 117
2 F 60 125
3 F 64 133
4 F 68 141
5 F 72 149
6 F 54 109
7 F 62 128
8 F 65 131
9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
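If the data file cannot be loaded from the web, the listing above can be typed in directly. The following is a minimal sketch that assumes the 20 rows shown above are the complete contents of the file; the variable names and the one-character string type for gender are taken from the listing.

```stata
* Enter the 20 observations listed above by hand.
* gender is stored as a one-character string variable (str1).
clear
input id str1 gender height weight
1 F 56 117
2 F 60 125
3 F 64 133
4 F 68 141
5 F 72 149
6 F 54 109
7 F 62 128
8 F 65 131
9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
end
* Confirm that there are 10 observations per group
tabulate gender
```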
We analyze the data for each group separately using the regress command below, after first sorting by gender.
use http://www.ats.ucla.edu/stat/stata/faq/compreg2, clear
sort gender
by gender: regress weight height
The parameter estimates (coefficients) for females and males are shown below.
The results do seem to suggest that the slope of weight on height is steeper
for males (3.19 pounds per inch) than for females (2.10 pounds per inch).
-> gender=F
Source | SS df MS Number of obs = 10
---------+------------------------------ F( 1, 8) = 359.81
Model | 1319.56112 1 1319.56112 Prob > F = 0.0000
Residual | 29.3388815 8 3.66736019 R-squared = 0.9782
---------+------------------------------ Adj R-squared = 0.9755
Total | 1348.90 9 149.877778 Root MSE = 1.915
------------------------------------------------------------------------------
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
height | 2.095872 .110491 18.969 0.000 1.84108 2.350665
_cons | -2.39747 7.053272 -0.340 0.743 -18.66234 13.8674
------------------------------------------------------------------------------
-> gender=M
Source | SS df MS Number of obs = 10
---------+------------------------------ F( 1, 8) = 669.93
Model | 3882.53627 1 3882.53627 Prob > F = 0.0000
Residual | 46.3637317 8 5.79546646 R-squared = 0.9882
---------+------------------------------ Adj R-squared = 0.9867
Total | 3928.90 9 436.544444 Root MSE = 2.4074
------------------------------------------------------------------------------
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
height | 3.189727 .1232367 25.883 0.000 2.905543 3.473912
_cons | 5.601677 8.930197 0.627 0.548 -14.99139 26.19475
------------------------------------------------------------------------------
We can compare the regression coefficient for males with the one for females to test
the null hypothesis Ho: Bf = Bm, where Bf is the regression coefficient of height
for females and Bm is the regression coefficient of height for males. To do this
analysis, we first make a dummy variable called female that is coded 1 for females
and 0 for males, and a variable called femht that is the product of female and
height. We then use female, height, and femht as predictors in the regression equation.
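* female: 1 for females, 0 for males (stays missing if gender is neither "F" nor "M")
generate female = .
replace female = 1 if gender == "F"
replace female = 0 if gender == "M"
* femht: interaction term, equal to height for females and 0 for males
generate femht = female*height
regress weight female height femht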
The output is shown below.
Source | SS df MS Number of obs = 20
---------+------------------------------ F( 3, 16) = 4250.11
Model | 60327.0974 3 20109.0325 Prob > F = 0.0000
Residual | 75.7026131 16 4.73141332 R-squared = 0.9987
---------+------------------------------ Adj R-squared = 0.9985
Total | 60402.80 19 3179.09474 Root MSE = 2.1752
------------------------------------------------------------------------------
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
female | -7.999147 11.37055 -0.703 0.492 -32.10363 16.10533
height | 3.189727 .1113503 28.646 0.000 2.953675 3.425779
femht | -1.093855 .1677774 -6.520 0.000 -1.449528 -.7381831
_cons | 5.601677 8.068862 0.694 0.497 -11.50355 22.7069
------------------------------------------------------------------------------
The coefficient for femht tests the null hypothesis Ho: Bf = Bm. Its t
value is -6.52 (p < 0.001), indicating that the regression coefficient
Bf is significantly different from Bm.
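The same hypothesis can also be checked with Stata's test command after the regression. This is only a sketch; it assumes the variables female and femht have been created as above. For a single coefficient, the F statistic that test reports is the square of the t statistic in the regression table, so the conclusion is the same.

```stata
* Refit the model with the interaction term
regress weight female height femht
* Wald test of Ho: coefficient on femht = 0, i.e. Bf = Bm
test femht
```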
Let's look at the parameter estimates to get a better understanding of what they mean and how they are interpreted.
First, recall that our dummy variable female is 1 if female and 0 if
male; therefore, males are the omitted group. This is needed for proper interpretation of the estimates.
Variable   Estimate    Interpretation
_cons       5.601677   Intercept for males (the omitted group). This corresponds
                       to the intercept for males in the separate groups analysis.
female     -7.999147   Intercept for females - intercept for males. This corresponds
                       to the difference of the intercepts from the separate groups
                       analysis, and is indeed -2.397470040 - 5.601677149.
height      3.189727   Slope for males (the omitted group), i.e. Bm.
femht      -1.093855   Slope for females - slope for males (i.e. Bf - Bm). From the
                       separate groups analysis, this is indeed
                       2.095872170 - 3.189727463.
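The female intercept and slope can be recovered from the combined model by adding the relevant coefficients. The sketch below uses lincom for this and assumes the variables female and femht exist as defined earlier. The point estimates should match those from the separate regression for females, although the standard errors can differ because the combined model pools the residual variance across both groups.

```stata
* Refit the combined model so that lincom uses its estimates
regress weight female height femht
* Slope for females: Bf = Bm + (Bf - Bm)
lincom height + femht
* Intercept for females: male intercept + (female - male difference)
lincom _cons + female
```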
Note that we constructed all of the variables manually to make it very clear
what each variable represents. In day-to-day use, however, you would
more likely use the xi prefix to generate the dummy
variables and interactions for you. For example,
xi: regress weight i.female*height
i.female _Ifemale_0-1 (naturally coded; _Ifemale_0 omitted)
i.female*height _IfemXheigh_# (coded as above)
Source | SS df MS Number of obs = 20
-------------+------------------------------ F( 3, 16) = 4250.11
Model | 60327.0974 3 20109.0325 Prob > F = 0.0000
Residual | 75.7026131 16 4.73141332 R-squared = 0.9987
-------------+------------------------------ Adj R-squared = 0.9985
Total | 60402.8 19 3179.09474 Root MSE = 2.1752
------------------------------------------------------------------------------
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Ifemale_1 | -7.999147 11.37055 -0.70 0.492 -32.10363 16.10533
height | 3.189727 .1113503 28.65 0.000 2.953675 3.425779
_IfemXheig~1 | -1.093855 .1677774 -6.52 0.000 -1.449528 -.7381831
_cons | 5.601677 8.068862 0.69 0.497 -11.50355 22.7069
------------------------------------------------------------------------------
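In Stata 11 and later, factor-variable notation offers another way to fit the same model without creating any new variables. The following is a sketch that assumes such a version is available and that the 0/1 variable female exists. The row for the female by height interaction plays the role of femht, and the row for 1.female plays the role of female.

```stata
* i.female  : treat female as a categorical (indicator) variable
* c.height  : treat height as a continuous variable
* ##        : include both main effects and their interaction
regress weight i.female##c.height
```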
These pages are Copyrighted (c) by UCLA Academic Technology Services