Stata FAQ
How can I compare regression coefficients between 2 groups?

Sometimes your research hypothesis may predict that a regression coefficient should be larger for one group than for another. For example, you might believe that the regression coefficient of height predicting weight would be higher for men than for women. Below, we have a data file with 10 fictional females and 10 fictional males, along with their height in inches and their weight in pounds.
 

id gender height weight
 1   F     56     117   
 2   F     60     125   
 3   F     64     133   
 4   F     68     141   
 5   F     72     149   
 6   F     54     109   
 7   F     62     128   
 8   F     65     131   
 9   F     65     131   
10   F     70     145   
11   M     64     211   
12   M     68     223   
13   M     72     235   
14   M     76     247   
15   M     80     259   
16   M     62     201   
17   M     69     228   
18   M     74     245   
19   M     75     241   
20   M     82     269   
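The listing above can also be typed in directly with the input command, which may be handy if a copy of the data file is not available. This is only a sketch: it assumes gender is stored as a one-character string (str1), which the later comparison gender == "F" suggests but the listing does not confirm.

```stata
* Enter the 20 observations from the listing by hand.
* Assumption: gender is a one-character string variable (str1).
clear
input id str1 gender height weight
 1 F 56 117
 2 F 60 125
 3 F 64 133
 4 F 68 141
 5 F 72 149
 6 F 54 109
 7 F 62 128
 8 F 65 131
 9 F 65 131
10 F 70 145
11 M 64 211
12 M 68 223
13 M 72 235
14 M 76 247
15 M 80 259
16 M 62 201
17 M 69 228
18 M 74 245
19 M 75 241
20 M 82 269
end
* Quick check that all 20 rows were read
count
```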
We analyzed the data separately for each gender using the regress command below, after first sorting by gender.
use http://www.ats.ucla.edu/stat/stata/faq/compreg2, clear
sort gender
by gender: regress weight height
The parameter estimates (coefficients) for females and males are shown below, and the results do seem to suggest that the slope of weight on height is steeper for males (3.19) than for females (2.10).
-> gender=F  
  Source |       SS       df       MS                  Number of obs =      10
---------+------------------------------               F(  1,     8) =  359.81
   Model |  1319.56112     1  1319.56112               Prob > F      =  0.0000
Residual |  29.3388815     8  3.66736019               R-squared     =  0.9782
---------+------------------------------               Adj R-squared =  0.9755
   Total |     1348.90     9  149.877778               Root MSE      =   1.915

------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  height |   2.095872    .110491     18.969   0.000        1.84108    2.350665
   _cons |   -2.39747   7.053272     -0.340   0.743      -18.66234     13.8674
------------------------------------------------------------------------------

-> gender=M  
  Source |       SS       df       MS                  Number of obs =      10
---------+------------------------------               F(  1,     8) =  669.93
   Model |  3882.53627     1  3882.53627               Prob > F      =  0.0000
Residual |  46.3637317     8  5.79546646               R-squared     =  0.9882
---------+------------------------------               Adj R-squared =  0.9867
   Total |     3928.90     9  436.544444               Root MSE      =  2.4074

------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  height |   3.189727   .1232367     25.883   0.000       2.905543    3.473912
   _cons |   5.601677   8.930197      0.627   0.548      -14.99139    26.19475
------------------------------------------------------------------------------
We can compare the regression coefficients of males with females to test the null hypothesis Ho: Bf = Bm, where Bf is the regression coefficient for females and Bm is the regression coefficient for males. To do this analysis, we first make a dummy variable called female that is coded 1 for female and 0 for male, and a variable called femht that is the product of female and height. We then use female, height, and femht as predictors in the regression equation.
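* Dummy variable: 1 = female, 0 = male (stays missing for any other value)
generate female = .
replace female = 1 if gender == "F"
replace female = 0 if gender == "M"
* Interaction term: equals height for females and 0 for males
generate femht = female*height
regress weight female height femht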
The output is shown below.
  Source |       SS       df       MS                  Number of obs =      20
---------+------------------------------               F(  3,    16) = 4250.11
   Model |  60327.0974     3  20109.0325               Prob > F      =  0.0000
Residual |  75.7026131    16  4.73141332               R-squared     =  0.9987
---------+------------------------------               Adj R-squared =  0.9985
   Total |    60402.80    19  3179.09474               Root MSE      =  2.1752

------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  female |  -7.999147   11.37055     -0.703   0.492      -32.10363    16.10533
  height |   3.189727   .1113503     28.646   0.000       2.953675    3.425779
   femht |  -1.093855   .1677774     -6.520   0.000      -1.449528   -.7381831
   _cons |   5.601677   8.068862      0.694   0.497      -11.50355     22.7069
------------------------------------------------------------------------------
The term femht tests the null hypothesis Ho: Bf = Bm. The t value is -6.52 and is statistically significant (p < 0.001), indicating that the regression coefficient Bf is significantly different from Bm.
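The same hypothesis can also be tested explicitly with the test command after the pooled regression. The sketch below assumes female and femht have been created as shown earlier. Because only one coefficient is being tested, the F statistic should simply be the square of the t value reported for femht (about 42.5), with the same p-value.

```stata
* Assumes female and femht exist as created earlier.
regress weight female height femht
* Wald test of Ho: coefficient on femht = 0, i.e. Bf = Bm
test femht
* The test leaves its results in r(F), r(df), r(df_r), and r(p)
display r(F), r(p)
```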

Let's look at the parameter estimates to get a better understanding of what they mean and how they are interpreted.

First, recall that our dummy variable female is 1 if female and 0 if male; therefore, males are the omitted group. This is needed for proper interpretation of the estimates.
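It may help to write the pooled model out and substitute the two values of female. The symbols b0 through b3 are labels introduced here for the four coefficients (_cons, female, height, femht); they do not appear in the Stata output.

```latex
\begin{aligned}
\widehat{\text{weight}} &= b_0 + b_1\,\text{female} + b_2\,\text{height}
                           + b_3\,(\text{female}\times\text{height}) \\
\text{males } (\text{female}=0):\quad
\widehat{\text{weight}} &= b_0 + b_2\,\text{height} \\
\text{females } (\text{female}=1):\quad
\widehat{\text{weight}} &= (b_0 + b_1) + (b_2 + b_3)\,\text{height}
\end{aligned}
```

So Bm = b2 and Bf = b2 + b3, which means b3 = Bf - Bm. Likewise, b1 is the female intercept minus the male intercept.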
          Parameter
Variable  Estimate
_cons     5.601677 : Intercept for males (the omitted group).
                     This corresponds to the intercept for males in
                     the separate groups analysis.

female   -7.999147 : Intercept for females - intercept for males.
                     This corresponds to the difference between the
                     intercepts from the separate groups analysis,
                     and is indeed
                     -2.397470040 - 5.601677149 = -7.999147189.

height    3.189727 : Slope for males (the omitted group), i.e. Bm.

femht    -1.093855 : Slope for females - slope for males
                     (i.e. Bf - Bm).
                     From the separate groups analysis, this is indeed
                     2.095872170 - 3.189727463 = -1.093855293.
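Going the other way, the female intercept and slope can be recovered from the pooled model with lincom, which estimates linear combinations of coefficients. This is a sketch that assumes female and femht exist as created earlier. The point estimates should match the separate regression for females (-2.39747 and 2.095872), but the standard errors need not match, because the pooled model uses a single residual variance for both groups.

```stata
* Assumes female and femht exist as created earlier.
regress weight female height femht
* Female intercept = male intercept + difference in intercepts
lincom _cons + female
* Female slope (Bf) = male slope (Bm) + difference in slopes
lincom height + femht
```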
Note that we constructed all of the variables manually to make it very clear what each variable represents. However, in day-to-day use, you would probably be more likely to use the xi prefix to generate the dummy variables and interactions for you. For example:
xi: regress weight i.female*height

i.female          _Ifemale_0-1        (naturally coded; _Ifemale_0 omitted)
i.female*height   _IfemXheigh_#       (coded as above)

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  3,    16) = 4250.11
       Model |  60327.0974     3  20109.0325           Prob > F      =  0.0000
    Residual |  75.7026131    16  4.73141332           R-squared     =  0.9987
-------------+------------------------------           Adj R-squared =  0.9985
       Total |     60402.8    19  3179.09474           Root MSE      =  2.1752

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Ifemale_1 |  -7.999147   11.37055    -0.70   0.492    -32.10363    16.10533
      height |   3.189727   .1113503    28.65   0.000     2.953675    3.425779
_IfemXheig~1 |  -1.093855   .1677774    -6.52   0.000    -1.449528   -.7381831
       _cons |   5.601677   8.068862     0.69   0.497    -11.50355     22.7069
------------------------------------------------------------------------------
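In Stata 11 and later, factor-variable notation offers another way to fit the same model without the xi prefix or hand-made variables. The sketch below assumes the numeric 0/1 variable female created earlier is in memory. The ## operator requests both main effects and their interaction, and the c. prefix marks height as continuous. The coefficients should match the output above, although they are labeled differently (for example, 1.female#c.height in place of femht).

```stata
* Factor-variable notation (Stata 11 or later); no xi prefix needed.
* Assumes the numeric 0/1 variable female exists as created earlier.
regress weight i.female##c.height
* Slope of weight on height within each group (Bm for 0, Bf for 1)
margins female, dydx(height)
```

The margins command here reports the height slope separately for males and females, which saves working out Bf by hand from the interaction term.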
