Stata FAQ: How can I compare regression coefficients across 3 (or more) groups?

Sometimes your research hypothesis predicts that the size of a regression coefficient will differ across groups. For example, you might believe that the regression coefficient of height predicting weight differs across 3 age groups (young, middle aged, senior citizen). Below, we have a data file with 10 fictional young people, 10 fictional middle-aged people, and 10 fictional senior citizens, along with their height in inches and their weight in pounds. The variable age indicates the age group and is coded 1 for young people, 2 for middle aged, and 3 for senior citizens.
id age height weight
1  1    56     140
2  1    60     155
3  1    64     143
4  1    68     161
5  1    72     139
6  1    54     159
7  1    62     138
8  1    65     121
9  1    65     161
10  1    70     145
11  2    56     117
12  2    60     125
13  2    64     133
14  2    68     141
15  2    72     149
16  2    54     109
17  2    62     128
18  2    65     131
19  2    65     131
20  2    70     145
21  3    64     211
22  3    68     223
23  3    72     235
24  3    76     247
25  3    80     259
26  3    62     201
27  3    69     228
28  3    74     245
29  3    75     241
30  3    82     269
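If the data file cannot be downloaded, the listing above can be typed in directly. The following is a minimal sketch that assumes the listing matches the contents of the compreg3 file; it uses the same variable names as the listing.

```stata
* Enter the 30 observations from the listing above by hand
clear
input id age height weight
 1 1 56 140
 2 1 60 155
 3 1 64 143
 4 1 68 161
 5 1 72 139
 6 1 54 159
 7 1 62 138
 8 1 65 121
 9 1 65 161
10 1 70 145
11 2 56 117
12 2 60 125
13 2 64 133
14 2 68 141
15 2 72 149
16 2 54 109
17 2 62 128
18 2 65 131
19 2 65 131
20 2 70 145
21 3 64 211
22 3 68 223
23 3 72 235
24 3 76 247
25 3 80 259
26 3 62 201
27 3 69 228
28 3 74 245
29 3 75 241
30 3 82 269
end

* Quick check: there should be 10 people in each age group
tabulate age
```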
We analyze the data for each age group separately using the regress command below, after first sorting by age.
use http://www.ats.ucla.edu/stat/stata/faq/compreg3, clear

sort age
by age: regress weight height
The parameter estimates (coefficients) for the young, middle aged, and senior citizens are shown below. The results seem to suggest that height is a stronger predictor of weight for seniors (3.19) than for the middle aged (2.10), and that height does not predict weight as strongly for the young (-.38) as for either of the other groups. However, we would need to perform specific significance tests before making claims about the differences among these regression coefficients.
-> age=        1
------------------------------------------------------------------------------
weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
height |  -.3768309   .7743341     -0.487   0.640      -2.162449    1.408787
_cons |   170.1664   49.43018      3.443   0.009       56.18024    284.1526
------------------------------------------------------------------------------

-> age=        2
------------------------------------------------------------------------------
weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
height |   2.095872    .110491     18.969   0.000        1.84108    2.350665
_cons |   -2.39747   7.053272     -0.340   0.743      -18.66234     13.8674
------------------------------------------------------------------------------

-> age=        3
------------------------------------------------------------------------------
weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
height |   3.189727   .1232367     25.883   0.000       2.905543    3.473912
_cons |   5.601677   8.930197      0.627   0.548      -14.99139    26.19475
-----------------------------------------------------------------------------
We can compare the regression coefficients among these three age groups to test the null hypothesis

Ho: B1 = B2 = B3

where B1 is the regression coefficient (the slope of height) for the young, B2 is the regression coefficient for the middle aged, and B3 is the regression coefficient for senior citizens. To do this analysis, we first make two dummy variables: age1, which is coded 1 if young (age=1) and 0 otherwise, and age2, which is coded 1 if middle aged (age=2) and 0 otherwise. We also create age1ht, which is age1 times height, and age2ht, which is age2 times height.
generate age1 = 0
generate age2 = 0
replace age1 = 1 if age==1
replace age2 = 1 if age==2
generate age1ht = age1*height
generate age2ht = age2*height
We can now use age1, age2, height, age1ht, and age2ht as predictors in the regress command below. The regress command will be followed by the command:
test age1ht age2ht
which tests the null hypothesis:

Ho: B1 = B2 = B3

This test will have 2 numerator df because it compares 3 regression coefficients: the 3 slopes are equal exactly when the coefficients of age1ht and age2ht are both 0.
regress weight age1 age2 height age1ht age2ht

Source |       SS       df       MS                  Number of obs =      30
---------+------------------------------               F(  5,    24) =  220.26
Model |  69595.3546     5  13919.0709               Prob > F      =  0.0000
Residual |  1516.64536    24  63.1935565               R-squared     =  0.9787
---------+------------------------------               Adj R-squared =  0.9742
Total |    71112.00    29  2452.13793               Root MSE      =  7.9494

------------------------------------------------------------------------------
weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
age1 |   164.5648    41.5549      3.960   0.001       78.79966    250.3299
age2 |  -7.999147    41.5549     -0.192   0.849      -93.76425    77.76596
height |   3.189727   .4069417      7.838   0.000       2.349841    4.029614
age1ht |  -3.566558   .6131609     -5.817   0.000       -4.83206   -2.301057
age2ht |  -1.093855   .6131609     -1.784   0.087      -2.359357    .1716466
_cons |   5.601677   29.48854      0.190   0.851      -55.25967    66.46303
------------------------------------------------------------------------------
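To see how this output relates to the three separate regressions, it helps to write the model out. The labels b0 through b5 are our own notation for the fitted coefficients (b0 is _cons, and b1 to b5 follow the order of the output); they do not appear in the Stata output.

```latex
\begin{aligned}
\widehat{\text{weight}} &= b_0 + b_1\,\text{age1} + b_2\,\text{age2} + b_3\,\text{height}
                           + b_4\,\text{age1ht} + b_5\,\text{age2ht} \\
B_3 &= b_3 = 3.189727 \\
B_1 &= b_3 + b_4 = 3.189727 - 3.566558 = -0.376831 \\
B_2 &= b_3 + b_5 = 3.189727 - 1.093855 = 2.095872
\end{aligned}
```

These slopes agree with the ones from the separate regressions by age group. Because seniors are the reference group, B1 = B2 = B3 holds exactly when b4 = 0 and b5 = 0, which is why the null hypothesis can be tested through the coefficients of age1ht and age2ht.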
The analysis below shows that the null hypothesis

Ho: B1 = B2 = B3

can be rejected (F(2, 24) = 17.29, p < 0.0001). This means that the regression coefficient of height predicting weight differs significantly across the 3 age groups (young, middle aged, senior citizen).
test age1ht age2ht

( 1)  age1ht = 0.0
( 2)  age2ht = 0.0

F(  2,    24) =   17.29
Prob > F =    0.0000
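The joint test indicates that the slopes are not all equal, but it does not say which groups differ. As a sketch of possible follow-ups (these are not part of the original analysis), the standard test and lincom commands can be run right after the regress command above, using only the variables already created.

```stata
* Run directly after: regress weight age1 age2 height age1ht age2ht

* Young vs. senior slopes (seniors are the reference group)
test age1ht

* Middle aged vs. senior slopes
test age2ht

* Young vs. middle aged slopes: compare the two interaction coefficients
test age1ht = age2ht

* Recover each group's slope with a standard error and confidence interval
* slope for the young
lincom height + age1ht
* slope for the middle aged
lincom height + age2ht
* slope for seniors
lincom height
```

The first two tests are equivalent to the t tests already shown for age1ht (p = 0.000) and age2ht (p = 0.087) in the regression output. Note that the pooled model uses a single residual variance for all three groups, so its standard errors differ from those of the separate regressions by age group.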
Note that we constructed all of the variables manually to make it very clear what each variable represents. In day-to-day use, however, you would more likely use the xi prefix to generate the dummy variables and interactions for you. For example,
xi: regress weight i.age*height

i.age             _Iage_1-3           (naturally coded; _Iage_1 omitted)
i.age*height      _IageXheigh_#       (coded as above)

Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
Residual |  1516.64536    24  63.1935565           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iage_2 |  -172.5639   41.40619    -4.17   0.000    -258.0221   -87.10575
_Iage_3 |  -164.5648    41.5549    -3.96   0.001    -250.3299   -78.79966
height |  -.3768309   .4586553    -0.82   0.419    -1.323449    .5697872
_IageXheig~2 |   2.472703   .6486366     3.81   0.001     1.133983    3.811423
_IageXheig~3 |   3.566558   .6131609     5.82   0.000     2.301057     4.83206
_cons |   170.1664    29.2786     5.81   0.000     109.7384    230.5945
------------------------------------------------------------------------------
Notice, however, that in this example the first age group is the omitted group, whereas previously the third group was the omitted group. We can use the char command (shown below) to make the 3rd group the omitted group and then run the analysis again.
char age[omit] 3
xi: regress weight i.age*height

i.age             _Iage_1-3           (naturally coded; _Iage_3 omitted)
i.age*height      _IageXheigh_#       (coded as above)

Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
Residual |  1516.64536    24  63.1935565           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iage_1 |   164.5648    41.5549     3.96   0.001     78.79966    250.3299
_Iage_2 |  -7.999147    41.5549    -0.19   0.849    -93.76425    77.76596
height |   3.189727   .4069417     7.84   0.000     2.349841    4.029614
_IageXheig~1 |  -3.566558   .6131609    -5.82   0.000     -4.83206   -2.301057
_IageXheig~2 |  -1.093855   .6131609    -1.78   0.087    -2.359357    .1716466
_cons |   5.601677   29.48854     0.19   0.851    -55.25967    66.46303
------------------------------------------------------------------------------
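The joint test of the interaction terms can also be run after the xi version of the model. In Stata 11 and later, factor-variable notation offers an alternative that does not need the xi prefix or the char command. The following is a minimal sketch, assuming the same data are in memory; the commands are standard Stata but are not part of the original example, and the wildcard in the first command assumes the interaction variables are named as shown in the xi header above (_IageXheigh_#).

```stata
* After the xi: regress command above, test both interaction terms jointly
testparm _IageXheigh_*

* Factor-variable version of the same model:
*   ib3.age   treats age as categorical with group 3 (seniors) as the base level
*   c.height  marks height as continuous
*   ##        includes the main effects and the age-by-height interaction
regress weight ib3.age##c.height

* Joint test that all age-by-height interaction terms are 0 (Ho: B1 = B2 = B3)
testparm i.age#c.height

* Slope of height within each age group
margins age, dydx(height)
```

Because these commands fit the same model as before, both joint tests should reproduce F(2, 24) = 17.29.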
