Stata FAQ
How can I compare regression coefficients across 3 (or more) groups?

Sometimes your research may predict that the size of a regression coefficient varies across groups. For example, you might believe that the regression coefficient of height predicting weight differs across 3 age groups (young, middle aged, senior citizen). Below, we have a data file with 10 fictional young people, 10 fictional middle-aged people, and 10 fictional senior citizens, along with their height in inches and their weight in pounds. The variable age indicates the age group and is coded 1 for young people, 2 for middle aged, and 3 for senior citizens.
id age height weight
 1  1    56     140   
 2  1    60     155   
 3  1    64     143   
 4  1    68     161   
 5  1    72     139   
 6  1    54     159   
 7  1    62     138   
 8  1    65     121   
 9  1    65     161   
10  1    70     145   
11  2    56     117   
12  2    60     125   
13  2    64     133   
14  2    68     141   
15  2    72     149   
16  2    54     109   
17  2    62     128   
18  2    65     131   
19  2    65     131   
20  2    70     145   
21  3    64     211   
22  3    68     223   
23  3    72     235   
24  3    76     247   
25  3    80     259   
26  3    62     201   
27  3    69     228   
28  3    74     245   
29  3    75     241   
30  3    82     269
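If the data file cannot be loaded from the web, the listing above can be typed in directly. The sketch below uses only the values shown in the listing and would be run in place of the use command.

```stata
* Enter the 30 observations from the listing by hand
clear
input id age height weight
 1 1 56 140
 2 1 60 155
 3 1 64 143
 4 1 68 161
 5 1 72 139
 6 1 54 159
 7 1 62 138
 8 1 65 121
 9 1 65 161
10 1 70 145
11 2 56 117
12 2 60 125
13 2 64 133
14 2 68 141
15 2 72 149
16 2 54 109
17 2 62 128
18 2 65 131
19 2 65 131
20 2 70 145
21 3 64 211
22 3 68 223
23 3 72 235
24 3 76 247
25 3 80 259
26 3 62 201
27 3 69 228
28 3 74 245
29 3 75 241
30 3 82 269
end

* Quick check: there should be 10 people in each age group
tabulate age
```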
We first analyze each age group separately by sorting on age and then running the regress command with the by prefix, as shown below.
use http://www.ats.ucla.edu/stat/stata/faq/compreg3, clear

sort age
by age: regress weight height
The parameter estimates (coefficients) for the young, middle age, and senior citizens are shown below, and the results do seem to suggest that height is a stronger predictor of weight for seniors (3.18) than for the middle aged (2.09). The results also seem to suggest that height does not predict weight as strongly for the young (-.37) as for the middle aged and seniors. However, we would need to perform specific significance tests to be able to make claims about the differences among these regression coefficients.
-> age=        1  
------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  height |  -.3768309   .7743341     -0.487   0.640      -2.162449    1.408787
   _cons |   170.1664   49.43018      3.443   0.009       56.18024    284.1526
------------------------------------------------------------------------------

-> age=        2  
------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  height |   2.095872    .110491     18.969   0.000        1.84108    2.350665
   _cons |   -2.39747   7.053272     -0.340   0.743      -18.66234     13.8674
------------------------------------------------------------------------------

-> age=        3  
------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  height |   3.189727   .1232367     25.883   0.000       2.905543    3.473912
   _cons |   5.601677   8.930197      0.627   0.548      -14.99139    26.19475
-----------------------------------------------------------------------------
We can compare the regression coefficients among these three age groups to test the null hypothesis

Ho: B1 = B2 = B3

where B1 is the regression coefficient of height for the young, B2 is the regression coefficient for the middle aged, and B3 is the regression coefficient for senior citizens. To do this analysis, we first make two dummy variables: age1, which is coded 1 if young (age=1), 0 otherwise, and age2, which is coded 1 if middle aged (age=2), 0 otherwise. We also create age1ht, which is age1 times height, and age2ht, which is age2 times height. Senior citizens (age=3) get no dummy variable, so they serve as the reference group.
generate age1 = 0
generate age2 = 0
replace age1 = 1 if age==1
replace age2 = 1 if age==2
generate age1ht = age1*height
generate age2ht = age2*height
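Before fitting the model, it can be worth confirming that the hand-built variables are coded as intended. The sketch below is one way to do that, assuming the generate and replace commands above have already been run; the stub name agegrp is only illustrative.

```stata
* tabulate with generate() creates one 0/1 indicator per category of age:
* agegrp1 (young), agegrp2 (middle aged), agegrp3 (senior)
tabulate age, generate(agegrp)

* assert stops with an error if any observation violates the condition
assert age1 == agegrp1
assert age2 == agegrp2
assert age1ht == agegrp1*height
assert age2ht == agegrp2*height
```

If all four assert commands run silently, the dummy variables and the interaction terms match the coding described above.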
We can now use age1, age2, height, age1ht, and age2ht as predictors in the regress command below. The regress command will be followed by the command:
test age1ht age2ht
which tests the null hypothesis:

Ho: B1 = B2 = B3

This test will have 2 df because the equality of 3 regression coefficients imposes 2 constraints (age1ht = 0 and age2ht = 0).
regress weight age1 age2 height age1ht age2ht

  Source |       SS       df       MS                  Number of obs =      30
---------+------------------------------               F(  5,    24) =  220.26
   Model |  69595.3546     5  13919.0709               Prob > F      =  0.0000
Residual |  1516.64536    24  63.1935565               R-squared     =  0.9787
---------+------------------------------               Adj R-squared =  0.9742
   Total |    71112.00    29  2452.13793               Root MSE      =  7.9494

------------------------------------------------------------------------------
  weight |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
    age1 |   164.5648    41.5549      3.960   0.001       78.79966    250.3299
    age2 |  -7.999147    41.5549     -0.192   0.849      -93.76425    77.76596
  height |   3.189727   .4069417      7.838   0.000       2.349841    4.029614
  age1ht |  -3.566558   .6131609     -5.817   0.000       -4.83206   -2.301057
  age2ht |  -1.093855   .6131609     -1.784   0.087      -2.359357    .1716466
   _cons |   5.601677   29.48854      0.190   0.851      -55.25967    66.46303
------------------------------------------------------------------------------
The analysis below shows that the null hypothesis

Ho: B1 = B2 = B3

can be rejected (F(2, 24) = 17.29, p < 0.0001). This means that the regression coefficient of height predicting weight does indeed differ significantly across the 3 age groups (young, middle aged, senior citizen).
test age1ht age2ht

 ( 1)  age1ht = 0.0
 ( 2)  age2ht = 0.0

       F(  2,    24) =   17.29
            Prob > F =    0.0000
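The overall test does not say which groups differ. Because seniors are the reference group, age1ht estimates B1 - B3 and age2ht estimates B2 - B3, so the t-tests in the regression table already compare the young and the middle aged with seniors. The remaining comparison (B1 versus B2) and the slope within each group can be obtained as sketched below, assuming the variables created earlier are still in memory.

```stata
* Re-fit the model so that its estimates are the active results
regress weight age1 age2 height age1ht age2ht

* age1ht is B1 - B3 and age2ht is B2 - B3,
* so testing that they are equal tests B1 = B2
test age1ht = age2ht

* Slope of height for the young (B1)
lincom height + age1ht

* Slope of height for the middle aged (B2)
lincom height + age2ht

* Slope of height for seniors (B3, the reference group)
lincom height
```

The point estimates from lincom should match the coefficients from the separate regressions (-.3768, 2.0959, and 3.1897). The standard errors will differ, because the combined model pools the residual variance across all three groups.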
Note that we constructed all of the variables manually to make it very clear what each variable represents. However, in day-to-day use, you would more likely use the xi prefix to generate the dummy variables and interactions for you. For example,
xi: regress weight i.age*height

i.age             _Iage_1-3           (naturally coded; _Iage_1 omitted)
i.age*height      _IageXheigh_#       (coded as above)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
       Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
    Residual |  1516.64536    24  63.1935565           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
       Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iage_2 |  -172.5639   41.40619    -4.17   0.000    -258.0221   -87.10575
     _Iage_3 |  -164.5648    41.5549    -3.96   0.001    -250.3299   -78.79966
      height |  -.3768309   .4586553    -0.82   0.419    -1.323449    .5697872
_IageXheig~2 |   2.472703   .6486366     3.81   0.001     1.133983    3.811423
_IageXheig~3 |   3.566558   .6131609     5.82   0.000     2.301057     4.83206
       _cons |   170.1664    29.2786     5.81   0.000     109.7384    230.5945
------------------------------------------------------------------------------
Notice, however, that in this output the first age group is the omitted group, whereas previously the third group was the omitted group. We can use the char command (shown below) to indicate we want the 3rd group to be the omitted group and then run the analysis again.
char age[omit] 3
xi: regress weight i.age*height

i.age             _Iage_1-3           (naturally coded; _Iage_3 omitted)
i.age*height      _IageXheigh_#       (coded as above)

      Source |       SS       df       MS              Number of obs =      30
-------------+------------------------------           F(  5,    24) =  220.26
       Model |  69595.3546     5  13919.0709           Prob > F      =  0.0000
    Residual |  1516.64536    24  63.1935565           R-squared     =  0.9787
-------------+------------------------------           Adj R-squared =  0.9742
       Total |       71112    29  2452.13793           Root MSE      =  7.9494

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     _Iage_1 |   164.5648    41.5549     3.96   0.001     78.79966    250.3299
     _Iage_2 |  -7.999147    41.5549    -0.19   0.849    -93.76425    77.76596
      height |   3.189727   .4069417     7.84   0.000     2.349841    4.029614
_IageXheig~1 |  -3.566558   .6131609    -5.82   0.000     -4.83206   -2.301057
_IageXheig~2 |  -1.093855   .6131609    -1.78   0.087    -2.359357    .1716466
       _cons |   5.601677   29.48854     0.19   0.851    -55.25967    66.46303
------------------------------------------------------------------------------
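The joint test of the interaction terms can be repeated after the xi version of the model. The sketch below assumes the xi: regress command above has just been run, so the interaction variables that xi created are still in the dataset. It also shows an equivalent model written with factor-variable notation, which assumes Stata 11 or later.

```stata
* Test the two interaction terms jointly; the wildcard matches the
* interaction variables created by xi (_IageXheigh_1 and _IageXheigh_2)
testparm _IageXheigh_*

* Factor-variable version (assumes Stata 11 or later):
* ib3.age makes group 3 the base category, c.height treats height as
* continuous, and ## includes the main effects and the interaction
regress weight ib3.age##c.height

* Joint test of the age-by-height interaction terms
testparm i.age#c.height
```

Both tests apply to the same model as before, only parameterized differently, so they should reproduce the F(2, 24) = 17.29 reported above.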
