UCLA Academic Technology Services

Stata FAQ
How can I get anova main-effects with dummy coding?

Many researchers like to do their anova using regression with dummy coding but find it confusing when they don't get the same main-effects as in anova. This FAQ will show you how to get those main-effects.

Stata version 10 and earlier

Let's begin with a standard anova on a dataset called crf24, which we will use for comparison.

use http://www.ats.ucla.edu/stat/stata/faq/crf24, clear

anova y a b a*b

                           Number of obs =      32     R-squared     =  0.9214
                           Root MSE      = .877971     Adj R-squared =  0.8985

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |         217     7          31      40.22     0.0000
                         |
                       a |       3.125     1       3.125       4.05     0.0554
                       b |       194.5     3  64.8333333      84.11     0.0000
                     a*b |      19.375     3  6.45833333       8.38     0.0006
                         |
                Residual |        18.5    24  .770833333   
              -----------+----------------------------------------------------
                   Total |       235.5    31  7.59677419  
Next, we will manually create the dummy variables and run the regression model.
tab a, gen(a)
tab b, gen(b)
generate ab1 = a1*b1
generate ab2 = a1*b2
generate ab3 = a1*b3

regress y a1 b1 b2 b3 ab1 ab2 ab3

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  7,    24) =   40.22
       Model |         217     7          31           Prob > F      =  0.0000
    Residual |        18.5    24  .770833333           R-squared     =  0.9214
-------------+------------------------------           Adj R-squared =  0.8985
       Total |       235.5    31  7.59677419           Root MSE      =  .87797

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          a1 |         -2   .6208194    -3.22   0.004    -3.281308   -.7186918
          b1 |      -8.25   .6208194   -13.29   0.000    -9.531308   -6.968692
          b2 |         -7   .6208194   -11.28   0.000    -8.281308   -5.718692
          b3 |       -4.5   .6208194    -7.25   0.000    -5.781308   -3.218692
         ab1 |          4   .8779711     4.56   0.000     2.187957    5.812043
         ab2 |          3   .8779711     3.42   0.002     1.187957    4.812043
         ab3 |        3.5   .8779711     3.99   0.001     1.687957    5.312043
       _cons |         10   .4389856    22.78   0.000     9.093978    10.90602
------------------------------------------------------------------------------
For this model a2 is the reference level for a and b4 is the reference level for b, i.e., they are the omitted levels.
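It can help to rebuild the cell means these dummy-coded coefficients imply, to see what each one measures. The following is a minimal Python sketch, meant only as an outside cross-check and not as part of the Stata workflow. The coefficient values are copied from the regress output above. The dictionary keys and the cell_mean helper are our own illustrative names.

```python
# Coefficients copied from the regress output above
# (dummy coding; a2 and b4 are the reference levels).
coef = {"a1": -2.0, "b1": -8.25, "b2": -7.0, "b3": -4.5,
        "ab1": 4.0, "ab2": 3.0, "ab3": 3.5, "_cons": 10.0}

def cell_mean(a_level, b_level):
    """Predicted mean for cell (a, b), with a in 1-2 and b in 1-4."""
    a1 = 1 if a_level == 1 else 0           # value of the dummy a1
    mean = coef["_cons"] + coef["a1"] * a1
    if b_level < 4:                         # b4 is the reference level
        mean += coef[f"b{b_level}"]
        mean += coef[f"ab{b_level}"] * a1   # ab_j was generated as a1*b_j
    return mean

cells = {(a, b): cell_mean(a, b) for a in (1, 2) for b in (1, 2, 3, 4)}
for (a, b), m in sorted(cells.items()):
    print(f"a={a} b={b}: {m:g}")
```

With this coding, _cons is the mean of the reference cell (a2, b4). The coefficient a1 is the difference between the two levels of a at b4 only (8 - 10 = -2). Each b coefficient compares a level of b with b4 at a2 only, so none of these coefficients is a main-effect by itself.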

Here is the test of the a*b interaction.

test ab1 ab2 ab3

 ( 1)  ab1 = 0
 ( 2)  ab2 = 0
 ( 3)  ab3 = 0

       F(  3,    24) =    8.38
            Prob > F =    0.0006
To get the main-effect for a, we will use the dummy for a plus the a*b interaction dummies averaged across the four levels of b.
test a1 + (ab1+ab2+ab3)/4 = 0

 ( 1)  a1 + .25 ab1 + .25 ab2 + .25 ab3 = 0

       F(  1,    24) =    4.05
            Prob > F =    0.0554
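The tested expression is the difference between the two marginal means of a. The Python sketch below does the arithmetic and reproduces the anova line for a. It assumes the balanced layout the output suggests: 32 observations in 8 cells, so 16 per level of a. The variable names are our own.

```python
# Coefficients from the regress output above (reference levels a2 and b4).
a1 = -2.0
ab = [4.0, 3.0, 3.5]      # ab1, ab2, ab3; the b4 interaction is implicitly 0
mse = 18.5 / 24           # residual mean square from the same output
n_per_level = 16          # assumed: 32 balanced observations over 2 levels of a

# a main-effect: simple effect at b4 plus the interaction averaged over all 4 levels of b
estimate = a1 + sum(ab) / 4

# For a one-df difference between two equal-sized marginal means,
# SS = d^2 / (1/n + 1/n).
ss_a = estimate ** 2 / (2 / n_per_level)
f_a = ss_a / mse
print(estimate, ss_a, round(f_a, 2))
```

The estimate 0.625 is the marginal mean of a1 minus that of a2. Dividing its sum of squares, 3.125, by the residual mean square gives the same F of 4.05 as the anova table.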
The main-effect for b is a little trickier because it is a 3-degree-of-freedom test. We will run the test command three times and use the accumulate option to combine the three constraints into one joint test.
test b1 + ab1/2 = 0

 ( 1)  b1 + .5 ab1 = 0

       F(  1,    24) =  202.70
            Prob > F =    0.0000

test b2 + ab2/2 = 0, accumulate

 ( 1)  b1 + .5 ab1 = 0
 ( 2)  b2 + .5 ab2 = 0

       F(  2,    24) =  120.86
            Prob > F =    0.0000

test b3 + ab3/2 = 0, accumulate

 ( 1)  b1 + .5 ab1 = 0
 ( 2)  b2 + .5 ab2 = 0
 ( 3)  b3 + .5 ab3 = 0

       F(  3,    24) =   84.11
            Prob > F =    0.0000
The last test command gives the main-effect for b.
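Each of the three constraints compares the marginal mean of one level of b with that of b4. The Python sketch below turns those three differences into the 3-degree-of-freedom sum of squares for b. It makes the same balance assumption as before (8 observations per level of b), and the names are our own.

```python
# Coefficients from the regress output above (reference levels a2 and b4).
b = [-8.25, -7.0, -4.5]   # b1, b2, b3
ab = [4.0, 3.0, 3.5]      # ab1, ab2, ab3
mse = 18.5 / 24           # residual mean square
n_per_level = 8           # assumed: 32 balanced observations over 4 levels of b

# Each tested expression b_j + ab_j/2 is the marginal mean of b_j minus
# the marginal mean of b4, averaged over the two levels of a.
diffs = [bj + abj / 2 for bj, abj in zip(b, ab)] + [0.0]   # b4 vs itself

# Deviations from the grand mean do not depend on the b4 baseline.
center = sum(diffs) / len(diffs)
ss_b = n_per_level * sum((d - center) ** 2 for d in diffs)
f_b = (ss_b / 3) / mse    # 3 numerator degrees of freedom
print(diffs, ss_b, round(f_b, 2))
```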

So, what's with all of the division: by 4 in the a main-effect and by 2 in the b main-effect? With dummy coding, the coefficient on a1 is actually the simple effect of a at the reference level of b (b4). To get the "true" main-effect of a, we have to combine that simple effect with the average of the interaction effects across all four levels of b. The reference level b4 has an interaction coefficient of zero, so the three interaction coefficients are summed and divided by 4. Likewise, for the b main-effect we need to combine the simple effects of b (at the reference level a2) with the average interaction effect across the two levels of a, which means dividing by 2.
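The rule behind these divisors appears to generalize to a balanced, fully crossed design with dummy coding and all lower-order terms in the model. Take a term that contains the tested factors plus some extra factors. It gets a weight of 1 divided by the product of the numbers of levels of those extra factors, because the averaging runs over every level of each extra factor and the reference level contributes zero.

The Python function below is our own hypothetical illustration of that rule; it is not a Stata command. Within a multi-degree-of-freedom test, the same weight applies to each matching dummy, for example b1 with ab1 and b2 with ab2.

```python
from fractions import Fraction
from itertools import combinations

def contrast_weights(tested, levels):
    """Hypothetical helper: weights on dummy-coded terms when testing an effect.

    tested -- tuple of factor names in the effect, e.g. ("a",) or ("a", "b")
    levels -- dict of every factor in the full-factorial model -> number of levels
    Returns {term: weight}, where each term is a tuple of factor names
    in the order they appear in `levels`.
    """
    others = [f for f in levels if f not in tested]
    weights = {}
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            weight = Fraction(1)
            for f in extra:
                weight /= levels[f]          # average over the levels of f
            term = tuple(f for f in levels if f in tested or f in extra)
            weights[term] = weight
    return weights

def show(tested, levels):
    for term, weight in contrast_weights(tested, levels).items():
        print("*".join(term), weight)

show(("a",), {"a": 2, "b": 4})   # a 1, a*b 1/4
show(("b",), {"a": 2, "b": 4})   # b 1, a*b 1/2
```

With levels {"a": 2, "b": 2, "c": 3}, the same function gives the 1/2, 1/3 and 1/6 used for the a main-effect in the three-factor example below.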

Stata 11

Here is how the above analyses would look using Stata 11's factor variables.

anova y a##b

                           Number of obs =      32     R-squared     =  0.9214
                           Root MSE      = .877971     Adj R-squared =  0.8985

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |         217     7          31      40.22     0.0000
                         |
                       a |       3.125     1       3.125       4.05     0.0554
                       b |       194.5     3  64.8333333      84.11     0.0000
                     a#b |      19.375     3  6.45833333       8.38     0.0006
                         |
                Residual |        18.5    24  .770833333
              -----------+----------------------------------------------------
                   Total |       235.5    31  7.59677419

regress y a##b

      Source |       SS       df       MS              Number of obs =      32
-------------+------------------------------           F(  7,    24) =   40.22
       Model |         217     7          31           Prob > F      =  0.0000
    Residual |        18.5    24  .770833333           R-squared     =  0.9214
-------------+------------------------------           Adj R-squared =  0.8985
       Total |       235.5    31  7.59677419           Root MSE      =  .87797

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         2.a |         -2   .6208194    -3.22   0.004    -3.281308   -.7186918
             |
           b |
          2  |        .25   .6208194     0.40   0.691    -1.031308    1.531308
          3  |       3.25   .6208194     5.24   0.000     1.968692    4.531308
          4  |       4.25   .6208194     6.85   0.000     2.968692    5.531308
             |
         a#b |
        2 2  |          1   .8779711     1.14   0.266    -.8120434    2.812043
        2 3  |         .5   .8779711     0.57   0.574    -1.312043    2.312043
        2 4  |          4   .8779711     4.56   0.000     2.187957    5.812043
             |
       _cons |       3.75   .4389856     8.54   0.000     2.843978    4.656022
------------------------------------------------------------------------------

Factor variables use the lowest level of each factor (a=1, b=1) as the reference, so these coefficients differ from the ones above. The anova table and the test results are nevertheless identical.

/* test of ab interaction */
test (2.a#2.b==0)(2.a#3.b==0)(2.a#4.b == 0)

 ( 1)  2.a#2.b = 0
 ( 2)  2.a#3.b = 0
 ( 3)  2.a#4.b = 0

       F(  3,    24) =    8.38
            Prob > F =    0.0006

/* test of a main effect */
test 2.a + (2.a#2.b+2.a#3.b+2.a#4.b)/4 == 0

 ( 1)  2.a + .25*2.a#2.b + .25*2.a#3.b + .25*2.a#4.b = 0

       F(  1,    24) =    4.05
            Prob > F =    0.0554

/* test of b main effect */
test (2.b + 2.a#2.b/2 = 0)(3.b + 2.a#3.b/2 = 0)(4.b + 2.a#4.b/2 = 0)

 ( 1)  2.b + .5*2.a#2.b = 0
 ( 2)  3.b + .5*2.a#3.b = 0
 ( 3)  4.b + .5*2.a#4.b = 0

       F(  3,    24) =   84.11
            Prob > F =    0.0000

Example 2

Stata version 10 and earlier

This method generalizes to more complex designs with multiple factors, so let's consider a three-factor, completely crossed design.

use http://www.ats.ucla.edu/stat/stata/faq/threeway, clear

anova y a b c a*b a*c b*c a*b*c

                           Number of obs =      24     R-squared     =  0.9689
                           Root MSE      =  1.1547     Adj R-squared =  0.9403

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  497.833333    11  45.2575758      33.94     0.0000
                         |
                       a |         150     1         150     112.50     0.0000
                       b |  .666666667     1  .666666667       0.50     0.4930
                       c |  127.583333     2  63.7916667      47.84     0.0000
                     a*b |  160.166667     1  160.166667     120.13     0.0000
                     a*c |       18.25     2       9.125       6.84     0.0104
                     b*c |  22.5833333     2  11.2916667       8.47     0.0051
                   a*b*c |  18.5833333     2  9.29166667       6.97     0.0098
                         |
                Residual |          16    12  1.33333333   
              -----------+----------------------------------------------------
                   Total |  513.833333    23  22.3405797 
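Once again we will manually create the dummy variables and run the regression model. Because a and b have only two levels each, we recode them to 0/1 in place, so original level 1 becomes the reference level. For c, we create dummies with tabulate and omit c3, the reference level.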
recode a (1=0)(2=1)
recode b (1=0)(2=1)
tab c, gen(c)
gen ab=a*b
gen ac1=a*c1
gen ac2=a*c2
gen bc1=b*c1
gen bc2=b*c2
gen abc1=a*b*c1
gen abc2=a*b*c2

regress y a b c1 c2 ab ac1 ac2 bc1 bc2 abc1 abc2

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F( 11,    12) =   33.94
       Model |  497.833333    11  45.2575758           Prob > F      =  0.0000
    Residual |          16    12  1.33333333           R-squared     =  0.9689
-------------+------------------------------           Adj R-squared =  0.9403
       Total |  513.833333    23  22.3405797           Root MSE      =  1.1547

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |        -.5   1.154701    -0.43   0.673    -3.015876    2.015876
           b |       -9.5   1.154701    -8.23   0.000    -12.01588   -6.984124
          c1 |         -8   1.154701    -6.93   0.000    -10.51588   -5.484124
          c2 |         -4   1.154701    -3.46   0.005    -6.515876   -1.484124
          ab |         15   1.632993     9.19   0.000     11.44201    18.55799
         ac1 |   6.39e-14   1.632993     0.00   1.000    -3.557986    3.557986
         ac2 |          1   1.632993     0.61   0.552    -2.557986    4.557986
         bc1 |          9   1.632993     5.51   0.000     5.442014    12.55799
         bc2 |          5   1.632993     3.06   0.010     1.442014    8.557986
        abc1 |       -8.5   2.309401    -3.68   0.003    -13.53175   -3.468247
        abc2 |       -5.5   2.309401    -2.38   0.035    -10.53175   -.4682473
       _cons |         19   .8164966    23.27   0.000     17.22101    20.77899
------------------------------------------------------------------------------
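As in the first example, the coefficients can be turned back into the twelve cell means. This Python sketch assumes the coding just created: a and b are 0/1 after the recode, and c3 is the omitted level of c. The coefficient values are copied from the output above, with ac1 (printed as 6.39e-14) taken as exactly 0. The dictionary keys and the helper name are our own.

```python
# Coefficients from the regress output above. After recoding, a and b are
# 0/1 dummies (original level 1 = 0); c3 is the reference level for c.
coef = {"_cons": 19.0, "a": -0.5, "b": -9.5, "c1": -8.0, "c2": -4.0,
        "ab": 15.0, "ac1": 0.0, "ac2": 1.0, "bc1": 9.0, "bc2": 5.0,
        "abc1": -8.5, "abc2": -5.5}

def cell_mean(a, b, c):
    """Predicted mean for a, b in {0, 1} (recoded) and c in {1, 2, 3}."""
    m = coef["_cons"] + coef["a"] * a + coef["b"] * b + coef["ab"] * a * b
    if c < 3:   # c3 is the reference level, so it adds nothing here
        m += (coef[f"c{c}"] + coef[f"ac{c}"] * a
              + coef[f"bc{c}"] * b + coef[f"abc{c}"] * a * b)
    return m

cells = {(a, b, c): cell_mean(a, b, c)
         for a in (0, 1) for b in (0, 1) for c in (1, 2, 3)}
for key, m in sorted(cells.items()):
    print(key, m)
```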
Here is the test of the three-way a*b*c interaction.
test abc1 abc2

 ( 1)  abc1 = 0
 ( 2)  abc2 = 0

       F(  2,    12) =    6.97
            Prob > F =    0.0098
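Next come the two-way interactions. The a*b interaction has 1 degree of freedom; its three-way terms are averaged over the three levels of c, hence the division by 3. The a*c and b*c interactions have 2 degrees of freedom each, so they are built up with the accumulate option, averaging the three-way terms over the two levels of b or a (dividing by 2).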
/* a*b interaction */

test ab + (abc1+abc2)/3 = 0

 ( 1)  ab + .3333333 abc1 + .3333333 abc2 = 0

       F(  1,    12) =  120.13
            Prob > F =    0.0000
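The divisor 3 comes from averaging the a*b*c terms over the three levels of c, where the reference level c3 contributes zero. The Python sketch below recovers the a*b line of the anova table using exact fractions. It assumes 2 observations per cell, and hence 6 per a-by-b cell; the names are our own.

```python
from fractions import Fraction

# Coefficients from the regress output above (c3 is the reference level).
ab = Fraction(15)
abc = [Fraction(-17, 2), Fraction(-11, 2)]   # abc1, abc2; abc3 is implicitly 0
mse = Fraction(16, 12)                       # residual mean square
n_per_ab_cell = 6                            # assumed: 24 observations over 4 a-by-b cells

# a*b interaction contrast on the a-by-b means, averaged over the 3 levels of c
estimate = ab + sum(abc) / 3
# A 2x2 interaction contrast (weights +1, -1, -1, +1 on means of n
# observations each) has variance 4*sigma^2/n, so SS = d^2 / (4/n).
ss_ab = estimate ** 2 / (Fraction(4) / n_per_ab_cell)
f_ab = ss_ab / mse
print(estimate, ss_ab, f_ab, float(f_ab))
```

The exact F is 961/8 = 120.125, which sits exactly on a rounding boundary. That likely explains why this test prints 120.13 while the Stata 11 anova table below prints 120.12.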
 
/* a*c interaction */

test ac1 + abc1/2 = 0

 ( 1)  ac1 + .5 abc1 = 0

       F(  1,    12) =   13.55
            Prob > F =    0.0031

test ac2 + abc2/2 = 0, accumulate

 ( 1)  ac1 + .5 abc1 = 0
 ( 2)  ac2 + .5 abc2 = 0

       F(  2,    12) =    6.84
            Prob > F =    0.0104

/* b*c interaction */

test bc1 + abc1/2 = 0

 ( 1)  bc1 + .5 abc1 = 0

       F(  1,    12) =   16.92
            Prob > F =    0.0014

test bc2 + abc2/2 = 0, accumulate

 ( 1)  bc1 + .5 abc1 = 0
 ( 2)  bc2 + .5 abc2 = 0

       F(  2,    12) =    8.47
            Prob > F =    0.0051
Finally, we get to the main-effects.
/* a main-effect */

test a + ab/2 + (ac1+ac2)/3 + (abc1+abc2)/6 = 0

 ( 1)  a + .5 ab + .3333333 ac1 + .3333333 ac2 + .1666667 abc1 + .1666667 abc2 = 0

       F(  1,    12) =  112.50
            Prob > F =    0.0000

/* b main-effect */

test b + ab/2 + (bc1+bc2)/3 + (abc1+abc2)/6 = 0

 ( 1)  b + .5 ab + .3333333 bc1 + .3333333 bc2 + .1666667 abc1 + .1666667 abc2 = 0

       F(  1,    12) =    0.50
            Prob > F =    0.4930
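The weights 1/2, 1/3 and 1/6 are the reciprocals of the number of cells averaged over: b has 2 levels, c has 3, and b-by-c has 6. The Python sketch below shows that each expression equals a difference in marginal means, and it reproduces both F statistics. It again assumes a balanced design, with 12 observations per level of a and per level of b; the names are our own.

```python
from fractions import Fraction

# Coefficients from the regress output above, as exact fractions
# (ac1 was printed as 6.39e-14, i.e. zero).
a, b, ab = Fraction(-1, 2), Fraction(-19, 2), Fraction(15)
ac = [Fraction(0), Fraction(1)]              # ac1, ac2
bc = [Fraction(9), Fraction(5)]              # bc1, bc2
abc = [Fraction(-17, 2), Fraction(-11, 2)]   # abc1, abc2
mse = Fraction(16, 12)                       # residual mean square
n_per_level = 12                             # assumed: 24 observations over 2 levels

def one_df_test(estimate):
    """SS and F for a difference between two equal-sized marginal means."""
    ss = estimate ** 2 / (Fraction(2) / n_per_level)
    return ss, ss / mse

est_a = a + ab / 2 + sum(ac) / 3 + sum(abc) / 6
est_b = b + ab / 2 + sum(bc) / 3 + sum(abc) / 6
ss_a, f_a = one_df_test(est_a)
ss_b, f_b = one_df_test(est_b)
print(est_a, ss_a, float(f_a))   # 5 150 112.5
print(est_b, ss_b, float(f_b))   # 1/3 2/3 0.5
```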

/* c main-effect */

test c1 + ac1/2 + bc1/2 + abc1/4 = 0

 ( 1)  c1 + .5 ac1 + .5 bc1 + .25 abc1 = 0

       F(  1,    12) =   94.92
            Prob > F =    0.0000

test c2 + ac2/2 + bc2/2 + abc2/4 = 0, accumulate

 ( 1)  c1 + .5 ac1 + .5 bc1 + .25 abc1 = 0
 ( 2)  c2 + .5 ac2 + .5 bc2 + .25 abc2 = 0

       F(  2,    12) =   47.84
            Prob > F =    0.0000
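For c, each accumulated constraint compares a level of c with the reference level c3, averaged over the four a-by-b cells. The Python sketch below makes the same balance assumption, with 8 observations per level of c. The names are our own.

```python
from fractions import Fraction

# Coefficients from the regress output above (c3 is the reference level).
c = [Fraction(-8), Fraction(-4)]             # c1, c2
ac = [Fraction(0), Fraction(1)]              # ac1 (printed as 6.39e-14), ac2
bc = [Fraction(9), Fraction(5)]              # bc1, bc2
abc = [Fraction(-17, 2), Fraction(-11, 2)]   # abc1, abc2
mse = Fraction(16, 12)                       # residual mean square
n_per_level = 8                              # assumed: 24 observations over 3 levels of c

# Each accumulated constraint: marginal mean of c_j minus that of c3.
diffs = [c[j] + ac[j] / 2 + bc[j] / 2 + abc[j] / 4 for j in range(2)]
diffs.append(Fraction(0))                    # c3 compared with itself

center = sum(diffs) / len(diffs)
ss_c = n_per_level * sum((d - center) ** 2 for d in diffs)
f_c = (ss_c / 2) / mse                       # 2 numerator degrees of freedom
print([str(d) for d in diffs], float(ss_c), round(float(f_c), 2))
```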
Stata 11

And here are the same analyses using Stata 11.

anova y a##b##c

                           Number of obs =      24     R-squared     =  0.9689
                           Root MSE      =  1.1547     Adj R-squared =  0.9403

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  497.833333    11  45.2575758      33.94     0.0000
                         |
                       a |         150     1         150     112.50     0.0000
                       b |  .666666667     1  .666666667       0.50     0.4930
                     a#b |  160.166667     1  160.166667     120.12     0.0000
                       c |  127.583333     2  63.7916667      47.84     0.0000
                     a#c |       18.25     2       9.125       6.84     0.0104
                     b#c |  22.5833333     2  11.2916667       8.47     0.0051
                   a#b#c |  18.5833333     2  9.29166667       6.97     0.0098
                         |
                Residual |          16    12  1.33333333   
              -----------+----------------------------------------------------
                   Total |  513.833333    23  22.3405797   

regress y a##b##c

      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F( 11,    12) =   33.94
       Model |  497.833333    11  45.2575758           Prob > F      =  0.0000
    Residual |          16    12  1.33333333           R-squared     =  0.9689
-------------+------------------------------           Adj R-squared =  0.9403
       Total |  513.833333    23  22.3405797           Root MSE      =  1.1547

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         2.a |        -.5   1.154701    -0.43   0.673    -3.015876    2.015876
         2.b |        -.5   1.154701    -0.43   0.673    -3.015876    2.015876
             |
         a#b |
        2 2  |        6.5   1.632993     3.98   0.002     2.942014    10.05799
             |
           c |
          2  |          4   1.154701     3.46   0.005     1.484124    6.515876
          3  |          8   1.154701     6.93   0.000     5.484124    10.51588
             |
         a#c |
        2 2  |          1   1.632993     0.61   0.552    -2.557986    4.557986
        2 3  |  -1.10e-14   1.632993    -0.00   1.000    -3.557986    3.557986
             |
         b#c |
        2 2  |         -4   1.632993    -2.45   0.031    -7.557986   -.4420135
        2 3  |         -9   1.632993    -5.51   0.000    -12.55799   -5.442014
             |
       a#b#c |
      2 2 2  |          3   2.309401     1.30   0.218    -2.031753    8.031753
      2 2 3  |        8.5   2.309401     3.68   0.003     3.468247    13.53175
             |
       _cons |         11   .8164966    13.47   0.000     9.221007    12.77899
------------------------------------------------------------------------------

/* abc interaction */

test 2.a#2.b#2.c 2.a#2.b#3.c

 ( 1)  2.a#2.b#2.c = 0
 ( 2)  2.a#2.b#3.c = 0

       F(  2,    12) =    6.97
            Prob > F =    0.0098

/* ab interaction */

test 2.a#2.b + (2.a#2.b#2.c+2.a#2.b#3.c)/3 == 0

 ( 1)  2.a#2.b + .3333333*2.a#2.b#2.c + .3333333*2.a#2.b#3.c = 0

       F(  1,    12) =  120.13
            Prob > F =    0.0000

/* ac interaction */

test (2.a#2.c + 2.a#2.b#2.c/2 == 0)(2.a#3.c + 2.a#2.b#3.c/2 == 0)

 ( 1)  2.a#2.c + .5*2.a#2.b#2.c = 0
 ( 2)  2.a#3.c + .5*2.a#2.b#3.c = 0

       F(  2,    12) =    6.84
            Prob > F =    0.0104

/* bc interaction */

test (2.b#2.c + 2.a#2.b#2.c/2 == 0)(2.b#3.c + 2.a#2.b#3.c/2 == 0)

 ( 1)  2.b#2.c + .5*2.a#2.b#2.c = 0
 ( 2)  2.b#3.c + .5*2.a#2.b#3.c = 0

       F(  2,    12) =    8.47
            Prob > F =    0.0051

/* a main-effect */

test 2.a + 2.a#2.b/2 + (2.a#2.c+2.a#3.c)/3 + (2.a#2.b#2.c+2.a#2.b#3.c)/6 == 0

 ( 1)  2.a + .5*2.a#2.b + .3333333*2.a#2.c + .3333333*2.a#3.c + .1666667*2.a#2.b#2.c + .1666667*2.a#2.b#3.c = 0

       F(  1,    12) =  112.50
            Prob > F =    0.0000

/* b main-effect */

test 2.b + 2.a#2.b/2 + (2.b#2.c+2.b#3.c)/3 + (2.a#2.b#2.c+2.a#2.b#3.c)/6 == 0

 ( 1)  2.b + .5*2.a#2.b + .3333333*2.b#2.c + .3333333*2.b#3.c + .1666667*2.a#2.b#2.c + .1666667*2.a#2.b#3.c = 0

       F(  1,    12) =    0.50
            Prob > F =    0.4930

/* c main-effect */

test (2.c + 2.a#2.c/2 + 2.b#2.c/2 + 2.a#2.b#2.c/4 == 0)(3.c + 2.a#3.c/2 + 2.b#3.c/2 + 2.a#2.b#3.c/4 == 0)

 ( 1)  2.c + .5*2.a#2.c + .5*2.b#2.c + .25*2.a#2.b#2.c = 0
 ( 2)  3.c + .5*2.a#3.c + .5*2.b#3.c + .25*2.a#2.b#3.c = 0

       F(  2,    12) =   47.84
            Prob > F =    0.0000
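The Stata 11 coefficients differ from those in the version 10 model, because here level 1 of each factor is the reference. They still describe the same twelve cell means, which is why every test comes out the same. The Python sketch below checks this and computes the a main-effect two ways. The coefficient values come from the regress y a##b##c output, with the 2.a#3.c estimate (-1.10e-14) taken as 0. The helper names are our own.

```python
from fractions import Fraction

# Coefficients from "regress y a##b##c" (level 1 of each factor is the reference).
cons = Fraction(11)
a2, b2, ab22 = Fraction(-1, 2), Fraction(-1, 2), Fraction(13, 2)
c_eff = {2: Fraction(4), 3: Fraction(8)}
ac_eff = {2: Fraction(1), 3: Fraction(0)}
bc_eff = {2: Fraction(-4), 3: Fraction(-9)}
abc_eff = {2: Fraction(3), 3: Fraction(17, 2)}

def cell_mean(a, b, c):
    """Mean of cell (a, b, c) with a, b in {1, 2} and c in {1, 2, 3}."""
    da, db = int(a == 2), int(b == 2)
    m = cons + a2 * da + b2 * db + ab22 * da * db
    if c != 1:   # c=1 is the reference level
        m += c_eff[c] + ac_eff[c] * da + bc_eff[c] * db + abc_eff[c] * da * db
    return m

cells = {(a, b, c): cell_mean(a, b, c)
         for a in (1, 2) for b in (1, 2) for c in (1, 2, 3)}

# a main-effect two ways: the test expression and the marginal-mean difference
test_expr = (a2 + ab22 / 2 + (ac_eff[2] + ac_eff[3]) / 3
             + (abc_eff[2] + abc_eff[3]) / 6)
marginal = {lvl: sum(m for (a, _, _), m in cells.items() if a == lvl) / 6
            for lvl in (1, 2)}
print(test_expr, marginal[2] - marginal[1])   # both 5
```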

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.