UCLA Academic Technology Services

Stata FAQ
How can I use margins to understand a categorical by categorical by continuous 3-way interaction? (Stata 11)

The margins command, new in Stata 11, can be a very useful tool in understanding and interpreting interactions. On this page we will use margins to get simple slopes for a model with a categorical by categorical by continuous 3-way interaction. We will use the hsbdemo dataset with write as the response variable, female as one categorical variable and science as the continuous variable. We will begin by loading the data and creating a second categorical variable, hiread, which equals 1 when read is 47 or higher and 0 otherwise.
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

generate hiread=read>=47

label def hilo 0 "lo" 1 "hi"
label values hiread hilo
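One thing to keep in mind with an expression like read>=47 is that Stata treats missing numeric values as larger than any number, so an observation with a missing read would be coded as hi. The lines below are a minimal sketch of a missing-safe version, meant only as a check; the variable name hiread_safe is hypothetical and is not used anywhere else on this page.

```stata
* Sketch only: hiread_safe is a hypothetical name used for this check
count if missing(read)                              // how many reading scores are missing?

* Same cut point as hiread, but left missing when read is missing
generate hiread_safe = read >= 47 if !missing(read)
label values hiread_safe hilo

* Compare the two codings; they agree wherever read is not missing
tabulate hiread hiread_safe, missing
```

If the count is zero, the two variables are identical and hiread can be used as is.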
Let's see how many observations we have in each of the four cells of the 2x2 portion of the model.
tab hiread female

           |        female
    hiread |      male     female |     Total
-----------+----------------------+----------
        lo |        24         31 |        55 
        hi |        67         78 |       145 
-----------+----------------------+----------
     Total |        91        109 |       200 
Now we are ready to run our regression model.
regress write hiread##female##c.science

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  7,   192) =   30.18
       Model |  9366.83539     7  1338.11934           Prob > F      =  0.0000
    Residual |  8512.03961   192  44.3335396           R-squared     =  0.5239
-------------+------------------------------           Adj R-squared =  0.5065
       Total |   17878.875   199   89.843593           Root MSE      =  6.6583

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.hiread |  -23.43484   7.766185    -3.02   0.003    -38.75283   -8.116842
    1.female |  -8.282664   9.859841    -0.84   0.402    -27.73018    11.16485
             |
      hiread#|
      female |
        1 1  |   23.18969   12.27131     1.89   0.060    -1.014195    47.39358
             |
     science |   .0564852   .1242661     0.45   0.650    -.1886169    .3015873
             |
      hiread#|
   c.science |
          1  |   .6221636   .1544776     4.03   0.000     .3174725    .9268547
             |
      female#|
   c.science |
          1  |   .3741434   .2218437     1.69   0.093    -.0634203     .811707
             |
      hiread#|
      female#|
   c.science |
        1 1  |  -.5443797   .2578542    -2.11   0.036     -1.05297    -.035789
             |
       _cons |   38.62719   5.749743     6.72   0.000     27.28641    49.96796
------------------------------------------------------------------------------
So how does one interpret the significant hiread#female#science interaction (p = 0.036) shown in the output? One way of looking at this is that the slopes of write on science are not all equal across the four cells of the hiread#female interaction. We can look at this by graphing the fitted line for each of the cells using the twoway lfit command with the by option. (The lean1 scheme is user-written; omit the scheme option if it is not installed.)
twoway lfit write science, by(hiread female) scheme(lean1)
  
  
The above graph suggests that the slope of write on science is flatter in the lo-male cell than in the other three. We can get the simple slopes for each of the four cells using margins hiread#female. The dydx(science) option gives the simple slopes, while the post option stores them as estimation results so that we can test differences between slopes.
margins hiread#female, dydx(science) post

Average marginal effects                          Number of obs   =        200
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : science

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
science      |
      hiread#|
      female |
        0 0  |   .0564852   .1242661     0.45   0.649    -.1870719    .3000423
        0 1  |   .4306286   .1837731     2.34   0.019       .07044    .7908172
        1 0  |   .6786488   .0917674     7.40   0.000     .4987879    .8585096
        1 1  |   .5084125   .0940899     5.40   0.000     .3239997    .6928253
------------------------------------------------------------------------------
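Each of these simple slopes is just a sum of science coefficients from the regression table: for example, the lo-female slope is .0564852 + .3741434 = .4306286. The sketch below reproduces all four slopes with lincom. It assumes the margins results have just been posted, so it stores them first, re-fits the regression quietly, and then restores them; the name mslopes is hypothetical.

```stata
* Keep the posted margins results so later commands can still use them
estimates store mslopes                  // mslopes is a hypothetical name

* Re-fit the model quietly to get the regression coefficients back
quietly regress write hiread##female##c.science

lincom _b[science]                                           // lo-male
lincom _b[science] + _b[1.female#c.science]                  // lo-female
lincom _b[science] + _b[1.hiread#c.science]                  // hi-male
lincom _b[science] + _b[1.hiread#c.science]  ///
       + _b[1.female#c.science] + _b[1.hiread#1.female#c.science]   // hi-female

* Make the margins results the active estimates again
estimates restore mslopes
```

The estimates and standard errors should match the margins table. The p-values and confidence intervals can differ slightly because lincom after regress uses the t distribution while margins reports z statistics.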
Indeed, the slope for lo-male (.0564852) is much lower than the slopes in the other three cells (.4306286, .6786488 and .5084125). We can compare the slope for that cell with each of the other three cells using a series of lincom commands.
lincom _b[0.hiread#0.female] - _b[0.hiread#1.female]

 ( 1)  [science]0bn.hiread#0bn.female - [science]0bn.hiread#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.3741434   .2218437    -1.69   0.092     -.808949    .0606622
------------------------------------------------------------------------------

lincom _b[0.hiread#0.female] - _b[1.hiread#0.female]

  ( 1)  [science]0bn.hiread#0bn.female - [science]1.hiread#0bn.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.6221636   .1544776    -4.03   0.000    -.9249341    -.319393
------------------------------------------------------------------------------

lincom _b[0.hiread#0.female] - _b[1.hiread#1.female]

  ( 1)  [science]0bn.hiread#0bn.female - [science]1.hiread#1.female = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.4519273   .1558685    -2.90   0.004    -.7574239   -.1464308
------------------------------------------------------------------------------
The difference in slopes between lo-male and lo-female was not statistically significant (p = 0.092), but the comparisons with hi-male (p < 0.001) and hi-female (p = 0.004) were.
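Because these are three comparisons against the same cell, it may be worth looking at them together. The following is a minimal sketch, assuming the results from margins with the post option are still the active estimates. It uses the test command to run the same three comparisons, adds a joint test that all four slopes are equal, and uses the mtest(bonferroni) option to adjust the individual p-values for multiple testing.

```stata
* Assumes the posted margins results are still the active estimates
test (_b[0.hiread#0.female] = _b[0.hiread#1.female])  ///
     (_b[0.hiread#0.female] = _b[1.hiread#0.female])  ///
     (_b[0.hiread#0.female] = _b[1.hiread#1.female]), mtest(bonferroni)
```

The Bonferroni adjustment multiplies each unadjusted p-value by the number of comparisons, here three, with a maximum of 1.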

Gee, that was fun. What about a four-way interaction?

Categorical by categorical by categorical by continuous interaction

We will need a different dataset, hsbanova, and we will have to create hiread again. This time we will use the anova command.
use http://www.ats.ucla.edu/stat/data/hsbanova, clear

generate hiread=read>=47

anova write female##hiread##grp##c.socst


                           Number of obs =     200     R-squared     =  0.5864
                           Root MSE      = 6.61518     Adj R-squared =  0.5129

                  Source |  Partial SS    df       MS           F     Prob > F
 ------------------------+----------------------------------------------------
                   Model |  10483.3404    30   349.44468       7.99     0.0000
                         |
                  female |  118.560779     1  118.560779       2.71     0.1016
                  hiread |  13.3156292     1  13.3156292       0.30     0.5819
           female#hiread |  7.8550e-08     1  7.8550e-08       0.00     1.0000
                     grp |  24.9917566     3  8.33058553       0.19     0.9029
              female#grp |  186.062732     3  62.0209107       1.42     0.2395
              hiread#grp |   87.306053     3  29.1020177       0.67     0.5746
       female#hiread#grp |  499.956657     3  166.652219       3.81     0.0113
                   socst |  141.112535     1  141.112535       3.22     0.0743
            female#socst |  43.2414468     1  43.2414468       0.99     0.3216
            hiread#socst |   5.3281839     1   5.3281839       0.12     0.7276
     female#hiread#socst |  1.01899126     1  1.01899126       0.02     0.8789
               grp#socst |  30.2183289     3  10.0727763       0.23     0.8753
        female#grp#socst |  176.274591     3   58.758197       1.34     0.2623
        hiread#grp#socst |  102.251128     3  34.0837093       0.78     0.5073
 female#hiread#grp#socst |  417.429351     2  208.714676       4.77     0.0097
                         |
                Residual |  7395.53461   169  43.7605599   
 ------------------------+----------------------------------------------------
                   Total |   17878.875   199   89.843593
Note the significant 4-way interaction. This time there are 16 cells in the 2x2x4 factorial model. The significant 4-way interaction indicates that the slopes of write on socst are not equal across the 16 cells. Let's see if we can show this graphically.
twoway lfit write socst, by(female hiread grp) scheme(lean1)

Wait a minute, there are only 15 plots here. What's going on? It looks like the cell for male, low read, grp4 is missing.
count if female==0 & hiread==0 & grp==4

    1
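So the problem is that there is only one observation in that cell, and a slope cannot be estimated from a single point. This is also why the 4-way interaction has 2 degrees of freedom rather than the 3 that a 2x2x4 design would normally give. Now that we have seen the graphs, let's compute the slopes for each of the cells using the margins command with the dydx option.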
margins female#hiread#grp, dydx(socst) post

Average marginal effects                          Number of obs   =        200

Expression   : Linear prediction, predict()
dy/dx w.r.t. : socst

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
socst        |
      female#|
   hiread#grp|
      0 0 1  |    .030428   .2173668     0.14   0.889    -.3956032    .4564591
      0 0 2  |   .4794872   .2996085     1.60   0.110    -.1077347    1.066709
      0 0 3  |   .7428571   .4472676     1.66   0.097    -.1337712    1.619485
      0 0 4  |  (not estimable)
      0 1 1  |   .6050472   .1641192     3.69   0.000     .2833795    .9267148
      0 1 2  |   .6129032     .19313     3.17   0.002     .2343754     .991431
      0 1 3  |   .3614873   .1468047     2.46   0.014     .0737554    .6492192
      0 1 4  |   .2098765   .2078949     1.01   0.313    -.1975901    .6173431
      1 0 1  |     .60196   .1921498     3.13   0.002     .2253533    .9785667
      1 0 2  |   -.130597   .4040862    -0.32   0.747    -.9225914    .6613974
      1 0 3  |   .0403727   .5828858     0.07   0.945    -1.102063    1.182808
      1 0 4  |   1.833333   2.700634     0.68   0.497    -3.459813     7.12648
      1 1 1  |  -.1208054   .1874647    -0.64   0.519    -.4882293    .2466186
      1 1 2  |   .5698006   .2496737     2.28   0.022      .080449    1.059152
      1 1 3  |    .231117   .1253471     1.84   0.065    -.0145588    .4767929
      1 1 4  |    .485267   .2007366     2.42   0.016     .0918306    .8787035
------------------------------------------------------------------------------
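Since this model gives every cell its own intercept and its own slope, each estimable slope in the table should equal the slope from a separate regression of write on socst within that cell. The sketch below checks this for the first cell (male, low read, grp1). It assumes the margins results have just been posted, so it stores and restores them around the regression; the name cellslopes is hypothetical.

```stata
* Keep the posted margins results so later commands can still use them
estimates store cellslopes               // cellslopes is a hypothetical name

* Slope of write on socst using only males with low reading scores in grp 1
regress write socst if female==0 & hiread==0 & grp==1

* Make the margins results the active estimates again
estimates restore cellslopes
```

The coefficient for socst should match the first row of the margins table. The standard error will generally differ, because the full model pools the residual variance across all of the cells while the separate regression uses only the observations in one cell.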
As expected, the slope for male, low read, grp4 was not estimable and could not be computed. However, the other slopes were estimated just fine, and since we used the post option we can compare them using the lincom command. There are many possible comparisons among the slopes, but we will demonstrate the process by comparing grp1 with grp2 and grp1 with grp3 among males with low reading scores.
lincom _b[0.female#0.hiread#1.grp] - _b[0.female#0.hiread#2.grp]

 ( 1)  [socst]0bn.female#0bn.hiread#1bn.grp - [socst]0bn.female#0bn.hiread#2.grp = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.4490592   .3701535    -1.21   0.225    -1.174547    .2764283
------------------------------------------------------------------------------

lincom _b[0.female#0.hiread#1.grp] - _b[0.female#0.hiread#3.grp]

 ( 1)  [socst]0bn.female#0bn.hiread#1bn.grp - [socst]0bn.female#0bn.hiread#3.grp = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.7124292   .4972893    -1.43   0.152    -1.687098    .2622399
------------------------------------------------------------------------------
Both lincom commands indicate that the differences in slopes, while appearing large, are not statistically significant. This might be due to small sample sizes. Let's check.
count if female==0 & hiread==0 & grp==1

   11

count if female==0 & hiread==0 & grp==2

    8

count if female==0 & hiread==0 & grp==3

    4
The cell sizes for grp2 and grp3 are, in fact, rather small.
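Rather than counting one cell at a time, we could look at all 16 cell sizes at once. A minimal sketch, using only variables already in the dataset:

```stata
* Cell sizes for all female-by-hiread-by-grp cells, one table per level of female
bysort female: tabulate hiread grp
```

A table like this makes it easy to spot sparse cells, such as the male, low read, grp4 cell with its single observation, before fitting the model.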

These pages are Copyrighted (c) by UCLA Academic Technology Services

