### Stata Library: Using Stata to deal with violations of the homogeneity of variance assumption in ANOVA

One of the assumptions of Analysis of Variance (ANOVA) is that the variance of the dependent variable is the same across the groups being studied.  When this assumption is violated, the results of the analysis may not be trustworthy: the reported p-value from the significance test may be too liberal (yielding a higher than expected type I error rate) or too conservative (yielding a lower than expected type I error rate).  We will look at three Stata programs that you can download for analyzing data where you suspect you might have violated the homogeneity of variance assumption.  These programs can help you assess whether a standard ANOVA will be too liberal or too conservative given your data, and show how you can perform alternative analyses that are more robust to violations of this assumption.

First, let's consider the program simanova. You can download simanova by typing findit simanova (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

As shown below, we can supply information about a hypothetical study that has 3 groups with sample sizes of 10 in each group and standard deviations of 1 in each group. Since we have not specified the means for the groups, they are assumed to be equal.  Given these conditions, which are consistent with the assumptions of ANOVA, we simulate 5000 analyses and report the proportion of significant results at a nominal p-value of 0.05.  As we would expect, the proportion of results significant at 0.05 was 0.0488, quite close to 0.05, and the confidence interval contains 0.05.
simanova , groups(3) n(10 10 10) s(1 1 1) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
N1 = 10 and S1 = 1
N2 = 10 and S2 = 1
N3 = 10 and S3 = 1

5000 simulated ANOVA F tests
--------------------------------
Nominal  Simulated   Simulated P value
P Value  P Value     [95% Conf. Interval]
-----------------------------------------
0.0500   0.0488       0.0430 - 0.0551
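
The logic behind simanova is ordinary Monte Carlo simulation, and the same check can be sketched outside Stata. The minimal Python version below (assuming numpy and scipy are available; `simulated_alpha` is a name we made up for illustration) draws groups with equal means, runs a one-way ANOVA on each draw, and counts how often p falls below the nominal level.

```python
import numpy as np
from scipy.stats import f_oneway

def simulated_alpha(ns, sds, nominal=0.05, reps=5000, seed=12345):
    """Estimate the actual type I error rate of one-way ANOVA.

    Groups are drawn with equal means (0), so every rejection
    at the nominal level is a type I error.
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        groups = [rng.normal(0.0, s, n) for n, s in zip(ns, sds)]
        _, p = f_oneway(*groups)
        hits += p < nominal
    return hits / reps

# Equal variances: the simulated rate should sit near the nominal 0.05.
print(simulated_alpha([10, 10, 10], [1, 1, 1]))
# Unequal variances with equal ns: the test becomes somewhat liberal.
print(simulated_alpha([10, 10, 10], [1, 1, 3]))
```

The estimates are themselves random, so with 5000 replications expect them to wobble by a few thousandths around the true rate, just as simanova's confidence intervals indicate.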

Let's now make the standard deviations unequal, setting the standard deviation for group 3 to 3.  Although this violates the homogeneity of variance assumption, ANOVA has been found to be fairly robust to this violation when the sample sizes are equal. Indeed, as we see below, the actual proportion significant (0.0714) only somewhat exceeds the proportion we would expect (0.05), making the ordinary F test under these conditions only somewhat too liberal.
simanova , groups(3) n(10 10 10) s(1 1 3) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
N1 = 10 and S1 = 1
N2 = 10 and S2 = 1
N3 = 10 and S3 = 3

5000 simulated ANOVA F tests
--------------------------------
Nominal  Simulated   Simulated P value
P Value  P Value     [95% Conf. Interval]
-----------------------------------------
0.0500   0.0714       0.0644 - 0.0789

If we also make the sample sizes unequal, ANOVA is known to show type I error rates that can be quite different from the nominal rate. Below, we make the sample size for group 3 equal to 40, while leaving the sample sizes at 10 for groups 1 and 2.  As we see, the observed proportion of significant results (0.0018) is far below the 0.05 we expect.  When the groups with the larger standard deviations also have the larger sample sizes, the ANOVA test becomes too conservative: a result you declare significant at 0.05 actually has a type I error rate of around 0.0018.
simanova , groups(3) n(10 10 40) s(1 1 3) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
N1 = 10 and S1 = 1
N2 = 10 and S2 = 1
N3 = 40 and S3 = 3

5000 simulated ANOVA F tests
--------------------------------
Nominal  Simulated   Simulated P value
P Value  P Value     [95% Conf. Interval]
-----------------------------------------
0.0500   0.0018       0.0008 - 0.0034

Let's reverse the pattern shown above, giving groups 1 and 2 sample sizes of 40 and group 3 a sample size of 10.  Now the groups with the larger sample sizes have the smaller standard deviation.  As you might have expected, the results below show that the ANOVA test is too liberal under these conditions: when you believed the probability of a type I error was 0.05, it was actually around 28%.
simanova , groups(3) n(40 40 10) s(1 1 3) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
N1 = 40 and S1 = 1
N2 = 40 and S2 = 1
N3 = 10 and S3 = 3

5000 simulated ANOVA F tests
--------------------------------
Nominal  Simulated   Simulated P value
P Value  P Value     [95% Conf. Interval]
-----------------------------------------
0.0500   0.2778       0.2654 - 0.2904

So far, we have illustrated how you can use simanova to assess actual type I error rates given an arbitrary set of sample sizes and standard deviations.  Let's look at an example of using simanova in analyzing our data.  Consider the fictitious data file simstb used below, with a dependent variable called dv and an independent variable called group.  We can read this data file and perform a standard ANOVA. Based just on the results below, we would conclude that there is a relationship between group and the score on dv.
use simstb, clear
anova dv group
Number of obs =     100     R-squared     =  0.0636
Root MSE      = 2.59622     Adj R-squared =  0.0443

Source |  Partial SS    df       MS           F     Prob > F
-----------+----------------------------------------------------
Model |  44.4095133     2  22.2047567       3.29     0.0413
|
group |  44.4095133     2  22.2047567       3.29     0.0413
|
Residual |   653.81335    97  6.74034381
-----------+----------------------------------------------------
Total |  698.222863    99  7.05275619   
However, do these data meet the assumptions of ANOVA?   As you see below, the sample sizes are unequal and the groups with the smaller sample sizes have the larger standard deviations.
tabulate group, sum(dv)

|            Summary of dv
group |        Mean   Std. Dev.       Freq.
------------+------------------------------------
1 |   2.1116685   6.3250411          10
2 |  -.22530662   2.8095381          30
3 |  -.02013056   1.0483759          60
------------+------------------------------------
Total |   .13149653   2.6557026         100

We can use simanova to perform simulations given this pattern of sample sizes and standard deviations, assuming the means are equal, and assess the type I error rate that would be expected for such data.  When you supply the name of the dependent variable (dv) followed by the independent variable (group), simanova computes the sample sizes and standard deviations for the groups, reports them back to you, shows the results of an ordinary ANOVA, and then shows the results of 5000 simulated F tests in which there were no differences among the group means.  As shown below, the ANOVA reported a p-value of 0.0413, compared to a simulated p-value of 0.3110.  These simulation results suggest that the actual type I error rate for this test is about 31%, not less than 5%.
simanova dv group, nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
N1 = 10 and S1 = 6.3250413
N2 = 30 and S2 = 2.8095381
N3 = 60 and S3 = 1.0483758

Results of Standard ANOVA

----------------------------------------------------------------------
Dependent Variable is dv and Independent Variable is group
F(  2,  97.00) =   3.294, p= 0.0413
----------------------------------------------------------------------

5000 simulated ANOVA F tests
--------------------------------
Nominal  Simulated   Simulated P value
P Value  P Value     [95% Conf. Interval]
-----------------------------------------
0.0413   0.3110       0.2982 - 0.3240
0.0500   0.3320       0.3189 - 0.3452

Another way to handle this is to use a test that is less sensitive to violations of homogeneity of variance. The F* test is a modification of the standard F test that is much less sensitive to violations of the homogeneity of variance assumption.  Let's analyze these data using the fstar command, which you can also download by typing findit fstar.  As you see, the results of fstar are much more in line with the results we found with the simulation.

fstar dv group

----------------------------------------------------------------------
Dependent Variable is dv and Independent Variable is group
Fstar(  2,  12.14) =   1.058, p= 0.3771
----------------------------------------------------------------------
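
The F* statistic has a simple closed form: the between-groups sum of squares divided by a variance term weighted by (1 - n_j/N), with denominator degrees of freedom from a Satterthwaite approximation. Below is a sketch of that computation in Python (numpy/scipy assumed; the function name `brown_forsythe_fstar` is ours), following the form given in Brown & Forsythe (1974); it is an illustration of the statistic, not the fstar command itself.

```python
import numpy as np
from scipy.stats import f as f_dist

def brown_forsythe_fstar(*groups):
    """Brown-Forsythe F* test for equal means under unequal variances.

    F* = sum n_j*(mean_j - grand_mean)^2 / sum (1 - n_j/N)*var_j,
    with denominator df from a Satterthwaite approximation.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    ns = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    N = ns.sum()
    grand_mean = np.concatenate(groups).mean()

    numer = np.sum(ns * (means - grand_mean) ** 2)
    weights = (1 - ns / N) * variances
    denom = weights.sum()
    fstar = numer / denom

    # Satterthwaite df for the denominator.
    c = weights / denom
    df2 = 1.0 / np.sum(c ** 2 / (ns - 1))
    df1 = len(groups) - 1
    p = f_dist.sf(fstar, df1, df2)
    return fstar, df1, df2, p
```

A handy property for checking an implementation: with equal group sizes, F* coincides numerically with the ordinary F statistic (only the denominator degrees of freedom differ), so unequal ns are where the two tests diverge.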

The W test is another test that is more robust to violations of homogeneity of variance than the traditional F test. You can download the wtest command by typing findit wtest (see How can I use the findit command to search for programs and get additional help? for more information about using findit).  Let's use the wtest command to perform this test.  While the results are not identical to the F* test, the two tests agree that these results are far from significant.
wtest dv group

----------------------------------------------------------------------
Dependent Variable is dv and Independent Variable is group
WStat(  2,  18.99) =   0.626, p= 0.5457
----------------------------------------------------------------------
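
Welch's W statistic weights each group mean by its precision, n_j divided by the sample variance, so no pooled error term is needed. Below is a Python sketch of the usual form of the statistic (numpy/scipy assumed; `welch_anova` is a name we chose), again an illustration rather than the wtest command itself.

```python
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(*groups):
    """Welch's W test: one-way ANOVA that does not pool variances."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    ns = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])

    w = ns / variances                      # precision weights
    W = w.sum()
    weighted_mean = np.sum(w * means) / W

    A = np.sum(w * (means - weighted_mean) ** 2) / (k - 1)
    lam = np.sum((1 - w / W) ** 2 / (ns - 1))
    B = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    wstat = A / B

    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    p = f_dist.sf(wstat, df1, df2)
    return wstat, df1, df2, p
```

With exactly two groups, W reduces to the square of the Welch t statistic (and matches its p-value), which gives a convenient correctness check against an unequal-variance t-test.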

Let's consider a couple of other alternatives, some of which, to our knowledge, have not been explored very much.  First, what if we recast this ANOVA as a regression with dummy variables and use the robust option to request robust standard errors?  Below we see that the test of the two dummy variables is quite comparable to the simulated results, the F* results, and the W test results.  Had we omitted the robust option, we would have gotten the same p-value as the standard ANOVA, so it appears from this one example that the robust option may offer some protection against violations of the homogeneity of variance assumption. One example is certainly not enough, so we will investigate this later in this page.
xi: regress dv i.group, robust

i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)

Regression with robust standard errors                 Number of obs =     100
F(  2,    97) =    0.69
Prob > F      =  0.5030
R-squared     =  0.0636
Root MSE      =  2.5962

------------------------------------------------------------------------------
|               Robust
dv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igroup_2 |  -2.336975    1.99352    -1.17   0.244    -6.293561    1.619611
_Igroup_3 |  -2.131799   1.931445    -1.10   0.272    -5.965183    1.701585
_cons |   2.111669   1.926632     1.10   0.276    -1.712162    5.935499
------------------------------------------------------------------------------

test _Igroup_2 _Igroup_3

( 1)  _Igroup_2 = 0.0
( 2)  _Igroup_3 = 0.0

F(  2,    97) =    0.69
Prob > F =    0.5030

If the robust option is useful, what about using the rreg command for robust regression?  Consider the example below. The rreg results are even more dramatically off base than the standard ANOVA, yielding a p-value of 0.0000.  Again, one example is not enough to draw a conclusion, but this suggests that rreg may not perform well when the homogeneity of variance assumption is violated.
xi: rreg dv i.group

i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)
<iterations omitted>
Robust regression estimates                            Number of obs =     100
F(  2,    97) =   17.54
Prob > F      =  0.0000

------------------------------------------------------------------------------
dv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Igroup_2 |  -3.383854     .62812    -5.39   0.000    -4.630498   -2.137209
_Igroup_3 |  -3.401373   .5875525    -5.79   0.000    -4.567502   -2.235244
_cons |    3.39778   .5439679     6.25   0.000     2.318155    4.477406
------------------------------------------------------------------------------

test _Igroup_2 _Igroup_3

( 1)  _Igroup_2 = 0.0
( 2)  _Igroup_3 = 0.0

F(  2,    97) =   17.54
Prob > F =    0.0000

So far we have seen how we can use simanova to simulate type I error rates for a single condition. Because it also stores the resulting type I error rates in return values, it can be used for simulation studies examining the performance of these various tests under different conditions.  Below, we show a simple example where we vary the sample size for each of the 3 groups between 10 and 40, and likewise vary the standard deviation for each group between 1 and 3.  In other words, we can use simanova not only for analysis but also for simulation studies, by varying the Ns and standard deviations.
postfile simrse n1 n2 n3 s1 s2 s3 fp rsep rrp wstatp fstarp using simrse , replace

foreach n1 of numlist 10 40 {
  foreach n2 of numlist 10 40 {
    foreach n3 of numlist 10 40 {
      foreach s1 of numlist 1 3 {
        foreach s2 of numlist 1 3 {
          foreach s3 of numlist 1 3 {
            simanova , groups(3) n(`n1' `n2' `n3') s(`s1' `s2' `s3') fstar wtest rse rreg /*
              */ nomp(0.05) reps(5000)
            post simrse (`n1') (`n2') (`n3') (`s1') (`s2') (`s3') (`r(_fp)') (`r(rsep)') /*
              */ (`r(_rrp)') (`r(_wstatp)') (`r(_fstarp)')
          }
        }
      }
    }
  }
}
postclose simrse
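
The nested foreach loops above are just a factorial grid over sample sizes and standard deviations. For readers working outside Stata, the same bookkeeping can be sketched in Python (numpy/scipy assumed; `rejection_rate` and `results` are our own names), here for the ordinary F test only.

```python
import itertools
import numpy as np
from scipy.stats import f_oneway

def rejection_rate(ns, sds, reps=400, alpha=0.05, seed=0):
    """Share of null-true one-way ANOVA F tests rejected at alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        samples = [rng.normal(0.0, s, n) for n, s in zip(ns, sds)]
        hits += f_oneway(*samples)[1] < alpha
    return hits / reps

# Mirror the postfile loop: every combination of group sizes in {10, 40}
# and standard deviations in {1, 3}, one cell per combination. With only
# 400 replications per cell the estimates are rough, but the pattern of
# liberal and conservative cells already shows through.
results = {}
for ns in itertools.product([10, 40], repeat=3):
    for sds in itertools.product([1, 3], repeat=3):
        results[ns, sds] = rejection_rate(ns, sds)
```

Adding further tests (Welch, F*, robust regression) would just mean computing additional rejection rates per cell, exactly as the extra variables in the postfile statement do.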

We then use the file just created, generate variables recording the standard deviations and sample sizes for each cell, and display the simulated p-values in a table using the tabdisp command.  Within each cell, the first row gives the p-values for the regular F test, then the W test, then the F* test, then regression with robust standard errors, and finally robust regression.
use simrse, clear
gen str8 s =   string(s1,"%02.0f") + "," + string(s2,"%02.0f") + "," + string(s3,"%02.0f")
gen str8 n =   string(n1,"%02.0f") + "," + string(n2,"%02.0f") + "," + string(n3,"%02.0f")
tabdisp s n, cellvar(fp wstatp fstarp rsep rrp)
While this is a very limited study, it does reveal some very interesting information.
• As we expect, the regular F tests (the first line) are most liberal when the large sample sizes are associated with the small standard deviations.   Conversely, the regular F tests are most conservative when the large sample sizes are associated with the large standard deviations.
• The W and F* tests (lines 2 and 3) are both reasonably robust to the violations of homogeneity of variance studied here.
• The regression with robust standard errors (line 4) also was reasonably robust, although it occasionally had type I error rates as high as 10%. This suggests that when performing an ordinary regression that includes categorical variables as dummy variables that exhibit heterogeneity of variance, the robust option may be useful for increasing robustness with respect to this heterogeneity. Further study in this area would be needed before drawing firmer conclusions.
• The robust regression (line 5) was not robust at all to the violations of homogeneity of variance from this study, and frequently performed much more poorly than the standard ANOVA. This suggests that robust regression may be a poor choice when categorical variables are used that show heterogeneity of variance. Further study in this area would be needed before drawing firmer conclusions.
------------------------------------------------------------------------------------------
|                                       n
s | 10,10,10  10,10,40  10,40,10  10,40,40  40,10,10  40,10,40  40,40,10  40,40,40
----------+-------------------------------------------------------------------------------
01,01,01 |    .0518      .048     .0542     .0476      .049      .051      .048      .052
|    .0466     .0516     .0566      .047     .0522     .0508     .0508     .0546
|    .0492     .0516     .0528     .0468     .0512     .0538     .0496      .052
|    .0612     .0816     .0864     .0638      .083     .0692       .07     .0572
|    .0532     .0514      .055      .046      .051     .0522      .047     .0526
|
01,01,03 |    .0784     .0028      .214     .0254     .2098      .031     .2714      .071
|     .047     .0486     .0548      .049     .0486     .0508     .0558     .0534
|     .065     .0566     .0658      .061      .063     .0652      .072     .0684
|    .0606     .0634     .0958      .063     .0802     .0668      .083     .0578
|    .3098      .013     .5208      .174       .52     .1784     .5528     .3138
|
01,03,01 |      .08     .1952     .0028       .03      .204     .2818     .0268     .0706
|    .0548     .0496     .0508     .0528     .0546     .0518     .0476     .0488
|    .0704     .0606     .0592     .0674     .0652     .0678     .0652     .0652
|    .0708     .0822     .0672     .0726     .0882     .0806     .0654     .0524
|     .336     .5062     .0128      .161     .5098      .551      .171      .317
|
01,03,03 |    .0598     .0268     .0258     .0204     .2488     .0952     .0896     .0554
|    .0492     .0514      .045     .0472     .0486     .0538     .0488     .0458
|     .052     .0614     .0614     .0584      .044     .0552     .0506     .0538
|    .0692     .0724     .0682     .0542     .0928     .0814     .0744     .0504
|    .1762     .0412     .0376      .028     .6388      .279      .288     .1338
|
03,01,01 |     .085     .2074     .2026     .2772     .0012     .0298     .0258      .071
|     .052     .0542     .0506     .0476     .0484     .0526     .0474     .0556
|    .0682       .07     .0626      .059     .0624     .0646     .0614     .0658
|    .0712     .0862      .083     .0708     .0646     .0716     .0636     .0596
|    .3268     .5032     .5062     .5558     .0086      .169     .1724      .311
|
03,01,03 |    .0676     .0266      .254     .0922     .0286     .0202     .0944     .0574
|     .056     .0462     .0546     .0546      .049      .047     .0502     .0504
|     .061     .0598     .0498     .0542     .0594     .0576     .0508     .0556
|    .0806     .0676     .1016     .0794     .0698     .0538      .075      .057
|    .1742      .042     .6372      .283     .0462      .028     .2804     .1306
|
03,03,01 |    .0628     .2534      .035     .0958     .0268     .0914     .0238     .0556
|    .0526     .0532      .056      .048     .0474     .0484      .053     .0522
|    .0566     .0486     .0682     .0538      .057     .0532     .0664     .0542
|    .0702     .0974     .0792     .0752     .0722     .0734      .063     .0558
|    .1724     .6292     .0466     .2786     .0424     .2796     .0294     .1312
|
03,03,03 |    .0538     .0466     .0444     .0522     .0462     .0488     .0546     .0528
|      .05     .0496     .0428     .0544     .0504      .049     .0564     .0508
|    .0512     .0466     .0414     .0508     .0486     .0474      .054     .0526
|    .0652     .0786     .0704     .0698     .0772     .0652     .0756     .0546
|    .0606      .048     .0422     .0556     .0472       .05     .0526     .0514
------------------------------------------------------------------------------------------

#### References

• Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30, 719-724.
• Wilcox, R. R. (1987). New designs in analysis of variance. Annual Review of Psychology, 38, 29-60.
• Wilcox, R. R., Charlin, V., & Thompson, K. (1986). New Monte Carlo results on the robustness of the ANOVA F, W and F* statistics. Communications in Statistics - Simulation and Computation, 15(4), 933-943.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.