THE MANTEL-HAENSZEL STATISTIC FOR 2x2xK TABLES

                          David P. Nichols
                     Senior Support Statistician
                             SPSS, Inc.
                 From SPSS Keywords, Volume 54, 1994

One of the more common applications in statistical analysis is to assess
the degree of relationship of two variables while controlling for one or
more nuisance or control variables. A particular situation that has
received a good deal of attention, particularly in the medical research
community, but lately also in numerous other areas such as psychometrics,
is that of a relationship between two dichotomous variables controlling
for one or more categorical factors. When there is only one control
variable or when the 2x2 relationship is examined within each combination
of the levels of the control variables, the result is a 2x2xK cross
classification, where the K levels of the control variable or variable
combinations are often referred to as strata.

A case of interest to researchers in many areas occurs when there is no
three way interaction present in the 2x2xK layout, that is, when the
association between the two main variables is the same in every stratum.
In this case a question of common interest is whether there is any
relationship between the two main variables of interest after controlling
for the stratification variable.
An example might be the relationship between administration of a drug and
remediation of disease effects controlling for gender of patient, mode of
administration, and/or other factors. A common way to assess relationships
in 2x2 tables is through the odds ratio. In our drug example, if one group
is given a drug and another group a placebo, and all patients are then
assessed for recovery, the odds ratio measures the increase (or decrease)
in odds of recovery for patients given the active drug relative to those
given the placebo. An odds ratio of 1 represents no effect, while a ratio
greater than 1 indicates that the drug increases the odds of recovery and
a ratio less than 1 indicates that it diminishes the odds of recovery.
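
As a small illustration of this definition, the following Python sketch
computes the odds ratio for a single 2x2 table. The counts are hypothetical,
chosen only to keep the arithmetic simple, and the function name odds_ratio
is ours, not part of SPSS.

```python
def odds_ratio(recovered_drug, not_recovered_drug,
               recovered_placebo, not_recovered_placebo):
    """Odds of recovery on the drug divided by odds of recovery on placebo."""
    odds_drug = recovered_drug / not_recovered_drug
    odds_placebo = recovered_placebo / not_recovered_placebo
    return odds_drug / odds_placebo

# Hypothetical counts: 30 of 50 drug patients recover, 20 of 50 placebo patients.
ratio = odds_ratio(30, 20, 20, 30)
print(round(ratio, 2))  # 2.25
```

Here the odds of recovery are 1.5 on the drug and about 0.67 on the placebo,
so the drug multiplies the odds of recovery by 2.25.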

In the SPSS CROSSTABS procedure, this odds ratio can be obtained for a 2x2
table as the Case Control Relative Risk estimate. However, this measure is
given only separately for each 2x2 table. The most popular estimator of the
common odds ratio across the K strata was suggested by Mantel and Haenszel,
who also provided a chi-squared test of the null hypothesis that this common
odds ratio is 1. The common odds ratio estimate is given at the top of page
236 of Alan Agresti's _Categorical Data Analysis_, while the test statistic
commonly referred to as the Mantel-Haenszel chi-squared test for 2x2xK tables
is given by equation 7.8 on page 231 (Agresti refers to it as the Cochran-
Mantel-Haenszel statistic).
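
For reference, the two quantities can be written out as follows. The notation
follows the macro in Figure 1 rather than Agresti's text: in stratum k the 2x2
table has cells a_k and b_k in the first row and c_k and d_k in the second,
with total n_k. The 1/2 term is the continuity correction that the macro
applies; we make no claim that the cited equation is printed in this form.

```latex
\hat{\theta}_{MH} = \frac{\sum_{k=1}^{K} a_k d_k / n_k}{\sum_{k=1}^{K} b_k c_k / n_k}
\qquad
\chi^2_{MH} = \frac{\left( \left| \sum_{k=1}^{K} a_k - \sum_{k=1}^{K} m_k \right| - \tfrac{1}{2} \right)^2}{\sum_{k=1}^{K} v_k}

m_k = \frac{(a_k + b_k)(a_k + c_k)}{n_k}
\qquad
v_k = \frac{(a_k + b_k)(c_k + d_k)(a_k + c_k)(b_k + d_k)}{n_k^2 \, (n_k - 1)}
```

Here m_k and v_k are the expected value and variance of a_k under the null
hypothesis, given the margins of the table for stratum k.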

The SPSS macro introduced here provides an easy way to produce both of these
quantities, along with the significance level for the test statistic (the
statistic is approximately distributed as a single degree of freedom
chi-squared random variable under the null hypothesis). The source code for
the macro in SPSS syntax is given below, and is also available from SPSS
via anonymous ftp to spss.com or through the SPSS forum on Compuserve. It
is named mh.sps.

Figure 1
The MH.SPS Macro Code
-----------------------------------------------------------------------------
preserve
set printback=off mprint=off
define mh (!positional !tokens(1)
          /!positional !tokens(1)
          /!positional !tokens(1))
preserve 
set printback=off mprint=off
save outfile='mh__tmp1.sav'
autorecode !1 !2 /into row__var col__var
aggregate outfile=*
 /break=row__var col__var !3
 /n__=n
numeric a__ b__ c__ d__
vector cell=a__ to d__
compute index=(row__var-1)*2+col__var
compute cell(index)=n__
aggregate outfile=*
 /break !3
 /a__ b__ c__ d__=max(a__ b__ c__ d__)
recode all (sysmis=0)
compute a=a__
compute b=b__
compute c=c__
compute d=d__
compute m=((a+b)*(a+c))/sum(a to d)
compute v=((a+b)*(c+d)*(a+c)*(b+d))/
          ((sum(a to d)**2)*(sum(a to d)-1))
compute o1=(a*d)/sum(a to d)
compute o2=(b*c)/sum(a to d)
compute constant=1
aggregate outfile=*
 /break=constant
 /sumn=sum(a)
 /summ=sum(m)
 /sumv=sum(v)
 /sumo1=sum(o1)
 /sumo2=sum(o2)
compute odds=sumo1/sumo2
compute mhs=((abs(sumn-summ)-.5)**2)/sumv
compute sig=2*(1-cdfnorm(sqrt(mhs)))
formats odds(f10.5) mhs(f8.4) sig(f6.5)
variable labels odds 'Odds Ratio'
                mhs 'Mantel-Haenszel Statistic'
                sig 'Significance'
report format=list automatic align(center)
 /variables=odds mhs sig
 /title "Estimate of Common Odds Ratio and " +
        "Mantel-Haenszel Statistic"
get file='mh__tmp1.sav'
restore
!enddefine
restore
-----------------------------------------------------------------------------
End Figure 1

The mh.sps macro is most easily used by simply having it resident as a text
file in your working directory and executing the following SPSS syntax:

INCLUDE MH.SPS.
MH rowvar colvar stratvar.

where rowvar is the name of one of the two primary variables, colvar is the
other primary variable and stratvar is the stratification variable. As usual,
a working data file must be defined when invoking the macro. The data may be
either individual cases or weighted aggregated data, just as with most SPSS
procedures. 

The macro first saves your working data file to a file named mh__tmp1.sav.
The double underscore in the file name is an attempt to render unlikely the
overwriting of an existing file. The same convention has been used later in
the macro when creating new variables in an attempt to avoid duplicating
existing variable names. Such duplication would cause the macro to fail. The
SET commands are used to minimize output. If you have problems running the
macro, they may be changed or removed to help identify the source of the
problem.

The macro should function on any SPSS release offering macro support. It
uses commands and procedures from the Base System exclusively, so it does
not require the Advanced Statistics module, as do the official SPSS macros
released with Windows versions of SPSS. It consists essentially of some
data manipulation, some basic calculations and a reporting of results. The
row and column variables can be either numeric or string variables (as they
are handled using AUTORECODE). Each variable should assume only two unique
values in the data (since they define a 2x2 cross classification). More
than two distinct values on either or both of these variables will cause the
macro to fail. The stratification variable can also be either string or
numeric; each unique value defines a stratum.
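
The data manipulation step can be pictured with a short Python sketch (Python
is used here only for illustration; it is not part of the macro). It mimics
what AUTORECODE and the index computation in Figure 1 do: the two values of
each variable are ranked 1 and 2 in sorted order, and the cell position is
(row-1)*2+col, so positions 1 to 4 correspond to a, b, c and d. The helper
names are hypothetical, and the sketch recodes within a single stratum for
brevity, whereas the macro recodes across the whole file.

```python
def autorecode(values):
    """Map each distinct value to its 1-based rank in sorted order."""
    ranks = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]

def cell_counts(rows, cols, weights):
    """Return [a, b, c, d] for one stratum of weighted records."""
    r, c = autorecode(rows), autorecode(cols)
    if max(r) > 2 or max(c) > 2:
        raise ValueError("row and column variables must be dichotomous")
    cells = [0, 0, 0, 0]
    for ri, ci, w in zip(r, c, weights):
        cells[(ri - 1) * 2 + ci - 1] += w   # index 1..4 shifted to 0..3
    return cells

# AREA 1 from Figure 2: ICCPR as row variable, CAT as column variable.
iccpr = [1, 1, 0, 0]
cat = [1, 0, 1, 0]
count = [8, 21, 2, 35]
print(cell_counts(iccpr, cat, count))  # [35, 2, 21, 8]
```

Because 0 sorts before 1, the No/No count lands in cell a and the Yes/Yes
count in cell d.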

Let's look at an example. The following data were compiled from the _Amnesty
International 1990 Report_. The variables forming the 2x2 cross classification
are 0-1 variables indicating whether or not a government has ratified an
international human rights agreement (1=Yes, 0=No). ICCPR is the International
Covenant on Civil and Political Rights. CAT is the Convention against Torture
and Other Cruel, Inhuman or Degrading Treatment or Punishment. AREA is the
geographic location of the state (1=Africa and the Middle East, 2=The Americas,
3=Australasia and the Pacific, 4=Europe). The COUNT variable gives the number
of observations for each combination of the AREA, ICCPR and CAT variables,
which allows us to represent 168 observations with only 16 lines of data. 
Ratification status is as of December 31, 1989.

Figure 2
Human Rights Data
-----------------------------------------------------------------------------
AREA ICCPR CAT COUNT

  1    1    1     8
  1    1    0    21
  1    0    1     2
  1    0    0    35
  2    1    1    11
  2    1    0    10
  2    0    1     2
  2    0    0    13
  3    1    1     4
  3    1    0     7
  3    0    1     1
  3    0    0    22
  4    1    1    19
  4    1    0     7
  4    0    1     2
  4    0    0     4
-----------------------------------------------------------------------------
End Figure 2

Note that strictly speaking we have no need for inferential statistics with
these data, as they represent the entire population of governments. However,
for some purposes it may be of interest to treat this group of governments
as a sample from a population of potential governments. The substantive
questions of interest for our purposes are whether the odds ratio for
ratifying one agreement, comparing governments that have ratified the
other with those that have not, is constant across geographical areas,
and assuming this to be true, what the common odds ratio is. The first
step in this analysis is to test
the null hypothesis of no three way interaction. This was done using the
LOGLINEAR procedure, fitting a model excluding only the three way term. The
likelihood ratio chi-squared value was .32 on 3 degrees of freedom, with a
significance of .96, indicating that the three way interaction term is not
needed. Removal of any of the two way interaction terms would produce an
unacceptably large increase in the likelihood ratio statistic. Thus our
data appear to exhibit exactly the type of structure for which use of the
Mantel-Haenszel common odds ratio estimate and chi-squared test are useful.
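
The mh.sps macro does not perform this preliminary check. As a rough
cross-check outside SPSS, the model without the three way term can be fitted
by iterative proportional fitting, a standard technique for this kind of
model and not necessarily the algorithm that LOGLINEAR uses. The Python
sketch below does so for the Figure 2 counts; the variable names and the
fixed number of iterations are our own choices.

```python
import math

# Figure 2 counts as table[area][iccpr][cat], with index 0 = No and 1 = Yes.
table = [
    [[35, 2], [21, 8]],
    [[13, 2], [10, 11]],
    [[22, 1], [7, 4]],
    [[4, 2], [7, 19]],
]
K = len(table)

# Start from a flat table and repeatedly rescale to match each two way margin.
fit = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(K)]
for _ in range(500):
    for i in range(2):            # ICCPR x CAT margin (summed over areas)
        for j in range(2):
            obs = sum(table[k][i][j] for k in range(K))
            cur = sum(fit[k][i][j] for k in range(K))
            for k in range(K):
                fit[k][i][j] *= obs / cur
    for k in range(K):            # AREA x ICCPR margin (summed over CAT)
        for i in range(2):
            scale = sum(table[k][i]) / sum(fit[k][i])
            for j in range(2):
                fit[k][i][j] *= scale
    for k in range(K):            # AREA x CAT margin (summed over ICCPR)
        for j in range(2):
            scale = ((table[k][0][j] + table[k][1][j])
                     / (fit[k][0][j] + fit[k][1][j]))
            for i in range(2):
                fit[k][i][j] *= scale

# Likelihood ratio chi-squared of the fitted model against the data.
g2 = 2 * sum(table[k][i][j] * math.log(table[k][i][j] / fit[k][i][j])
             for k in range(K) for i in range(2) for j in range(2))
g2 = max(g2, 0.0)                 # guard against tiny negative rounding error
df = (2 - 1) * (2 - 1) * (K - 1)
# Upper tail of a chi-squared variable; this closed form holds only for 3 df.
p = math.erfc(math.sqrt(g2 / 2)) + math.sqrt(2 * g2 / math.pi) * math.exp(-g2 / 2)
print(round(g2, 2), df, round(p, 2))
```

A small statistic with a significance near 1 supports the assumption of a
common odds ratio across areas.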

Figure 3 contains the relevant output from the mh.sps macro. The odds ratio
of 7.18 bears out the commonsense assumption that governments having ratified
the ICCPR agreement are more likely to have ratified the CAT agreement than
are governments not having ratified the ICCPR agreement. A Mantel-Haenszel
statistic as large as 18.67 would be very unlikely to occur in random
samples from populations with a common odds ratio of 1.

Figure 3
Statistical Output from MH.SPS Macro
-----------------------------------------------------------------------------

          Estimate of Common Odds Ratio and Mantel-Haenszel Statistic

                               Mantel-Haenszel
                 Odds Ratio       Statistic       Significance
                 __________    _______________    ____________

                    7.18023         18.6741          .00002

-----------------------------------------------------------------------------
End Figure 3
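
The figures above can be checked independently of SPSS. The Python sketch
below restates the arithmetic of the macro in Figure 1 for the Figure 2
counts; the function name mantel_haenszel and the tuple layout are our own,
and the cells are ordered as AUTORECODE orders them (0=No before 1=Yes).

```python
import math

# Figure 2 counts per AREA as (a, b, c, d), ordered as the macro orders them:
# a = ICCPR no / CAT no,  b = ICCPR no / CAT yes,
# c = ICCPR yes / CAT no, d = ICCPR yes / CAT yes.
strata = [
    (35, 2, 21, 8),    # 1 = Africa and the Middle East
    (13, 2, 10, 11),   # 2 = The Americas
    (22, 1, 7, 4),     # 3 = Australasia and the Pacific
    (4, 2, 7, 19),     # 4 = Europe
]

def mantel_haenszel(strata):
    """Return (common odds ratio, continuity-corrected statistic, significance)."""
    sum_a = sum_m = sum_v = sum_o1 = sum_o2 = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_m += (a + b) * (a + c) / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        sum_o1 += a * d / n
        sum_o2 += b * c / n
    odds = sum_o1 / sum_o2
    mhs = (abs(sum_a - sum_m) - 0.5) ** 2 / sum_v
    # For 1 degree of freedom, erfc(sqrt(x/2)) is the chi-squared upper tail,
    # which equals the macro's 2*(1-cdfnorm(sqrt(mhs))).
    sig = math.erfc(math.sqrt(mhs / 2))
    return odds, mhs, sig

odds, mhs, sig = mantel_haenszel(strata)
print(f"{odds:.5f} {mhs:.4f} {sig:.5f}")  # 7.18023 18.6741 0.00002
```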

Note that the macro uses AUTORECODE to process the variable values for the
row and column variables, and effectively constructs the 2x2 tables so that
the low/low and high/high combinations are on the main (top left to lower
right) diagonal. If the variable codings are constructed such that the lower
value means something different on one variable than it does on the other
(such as 0=No, 1=Yes for the row variable and 1=Yes and 2=No on the column
variable), then the odds ratio will be the reciprocal of what consistent
coding produces. The Mantel-Haenszel statistic and significance will not
be affected.
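
This behaviour is easy to confirm numerically. In the Python sketch below
(our restatement of the macro's arithmetic, not SPSS code), reversing the
coding of the column variable swaps the two columns of every 2x2 table, so
that (a, b, c, d) becomes (b, a, d, c). The counts are those of Figure 2.

```python
def mh_odds_and_stat(strata):
    """Common odds ratio and continuity-corrected Mantel-Haenszel statistic."""
    sum_a = sum_m = sum_v = sum_o1 = sum_o2 = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_m += (a + b) * (a + c) / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        sum_o1 += a * d / n
        sum_o2 += b * c / n
    return sum_o1 / sum_o2, (abs(sum_a - sum_m) - 0.5) ** 2 / sum_v

consistent = [(35, 2, 21, 8), (13, 2, 10, 11), (22, 1, 7, 4), (4, 2, 7, 19)]
# Reversed column coding: swap the columns within each stratum.
reversed_cols = [(b, a, d, c) for a, b, c, d in consistent]

or1, stat1 = mh_odds_and_stat(consistent)
or2, stat2 = mh_odds_and_stat(reversed_cols)
print(abs(or1 * or2 - 1) < 1e-9)     # True: the odds ratios are reciprocals
print(abs(stat1 - stat2) < 1e-9)     # True: the statistic is unchanged
```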
