UCLA Academic Technology Services

SAS Textbook Examples
An Introduction to Categorical Data Analysis by Alan Agresti
Chapter 2 - Two-way Contingency Tables

Inputting the Aspirin data, Table 2.3, p. 20, and calculating the results on pp. 21-24.

Note: For 2x2 tables, the measures option in proc freq provides confidence intervals for the odds ratio, labeled "Case-Control (Odds Ratio)" in the output, and for the relative risk, labeled "Cohort (Col1 Risk)". Both proc freq and proc genmod are run to show that each procedure produces the chi-squared test statistics: the Chi-Square and Likelihood Ratio Chi-Square in proc freq equal the Pearson Chi-Square and Deviance in the proc genmod goodness-of-fit table. Only proc freq performs the chi-square test and gives a p-value.

data aspirin;
input group mi count @@;
cards;
1 1 189     1 2 10845  
2 1 104     2 2 10933  
;
run;
proc format;
  value group 1='Placebo' 2='Aspirin';
  value mi 1='Yes' 2='No';
run;
proc freq data=aspirin order=data; 
 format group group. mi mi.;
 weight count;
 tables group*mi / chisq expected measures nopercent norow nocol;
run;
proc genmod data=aspirin;
  format group group. mi mi.;
  model count = group mi / dist=poi link=log obstats residuals;
run;
The FREQ Procedure

Table of group by mi
group     mi

Frequency|
Expected |Yes     |No      |  Total
---------+--------+--------+
Placebo  |    189 |  10845 |  11034
         | 146.48 |  10888 |
---------+--------+--------+
Aspirin  |    104 |  10933 |  11037
         | 146.52 |  10890 |
---------+--------+--------+
Total         293    21778    22071

Statistics for Table of group by mi

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     25.0139    <.0001
Likelihood Ratio Chi-Square    1     25.3720    <.0001
Continuity Adj. Chi-Square     1     24.4291    <.0001
Mantel-Haenszel Chi-Square     1     25.0128    <.0001
Phi Coefficient                       0.0337
Contingency Coefficient               0.0336
Cramer's V                            0.0337

       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)       189
Left-sided Pr <= F          1.0000
Right-sided Pr >= F      3.253E-07

Table Probability (P)    1.516E-07
Two-sided Pr <= P        5.033E-07

The FREQ Procedure

Statistics for Table of group by mi

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.2938    0.0561
Kendall's Tau-b                       0.0337    0.0065
Stuart's Tau-c                        0.0077    0.0015

Somers' D C|R                         0.0077    0.0015
Somers' D R|C                         0.1471    0.0282

Pearson Correlation                   0.0337    0.0065
Spearman Correlation                  0.0337    0.0065

Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0077    0.0015
Lambda Symmetric                      0.0075    0.0015

Uncertainty Coefficient C|R           0.0081    0.0032
Uncertainty Coefficient R|C           0.0008    0.0003
Uncertainty Coefficient Symmetric     0.0015    0.0006

           Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value       95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      1.8321        1.4400        2.3308
Cohort (Col1 Risk)             1.8178        1.4330        2.3059
Cohort (Col2 Risk)             0.9922        0.9892        0.9953

Sample Size = 22071
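
As a hand check on the relative risk table above, the sample relative risk and odds ratio can be reproduced from the four cell counts. This is a sketch that assumes the usual large-sample (Wald) interval on the log odds ratio scale; the symbols n_ij for the cell counts and the intermediate rounding are ours and do not appear in the SAS output.

```latex
\begin{align*}
\text{relative risk} &= \frac{189/11034}{104/11037} \approx \frac{0.017129}{0.009423} \approx 1.8178,\\
\hat\theta &= \frac{n_{11}\,n_{22}}{n_{12}\,n_{21}} = \frac{189 \times 10933}{10845 \times 104} = \frac{2066337}{1127880} \approx 1.8321,\\
\mathrm{SE}(\log\hat\theta) &= \sqrt{\tfrac{1}{189}+\tfrac{1}{10845}+\tfrac{1}{104}+\tfrac{1}{10933}} \approx 0.1228,\\
\exp\!\left(\log 1.8321 \pm 1.96 \times 0.1228\right) &\approx (1.44,\ 2.33).
\end{align*}
```

These values agree with the Cohort (Col1 Risk) and Case-Control (Odds Ratio) rows to the precision shown.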

The GENMOD Procedure

        Model Information
        
Data Set              WORK.ASPIRIN
Distribution               Poisson
Link Function                  Log
Dependent Variable           count
Observations Used                4

           Criteria For Assessing Goodness Of Fit

Criterion                 DF           Value        Value/DF
Deviance                   1         25.3720         25.3720
Scaled Deviance            1         25.3720         25.3720
Pearson Chi-Square         1         25.0139         25.0139
Scaled Pearson X2          1         25.0139         25.0139
Log Likelihood                   181827.7802

Algorithm converged.
                            Analysis Of Parameter Estimates

                               Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq
Intercept     1      0.6781      0.1188      0.4454      0.9109      32.60        <.0001
group         1      0.0003      0.0135     -0.0261      0.0267       0.00        0.9839
mi            1      4.3085      0.0588      4.1932      4.4238    5366.76        <.0001
Scale         0      1.0000      0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.
                                  Observation Statistics

Observation        count        group           mi         Pred        Xbeta          Std
                              HessWgt        Lower        Upper       Resraw       Reschi
                               Resdev     StResdev     StReschi       Reslik
          1          189            1            1    146.48015    4.9868899    0.0588072
                            146.48015     130.5335    164.37492    42.519853     3.513196
                            3.3609942     4.784706    5.0013802    4.8956654
          2        10845            1            2     10887.52    9.2953725    0.0095519
                             10887.52    10685.587    11093.269    -42.51991      -0.4075
                            -0.407766    -5.004648    -5.001387    -5.001409
          3          104            2            1    146.51997    4.9871618     0.058807
                            146.51997    130.56904    164.41955    -42.51997    -3.512728
                            -3.707237    -5.278334    -5.001394    -5.139873
                            
The GENMOD Procedure

                                  Observation Statistics

Observation        count        group           mi         Pred        Xbeta          Std
                              HessWgt        Lower        Upper       Resraw       Reschi
                               Resdev     StResdev     StReschi       Reslik
          4        10933            2            2     10890.48    9.2956443    0.0095506
                             10890.48    10688.519    11096.257    42.519913    0.4074449
                            0.4071802     4.998138    5.0013872    5.0013656
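
The Reschi column in the Observation Statistics above holds the Pearson residuals from the independence fit, and their squares sum to the Pearson chi-square. The check below is a sketch for the first cell, assuming the usual Poisson definition of the Pearson residual; the notation (y_i for the observed count, mu-hat_i for the fitted count, e_i for the residual) is ours.

```latex
\begin{align*}
\hat\mu_{1} &= \frac{11034 \times 293}{22071} \approx 146.48,\\
e_{1} &= \frac{y_{1}-\hat\mu_{1}}{\sqrt{\hat\mu_{1}}} = \frac{189-146.48}{\sqrt{146.48}} \approx \frac{42.52}{12.103} \approx 3.513,\\
X^{2} &= \sum_{i=1}^{4} e_{i}^{2} \approx 3.5132^{2}+0.4075^{2}+3.5127^{2}+0.4074^{2} \approx 25.01.
\end{align*}
```

The total matches the Pearson Chi-Square of 25.0139 reported by both procedures.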
Inputting the Smoking and MI data, Table 2.4, p. 26.
data smoking;
input group mi count @@;
cards;
1 1 172     1 2 173  
2 1  90     2 2 346  
;
run;
Calculations of odds ratio, pp. 26-27.

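Note: The measures option already reports the odds ratio with asymptotic confidence limits. The or option in the exact statement is needed to also get the exact confidence limits for the odds ratio in the output.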

proc format;
  value group 1='Smoker' 2='Non-smoker';
  value mi 1='MI' 2='Control';
run;
proc freq data=smoking order=data; 
 format group group. mi mi.;
 weight count;
 tables group*mi / chisq expected measures nopercent norow nocol;
 exact or;
run;
The FREQ Procedure

Table of group by mi
group       mi

Frequency  |
Expected   |MI      |Control |  Total
-----------+--------+--------+
Smoker     |    172 |    173 |    345
           | 115.74 | 229.26 |
-----------+--------+--------+
Non-smoker |     90 |    346 |    436
           | 146.26 | 289.74 |
-----------+--------+--------+
Total           262      519      781

Statistics for Table of group by mi

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     73.7287    <.0001
Likelihood Ratio Chi-Square    1     74.2583    <.0001
Continuity Adj. Chi-Square     1     72.4241    <.0001
Mantel-Haenszel Chi-Square     1     73.6343    <.0001
Phi Coefficient                       0.3073
Contingency Coefficient               0.2937
Cramer's V                            0.3073

       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)       172
Left-sided Pr <= F          1.0000
Right-sided Pr >= F      6.762E-18

Table Probability (P)    1.888E-17
Two-sided Pr <= P        1.029E-17

The FREQ Procedure

Statistics for Table of group by mi

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.5853    0.0526
Kendall's Tau-b                       0.3073    0.0343
Stuart's Tau-c                        0.2882    0.0328

Somers' D C|R                         0.2921    0.0332
Somers' D R|C                         0.3232    0.0359

Pearson Correlation                   0.3073    0.0343
Spearman Correlation                  0.3073    0.0343

Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.2377    0.0410
Lambda Symmetric                      0.1351    0.0239

Uncertainty Coefficient C|R           0.0745    0.0168
Uncertainty Coefficient R|C           0.0693    0.0157
Uncertainty Coefficient Symmetric     0.0718    0.0162

           Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value       95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      3.8222        2.7934        5.2299
Cohort (Col1 Risk)             2.4152        1.9532        2.9864
Cohort (Col2 Risk)             0.6319        0.5629        0.7093

  Odds Ratio (Case-Control Study)
-----------------------------------
Odds Ratio                   3.8222

Asymptotic Conf Limits
95% Lower Conf Limit         2.7934
95% Upper Conf Limit         5.2299

Exact Conf Limits
95% Lower Conf Limit         2.7607
95% Upper Conf Limit         5.2984

Sample Size = 781
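
The odds ratio and its asymptotic confidence limits above can be checked by hand from the cell counts. This is a sketch assuming the standard Wald interval on the log scale; the exact limits come from a different (conditional) calculation and are not reproduced here.

```latex
\begin{align*}
\hat\theta &= \frac{172 \times 346}{173 \times 90} = \frac{59512}{15570} \approx 3.8222,\\
\mathrm{SE}(\log\hat\theta) &= \sqrt{\tfrac{1}{172}+\tfrac{1}{173}+\tfrac{1}{90}+\tfrac{1}{346}} \approx 0.1600,\\
\exp\!\left(\log 3.8222 \pm 1.96 \times 0.1600\right) &\approx (2.79,\ 5.23).
\end{align*}
```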
Inputting the General Social Survey data, Table 2.5, p. 31.
data Survey;
input Gender Party count @@;
cards;
1 1 279     1 2  73    1 3 225  
2 1 165     2 2  47    2 3 191
;
run; 
Chi-square test of independence, p. 31.
proc format;
  value gender 1='female' 2='Male';
  value party 1='Democrat' 2='Independent' 3='Republican';
run;
proc freq data=survey order=data; 
 format gender gender. party party.;
 weight count;
 tables gender*party / chisq expected nopercent norow nocol;
run;
The FREQ Procedure

Table of Gender by Party
Gender     Party

Frequency|
Expected |Democrat|Independ|Republic|  Total
         |        |ent     |an      |
---------+--------+--------+--------+
female   |    279 |     73 |    225 |    577
         | 261.42 | 70.653 | 244.93 |
---------+--------+--------+--------+
Male     |    165 |     47 |    191 |    403
         | 182.58 | 49.347 | 171.07 |
---------+--------+--------+--------+
Total         444      120      416      980

Statistics for Table of Gender by Party
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     2      7.0095    0.0301
Likelihood Ratio Chi-Square    2      7.0026    0.0302
Mantel-Haenszel Chi-Square     1      6.7581    0.0093
Phi Coefficient                       0.0846
Contingency Coefficient               0.0843
Cramer's V                            0.0846

Sample Size = 980
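
The Expected row in each cell and the Chi-Square value above follow from the usual formulas for the test of independence. The sketch below works through them with the counts from this table; the notation (n_{i+} and n_{+j} for row and column totals, I and J for the numbers of rows and columns) is ours, and the six terms are rounded to three decimals.

```latex
\begin{align*}
\hat\mu_{ij} &= \frac{n_{i+}\,n_{+j}}{n}, \qquad \hat\mu_{11} = \frac{577 \times 444}{980} = \frac{256188}{980} \approx 261.42,\\
X^{2} &= \sum_{i,j} \frac{(n_{ij}-\hat\mu_{ij})^{2}}{\hat\mu_{ij}} \approx 1.183+0.078+1.622+1.693+0.112+2.322 \approx 7.01,\\
df &= (I-1)(J-1) = (2-1)(3-1) = 2.
\end{align*}
```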
Creating 2x2 tables from the Survey data, p. 33.

Note: G2 is the likelihood ratio chi-square in the output.

data DemoInd;
input Gender Party count @@;
cards;
1 1 279     1 2  73 
2 1 165     2 2  47 
;
run; 
 
data Collapse;
input Gender combo count @@;
cards;
1 1 352      1 2 225  
2 1 212      2 2 191
;
run; 
proc format;
  value combo 1='Demo./Ind.' 2='Republican';
run;
 
proc freq data=DemoInd order=data; 
 format gender gender. party party.;
 weight count;
 tables gender*party / chisq expected nopercent norow nocol;
run;
proc freq data=collapse order=data; 
 format gender gender. combo combo.;
 weight count;
 tables gender*combo / chisq expected nopercent norow nocol;
run;
The FREQ Procedure

Table of Gender by Party
Gender     Party

Frequency|
Expected |Democrat|Independ|  Total
         |        |ent     |
---------+--------+--------+
female   |    279 |     73 |    352
         | 277.11 | 74.894 |
---------+--------+--------+
Male     |    165 |     47 |    212
         | 166.89 | 45.106 |
---------+--------+--------+
Total         444      120      564

Statistics for Table of Gender by Party
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      0.1618    0.6875
Likelihood Ratio Chi-Square    1      0.1612    0.6881
Continuity Adj. Chi-Square     1      0.0876    0.7672
Mantel-Haenszel Chi-Square     1      0.1615    0.6878
Phi Coefficient                       0.0169
Contingency Coefficient               0.0169
Cramer's V                            0.0169

       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)       279
Left-sided Pr <= F          0.6957
Right-sided Pr >= F         0.3819

Table Probability (P)       0.0776
Two-sided Pr <= P           0.7501

Sample Size = 564

The FREQ Procedure

Table of Gender by combo
Gender     combo

Frequency|
Expected |Demo./In|Republic|  Total
         |d.      |an      |
---------+--------+--------+
female   |    352 |    225 |    577
         | 332.07 | 244.93 |
---------+--------+--------+
Male     |    212 |    191 |    403
         | 231.93 | 171.07 |
---------+--------+--------+
Total         564      416      980

Statistics for Table of Gender by combo
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      6.8528    0.0089
Likelihood Ratio Chi-Square    1      6.8414    0.0089
Continuity Adj. Chi-Square     1      6.5133    0.0107
Mantel-Haenszel Chi-Square     1      6.8458    0.0089
Phi Coefficient                       0.0836
Contingency Coefficient               0.0833
Cramer's V                            0.0836
       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)       352
Left-sided Pr <= F          0.9963
Right-sided Pr >= F         0.0054

Table Probability (P)       0.0017
Two-sided Pr <= P           0.0104

Sample Size = 980
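
The two 2x2 tables above partition the chi-squared statistic for the full 2x3 Gender by Party table analyzed earlier. Using only values printed in the outputs, the likelihood ratio statistics add up exactly, while the Pearson statistics add up only approximately; the subscript labels are ours.

```latex
\begin{align*}
G^{2}_{\text{Dem vs. Ind}} + G^{2}_{\text{(Dem+Ind) vs. Rep}} &= 0.1612 + 6.8414 = 7.0026 = G^{2}_{\text{full } 2\times 3},\\
X^{2}_{\text{Dem vs. Ind}} + X^{2}_{\text{(Dem+Ind) vs. Rep}} &= 0.1618 + 6.8528 = 7.0146 \neq 7.0095 = X^{2}_{\text{full } 2\times 3}.
\end{align*}
```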
Inputting the Infants data, Table 2.7, p. 35.
data infants;
input malform alcohol count @@;
cards;
1   0 17066     2   0  48
1 0.5 14464     2 0.5  38
1 1.5   788     2 1.5   5 
1 4.0   126     2 4.0   1
1 7.0    37     2 7.0   1
;
run; 
Results, p. 36. The G2 statistic is the Likelihood Ratio Chi-Square in the first statistics table of the output, and the X2 statistic is the Chi-Square in the same table. The sample correlation, r, is in the table labeled Pearson Correlation Coefficient. The M2 statistic is in the last table of the output, labeled Cochran-Mantel-Haenszel Statistics; the same value also appears as the Mantel-Haenszel Chi-Square in the first table.
proc format;
  value malform 2='Present' 1='Absent';
  value Alcohol 0='0' 0.5='<1' 1.5='1-2' 4.0='3-5' 7.0='>=6';
run;
proc freq data = infants; 
 format malform malform. alcohol alcohol.;
 weight count;
 tables alcohol*malform / chisq cmh1 norow nocol nopercent;
 test pcorr ;
run;
The FREQ Procedure

Table of alcohol by malform
alcohol     malform

Frequency|Absent  |Present |  Total
---------+--------+--------+
0        |  17066 |     48 |  17114
---------+--------+--------+
<1       |  14464 |     38 |  14502
---------+--------+--------+
1-2      |    788 |      5 |    793
---------+--------+--------+
3-5      |    126 |      1 |    127
---------+--------+--------+
>=6      |     37 |      1 |     38
---------+--------+--------+
Total       32481       93    32574

Statistics for Table of alcohol by malform
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     4     12.0821    0.0168
Likelihood Ratio Chi-Square    4      6.2020    0.1846
Mantel-Haenszel Chi-Square     1      6.5699    0.0104
Phi Coefficient                       0.0193
Contingency Coefficient               0.0193
Cramer's V                            0.0193

WARNING: 30% of the cells have expected counts less
         than 5. Chi-Square may not be a valid test.

The FREQ Procedure

Statistics for Table of alcohol by malform

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.0571    0.1010
Kendall's Tau-b                       0.0032    0.0058
Stuart's Tau-c                        0.0004    0.0006

Somers' D C|R                         0.0003    0.0006
Somers' D R|C                         0.0311    0.0556

Pearson Correlation                   0.0142    0.0106
Spearman Correlation                  0.0033    0.0059

Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000

Uncertainty Coefficient C|R           0.0049    0.0048
Uncertainty Coefficient R|C           0.0001    0.0001
Uncertainty Coefficient Symmetric     0.0002    0.0002

Pearson Correlation Coefficient
--------------------------------
Correlation               0.0142
ASE                       0.0106
95% Lower Conf Limit     -0.0066
95% Upper Conf Limit      0.0350

  Test of H0: Correlation = 0
ASE under H0              0.0107
Z                         1.3226
One-sided Pr >  Z         0.0930
Two-sided Pr > |Z|        0.1860

Sample Size = 32574

The FREQ Procedure

Summary Statistics for alcohol by malform
  Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF       Value      Prob
---------------------------------------------------------------
   1        Nonzero Correlation        1      6.5699    0.0104

Total Sample Size = 32574
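
The M2 statistic can be checked against the sample correlation printed earlier in the output. This is a sketch using the relation M2 = (n - 1) r^2 with r rounded to four decimals as shown, so the result only matches the reported 6.5699 to about two decimals.

```latex
\begin{align*}
M^{2} &= (n-1)\,r^{2} \approx (32574-1)\times 0.0142^{2} = 32573 \times 0.00020164 \approx 6.57.
\end{align*}
```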
Equally spaced row scores for the Infants data set, p. 37.
data  infantsx;
input malform alcoholx count @@;
cards;
1   0 17066     2   0  48
1   1 14464     2   1  38
1   2   788     2   2   5 
1   3   126     2   3   1
1   4    37     2   4   1
;
run; 
proc freq data = infantsx; 
 weight count;
 tables alcoholx*malform / cmh1 norow nocol nopercent;
run;
The FREQ Procedure

Table of alcoholx by malform
alcoholx     malform

Frequency|       1|       2|  Total
---------+--------+--------+
       0 |  17066 |     48 |  17114
---------+--------+--------+
       1 |  14464 |     38 |  14502
---------+--------+--------+
       2 |    788 |      5 |    793
---------+--------+--------+
       3 |    126 |      1 |    127
---------+--------+--------+
       4 |     37 |      1 |     38
---------+--------+--------+
Total       32481       93    32574

Summary Statistics for alcoholx by malform
  Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF       Value      Prob
---------------------------------------------------------------
    1        Nonzero Correlation        1      1.8278    0.1764

Total Sample Size = 32574
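Using the midrank scores, p. 37. The option scores=ridit tells SAS to use ridit scores, which are the midranks divided by the sample size. Because the M2 statistic is unchanged by a linear rescaling of the scores, this gives the same result as using the midranks themselves (scores=rank).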
proc format;
  value malform 2='Present' 1='Absent';
  value Alcohol 0='0' 0.5='<1' 1.5='1-2' 4.0='3-5' 7.0='>=6';
run;
proc freq data=infants; 
 format malform malform. alcohol alcohol.;
 weight count;
 tables alcohol*malform / cmh1 scores=ridit norow nocol nopercent;
 test pcorr;
run;
The FREQ Procedure

Table of alcohol by malform
alcohol     malform

Frequency|Absent  |Present |  Total
---------+--------+--------+
0        |  17066 |     48 |  17114
---------+--------+--------+
<1       |  14464 |     38 |  14502
---------+--------+--------+
1-2      |    788 |      5 |    793
---------+--------+--------+
3-5      |    126 |      1 |    127
---------+--------+--------+
>=6      |     37 |      1 |     38
---------+--------+--------+
Total       32481       93    32574

Statistics for Table of alcohol by malform
Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.0571    0.1010
Kendall's Tau-b                       0.0032    0.0058
Stuart's Tau-c                        0.0004    0.0006

Somers' D C|R                         0.0003    0.0006
Somers' D R|C                         0.0311    0.0556

Pearson Correlation (Ridit Scores)    0.0033    0.0059
Spearman Correlation                  0.0033    0.0059

Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000

Uncertainty Coefficient C|R           0.0049    0.0048
Uncertainty Coefficient R|C           0.0001    0.0001
Uncertainty Coefficient Symmetric     0.0002    0.0002

The FREQ Procedure

Statistics for Table of alcohol by malform

Pearson Correlation Coefficient
         (Ridit Scores)
--------------------------------
Correlation               0.0033
ASE                       0.0059
95% Lower Conf Limit     -0.0082
95% Upper Conf Limit      0.0148

  Test of H0: Correlation = 0
ASE under H0              0.0059
Z                         0.5583
One-sided Pr >  Z         0.2883
Two-sided Pr > |Z|        0.5766

Sample Size = 32574

Summary Statistics for alcohol by malform

  Cochran-Mantel-Haenszel Statistics (Based on Ridit Scores)

Statistic    Alternative Hypothesis    DF       Value      Prob
---------------------------------------------------------------
    1        Nonzero Correlation        1      0.3514    0.5533

Total Sample Size = 32574
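
For reference, the midrank of each alcohol category is the average of the first and last rank that its observations would occupy, based on the row totals 17114, 14502, 793, 127 and 38. The sketch below computes them by hand; the final line states our understanding of how ridit scores relate to midranks, so treat that rescaling as an assumption rather than something shown in the output.

```latex
\begin{align*}
\text{midrank}_{1} &= \frac{1+17114}{2} = 8557.5,\\
\text{midrank}_{2} &= \frac{17115+31616}{2} = 24365.5,\\
\text{midrank}_{3} &= \frac{31617+32409}{2} = 32013,\\
\text{midrank}_{4} &= \frac{32410+32536}{2} = 32473,\\
\text{midrank}_{5} &= \frac{32537+32574}{2} = 32555.5,\\
\text{ridit}_{k} &= \frac{\text{midrank}_{k}}{n}, \qquad n = 32574.
\end{align*}
```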
Inputting the Tea-Tasting Experiment data, Table 2.8, p. 40.

Calculating the exact test p-values in Table 2.9, p. 41.

Note: When cell counts are small (expected counts less than 5), the chi-square approximation may not be valid. The exact option performs an exact test of independence that treats both variables as nominal. For 2x2 tables this is Fisher's exact test.

Note 2: The p-value and chi-square test statistic were calculated only for the data in Table 2.8, not for all the possible values of n11 shown in Table 2.9, p. 41.

data tea;
 input poured guess count @@; 
 cards; 
1  1  3      1  2  1  
2  1  1      2  2  3 
;
run;
proc freq data=tea; 
 weight count; 
 tables poured*guess / exact; 
run;
The FREQ Procedure

Table of poured by guess
poured     guess

Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|  Total
---------+--------+--------+
       1 |      3 |      1 |      4
         |  37.50 |  12.50 |  50.00
         |  75.00 |  25.00 |
         |  75.00 |  25.00 |
---------+--------+--------+
       2 |      1 |      3 |      4
         |  12.50 |  37.50 |  50.00
         |  25.00 |  75.00 |
         |  25.00 |  75.00 |
---------+--------+--------+
Total           4        4        8
            50.00    50.00   100.00

Statistics for Table of poured by guess
Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.0000    0.1573
Likelihood Ratio Chi-Square    1      2.0930    0.1480
Continuity Adj. Chi-Square     1      0.5000    0.4795
Mantel-Haenszel Chi-Square     1      1.7500    0.1859
Phi Coefficient                       0.5000
Contingency Coefficient               0.4472
Cramer's V                            0.5000

WARNING: 100% of the cells have expected counts less
         than 5. Chi-Square may not be a valid test.
       Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F)         3
Left-sided Pr <= F          0.9857
Right-sided Pr >= F         0.2429

Table Probability (P)       0.2286
Two-sided Pr <= P           0.4857

Sample Size = 8
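
The Fisher's exact test probabilities above can be reproduced from the hypergeometric distribution of n11 with all margins fixed at 4. The sketch below is a hand calculation; the notation P(n11 = k) is ours.

```latex
\begin{align*}
P(n_{11}=k) &= \frac{\binom{4}{k}\binom{4}{4-k}}{\binom{8}{4}}, \qquad \binom{8}{4} = 70,\\
P(n_{11}=3) &= \frac{4 \times 4}{70} = \frac{16}{70} \approx 0.2286, \qquad P(n_{11}=4) = \frac{1}{70} \approx 0.0143,\\
\text{right-sided } P(n_{11}\ge 3) &= \frac{16+1}{70} \approx 0.2429,\\
\text{two-sided} &= P(0)+P(1)+P(3)+P(4) = \frac{1+16+16+1}{70} \approx 0.4857.
\end{align*}
```

The two-sided value sums the probabilities of all tables that are no more likely than the observed one, which excludes only n11 = 2 (probability 36/70).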

These pages are Copyrighted (c) by UCLA Academic Technology Services

