
SAS FAQ
How do I compute tetrachoric/polychoric correlations in SAS?

When two variables are dichotomous or ordinal categorical, we may want to compute their tetrachoric or polychoric correlation. Both measures assume that each observed variable is a categorized version of an underlying (latent) continuous variable and that the two latent variables follow a bivariate normal distribution; the correlation estimates the association between those latent variables. When both variables are binary, the correlation is called the tetrachoric correlation; in the more general case of ordinal variables, it is called the polychoric correlation. In SAS, proc freq is used to obtain both.
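
To make the latent-variable assumption concrete, here is a sketch of the standard setup for the 2x2 (tetrachoric) case. The symbols Z, tau and rho are our own notation, not SAS output names.

```latex
\begin{align*}
(Z_1, Z_2) &\sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},
             \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right),\\
X_j &= \mathbf{1}\{Z_j > \tau_j\}, \qquad j = 1, 2,\\
\tau_j &= \Phi^{-1}\bigl(P(X_j = 0)\bigr),\\
P(X_1 = 0,\, X_2 = 0) &= \Phi_2(\tau_1, \tau_2; \rho)
  = \int_{-\infty}^{\tau_1}\!\int_{-\infty}^{\tau_2}
    \phi_2(z_1, z_2; \rho)\, dz_2\, dz_1 .
\end{align*}
```

Here Phi is the standard normal distribution function and phi_2 is the standard bivariate normal density with correlation rho. The tetrachoric correlation is the value of rho that reproduces the observed cell proportion given the thresholds implied by the margins. The polychoric correlation extends this to more than two ordered categories by using several thresholds per variable.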

In the following examples, we will use the data set hsb2.sas7bdat.
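
The examples read the file through a libref named ats, which has to be assigned before the code will run. A minimal sketch, assuming the file sits in a folder of your choosing (the path below is hypothetical):

```sas
/* Hypothetical folder; replace with the directory that holds hsb2.sas7bdat */
libname ats "C:\mydata";

/* Optional check that the data set can be read */
proc contents data = ats.hsb2;
run;
```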

Example 1: Computing tetrachoric correlation between two dichotomous variables

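We specify the plcorr option in the tables statement to request the polychoric correlation; for a 2x2 table such as this one, SAS labels it the tetrachoric correlation. The two variables of interest are female and honors, an indicator equal to 1 when write >= 60 and 0 otherwise, which is created in the data step below.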

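/* Create the binary indicator honors: 1 if write >= 60, 0 otherwise */
data hsb2;
  set ats.hsb2;
  honors = (write>=60);
run;

/* plcorr requests the tetrachoric (2x2) or polychoric correlation */
proc freq data = hsb2;
  tables honors*female /plcorr;
run;
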
Table of honors by female

honors     female

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
       0 |     73 |     74 |    147
         |  36.50 |  37.00 |  73.50
         |  49.66 |  50.34 |
         |  80.22 |  67.89 |
---------+--------+--------+
       1 |     18 |     35 |     53
         |   9.00 |  17.50 |  26.50
         |  33.96 |  66.04 |
         |  19.78 |  32.11 |
---------+--------+--------+
Total          91      109      200
            45.50    54.50   100.00

Statistic                              Value       ASE
------------------------------------------------------
Gamma                                 0.3146    0.1503
Kendall's Tau-b                       0.1391    0.0684
Stuart's Tau-c                        0.1223    0.0607

Somers' D C|R                         0.1570    0.0770
Somers' D R|C                         0.1233    0.0612

Pearson Correlation                   0.1391    0.0684
Spearman Correlation                  0.1391    0.0684
Tetrachoric Correlation               0.2362    0.1156

Lambda Asymmetric C|R                 0.0000    0.0000
Lambda Asymmetric R|C                 0.0000    0.0000
Lambda Symmetric                      0.0000    0.0000

Uncertainty Coefficient C|R           0.0143    0.0142
Uncertainty Coefficient R|C           0.0170    0.0169
Uncertainty Coefficient Symmetric     0.0155    0.0154

           Estimates of the Relative Risk (Row1/Row2)

Type of Study                   Value       95% Confidence Limits
-----------------------------------------------------------------
Case-Control (Odds Ratio)      1.9182        0.9974        3.6890
Cohort (Col1 Risk)             1.4622        0.9712        2.2015
Cohort (Col2 Risk)             0.7623        0.5930        0.9799

Sample Size = 200
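
If only the cell counts are available, the same table can be rebuilt from them with a weight statement. The sketch below enters the four counts shown in the crosstab above; the data set name counts and the variable name count are our own choices.

```sas
/* Enter the four cell counts from the honors by female table */
data counts;
  input honors female count;
  datalines;
0 0 73
0 1 74
1 0 18
1 1 35
;
run;

/* weight tells proc freq that each row stands for count observations */
proc freq data = counts;
  weight count;
  tables honors*female / plcorr;
run;
```

Because the cell counts are identical to those in the table above, this should reproduce the same measures of association.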

Example 2: Computing polychoric correlation among two or more ordinal categorical variables

We will use SAS ODS (Output Delivery System) to write the polychoric correlations to a data set. With ODS, SAS can turn each table that a procedure displays into an output data set. Tetrachoric and polychoric correlations are stored in the table called measures, because SAS puts them together with all the other measures of association. We can keep only the tetrachoric and polychoric correlations by using the where= data set option when creating this data set.

/* Cross every variable with every other and save only the
   tetrachoric/polychoric rows of the measures table */
proc freq data = hsb2;
  tables (female ses honors)*(female ses honors) /plcorr;
  ods output measures=mycorr (where=(statistic="Tetrachoric Correlation"
                                     or statistic="Polychoric Correlation")
                              keep = statistic table value);
run;

proc print data = mycorr;
run;

Obs            Table                   Statistic              Value

 1     Table female * female    Tetrachoric Correlation      1.0000
 2     Table ses * female       Polychoric Correlation      -0.1741
 3     Table honors * female    Tetrachoric Correlation      0.2362
 4     Table female * ses       Polychoric Correlation      -0.1741
 5     Table ses * ses          Polychoric Correlation       1.0000
 6     Table honors * ses       Polychoric Correlation       0.2769
 7     Table female * honors    Tetrachoric Correlation      0.2362
 8     Table ses * honors       Polychoric Correlation       0.2769
 9     Table honors * honors    Tetrachoric Correlation      1.0000
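
To see which ODS tables a step produces and which variables the measures table holds, we can inspect them directly. This is a sketch; the data set name allmeasures is our own choice.

```sas
/* Write the name of every ODS table produced by the step to the SAS log */
ods trace on;
proc freq data = hsb2;
  tables ses*honors / plcorr;
run;
ods trace off;

/* Save the full measures table, then list its variables */
proc freq data = hsb2;
  tables ses*honors / plcorr;
  ods output measures = allmeasures;
run;

proc contents data = allmeasures;
run;
```

The proc contents listing shows which variables are available for the keep= and where= data set options.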

Example 3: Obtaining a polychoric correlation matrix for a group of variables

The example above shows how to obtain polychoric correlations for multiple variables, but the output is not in matrix format, which can be a problem if further analysis is to be performed on the correlation matrix. In this example, we use data steps to reshape the output into a data set laid out as a correlation matrix. In the data step below, we create three variables: group, x and y. Since there are three variables, the correlation matrix will have three rows and three columns, so every three consecutive observations form one row of the matrix; the group variable (0, 1 or 2) identifies that row. Each correlation involves two variables: the name of the first is stored in x and the name of the second in y.

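/* Same request as in Example 2 */
proc freq data = hsb2;
  tables (female ses honors)*(female ses honors) /plcorr;
  ods output measures=mycorr (where=(statistic="Tetrachoric Correlation"
                                     or statistic="Polychoric Correlation")
                              keep = statistic table value);
run;

/* Label each correlation with its matrix row (group) and its two variable names */
data mycorrt;
  set mycorr ;
  group = floor((_n_ - 1)/3);   /* observations 1-3 -> 0, 4-6 -> 1, 7-9 -> 2 */
  x = scan(table, 2, " *");     /* first variable in "Table x * y" */
  y = scan(table, 3, " *");     /* second variable in "Table x * y" */
  keep group value table x y;
run;

proc print data = mycorrt;
run;
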
Obs            Table               Value    group    x         y

 1     Table female * female      1.0000      0      female    female
 2     Table ses * female        -0.1741      0      ses       female
 3     Table honors * female      0.2362      0      honors    female
 4     Table female * ses        -0.1741      1      female    ses
 5     Table ses * ses            1.0000      1      ses       ses
 6     Table honors * ses         0.2769      1      honors    ses
 7     Table female * honors      0.2362      2      female    honors
 8     Table ses * honors         0.2769      2      ses       honors
 9     Table honors * honors      1.0000      2      honors    honors
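
The divisor 3 in the group calculation is the number of variables. For a different variable list, the steps above can be sketched with macro variables so that the list and the count are set in one place. The macro variable names varlist and nvar are our own, and the sketch assumes proc freq writes the tables in the same order as shown above.

```sas
/* Hypothetical macro variables: edit varlist, and nvar follows */
%let varlist = female ses honors;
%let nvar = %sysfunc(countw(&varlist));

proc freq data = hsb2;
  tables (&varlist)*(&varlist) / plcorr;
  ods output measures=mycorr (where=(statistic="Tetrachoric Correlation"
                                     or statistic="Polychoric Correlation")
                              keep = statistic table value);
run;

data mycorrt;
  set mycorr;
  group = floor((_n_ - 1)/&nvar);   /* one group per row of the matrix */
  x = scan(table, 2, " *");
  y = scan(table, 3, " *");
  keep group value table x y;
run;
```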


Now we are ready to transpose the data set into matrix format.

/* One output row per group; the values of x become the column names */
proc transpose data = mycorrt out=mymatrix (drop = _name_ group)   ;
   id x;
   by group;
   var value ;
run;

proc print data = mymatrix;
run;

Obs      female         ses      honors

 1       1.0000     -0.1741      0.2362
 2      -0.1741      1.0000      0.2769
 3       0.2362      0.2769      1.0000
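
The data set mymatrix holds the correlations but carries no row labels. SAS procedures that read a correlation matrix generally expect a TYPE=CORR data set with the variables _type_ and _name_. The following is a sketch of one way to add them, assuming the rows of mymatrix are in the order female, ses, honors as printed above; the data set name polycorr and the array name corrvars are our own.

```sas
/* Sketch: mark the matrix as a TYPE=CORR data set */
data polycorr (type = corr);
  length _type_ $ 8 _name_ $ 32;
  set mymatrix;
  array corrvars{*} female ses honors;
  _type_ = "CORR";
  _name_ = vname(corrvars{_n_});   /* row i is labeled with the i-th variable name */
run;

proc print data = polycorr;
run;
```

A procedure reading this data set may also need the number of observations (200 here), which this sketch does not store.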


These pages are Copyrighted (c) by UCLA Academic Technology Services

