UCLA Academic Technology Services

SAS Macros: corr2data

The SAS macro corr2data generates a dataset of a given size with a given correlation or covariance structure, which is often a useful step in a simulation study.  The macro program can be found here.

If you have already downloaded the macro, you can paste the code into the program editor or, alternatively, load it with a %include statement.
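As a minimal sketch of the %include route: the folder and the file name corr2data.sas below are assumptions for illustration, so substitute the location where you actually saved the macro.

```sas
/* Hypothetical path and file name: point this at your saved copy of the macro */
%include "C:\sasmacros\corr2data.sas";
```

Once this statement has run, the macro is defined for the rest of the SAS session and can be called as shown in the examples.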

Example 1: Using a correlation matrix from an existing dataset.

We have a dataset, auto.sas7bdat, from which we will calculate a correlation matrix; then, using the corr2data macro, we will generate a new dataset with the same correlation structure.  To use the macro, we first need to generate and save the correlation matrix.  Let's look at the correlations among the variables price, mpg, and weight.

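/* Save the correlations (along with MEAN, STD and N rows) to dataset p */
proc corr data = auto outp=p  nosimple noprob;
  var price mpg weight;
run;

/* Keep only the rows that hold the correlation matrix */
data corr;
  set p;
  if _type_="CORR";
run;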

proc print data = corr;
run;

Obs    _TYPE_    _NAME_      PRICE        MPG       WEIGHT

 1      CORR     PRICE      1.00000    -0.46860     0.53861
 2      CORR     MPG       -0.46860     1.00000    -0.80717
 3      CORR     WEIGHT     0.53861    -0.80717     1.00000
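The two steps above can also be collapsed into one. The sketch below applies the WHERE= data set option to the outp= dataset so that only the correlation rows are written; it should produce the same corr dataset, and is offered as an alternative rather than as something the macro requires.

```sas
/* Write only the _TYPE_="CORR" rows directly to dataset corr */
proc corr data = auto outp=corr(where=(_type_="CORR")) nosimple noprob;
  var price mpg weight;
run;
```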

With the dataset corr, we can now run corr2data. To figure out what arguments to provide, we can look at the comments explaining the macro.

/******************************************************************
*  Name: corr2data                                                *
*  Function: creating a data set with given correlation matrix    *
* %corr2data(mydata, corrmat=corr, n=200, full='f', corr='f');    *
*  corrmat: input matrix                                          *
*  n:       number of observations                                *
*  full:    specifying if the input matrix is a full matrix       *
*           'T' for full matrix                                   *
*           'F' for upper or lower triangular                     *
*  corr:    specifying if the input matrix is a correlation       *
*           matrix or a covariance matrix:                        *
*           'T' for correlation matrix and                        *
*           'F' for covariance matrix                             *
*******************************************************************/

We will create a new dataset called mycorr by passing the macro our correlation matrix corr and specifying that the new dataset should have 200 observations and that the input is a full matrix of correlations (as opposed to covariances).  The code to do this follows:

%corr2data(mycorr, corr, 200, FULL='T', corr='T');

After running the macro, we can look at the correlations in our new dataset mycorr.

proc corr data = mycorr;
run;

The CORR Procedure

   3  Variables:    COL1     COL2     COL3


                              Simple Statistics

Variable         N        Mean     Std Dev         Sum     Minimum     Maximum

COL1           200           0     1.00000           0    -2.45292     2.59209
COL2           200           0     1.00000           0    -3.12547     2.69587
COL3           200           0     1.00000           0    -3.13046     2.70254


  Pearson Correlation Coefficients, N = 200
          Prob > |r| under H0: Rho=0

              COL1          COL2          COL3

COL1       1.00000      -0.46860       0.53861
                          <.0001        <.0001

COL2      -0.46860       1.00000      -0.80717
            <.0001                      <.0001

COL3       0.53861      -0.80717       1.00000
            <.0001        <.0001

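We can see that the correlations here exactly match those from the auto dataset we started with.  Note that the generated variables are named COL1 through COL3 and are standardized (mean 0, standard deviation 1), so the new dataset reproduces the correlation structure but not the original variable names or scales.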
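An exact match like this does not happen with plain random sampling, where the sample correlations would only be close to the target. One way to get an exact match is to center random normal draws, rescale them so that their sample covariance is exactly the identity matrix, and then multiply by the Cholesky root of the target matrix. The PROC IML sketch below illustrates that idea. It is an assumption about how such a result can be produced, not a copy of the corr2data source, and it requires SAS/IML. The seed and the output dataset name sketch are arbitrary choices.

```sas
proc iml;
  call randseed(12345);                 /* arbitrary seed, for reproducibility */

  /* Target correlation matrix, copied from the proc print output above */
  R = {1.00000 -0.46860  0.53861,
      -0.46860  1.00000 -0.80717,
       0.53861 -0.80717  1.00000};
  n = 200;
  p = ncol(R);

  Z = j(n, p, 0);
  call randgen(Z, "Normal");            /* n x p independent standard normal draws */

  Z = Z - repeat(Z[:,], n, 1);          /* center each column at its sample mean */
  S = (t(Z) * Z) / (n - 1);             /* sample covariance of the centered draws */
  W = Z * inv(root(S));                 /* sample covariance of W is the identity */
  X = W * root(R);                      /* sample covariance of X is R */

  create sketch from X;                 /* variables are named COL1, COL2, COL3 */
  append from X;
  close sketch;
quit;

proc corr data = sketch;
run;
```

Because root(R) returns an upper triangular matrix U with U`U = R, the sample covariance of X equals R up to floating-point rounding, and since R has a unit diagonal the generated columns have mean 0 and standard deviation 1.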

Example 2: Writing a correlation matrix to create a dataset.

You do not necessarily need to start with an existing dataset to generate a dataset with a certain correlation structure.  Instead, you can write a correlation matrix in SAS and provide that matrix to the corr2data macro.  See the example below.

data corr;
  input x1 x2;
datalines;
1 .24
.24 1
;
run;

proc print data = corr; run;

Obs     x1      x2
 1     1.00    0.24
 2     0.24    1.00
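A matrix typed in by hand is easy to get wrong, so it can be worth confirming that it is a valid correlation matrix (symmetric, ones on the diagonal, positive definite) before using it. The PROC IML sketch below is one way to do that. It is an optional check, not part of corr2data, it requires SAS/IML, and the tolerance 1e-12 and the flag names are arbitrary choices.

```sas
proc iml;
  use corr;
  read all var _num_ into R;            /* read all numeric variables into a matrix */
  close corr;

  isSym      = all(abs(R - t(R)) < 1e-12);         /* symmetric? */
  isUnitDiag = all(abs(vecdiag(R) - 1) < 1e-12);   /* ones on the diagonal? */
  /* symmetrize first so that eigval returns real eigenvalues */
  isPosDef   = (min(eigval((R + t(R)) / 2)) > 0);  /* all eigenvalues positive? */

  print isSym isUnitDiag isPosDef;
quit;
```

For the 2 x 2 matrix above, all three flags should be 1.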

Now, this correlation matrix can be our corrmat argument.

%corr2data(mycorr, corr, 200, FULL='T', corr='T');

We can now look at the correlation matrix of our new dataset to see that it matches the correlation matrix we provided.

proc corr data = mycorr; 
run;

The CORR Procedure
   2  Variables:    COL1     COL2

                                    Simple Statistics
Variable           N          Mean       Std Dev           Sum       Minimum       Maximum
COL1             200             0       1.00000             0      -3.15650       2.84648
COL2             200             0       1.00000             0      -2.55106       3.06611

Pearson Correlation Coefficients, N = 200
        Prob > |r| under H0: Rho=0

              COL1          COL2
COL1       1.00000       0.24000
                          0.0006

COL2       0.24000       1.00000
            0.0006
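The macro's header comment also documents corr='F' for covariance input. As a hedged sketch, returning to the auto data from Example 1 and assuming the macro handles a full covariance matrix as that comment describes, the COV option of proc corr adds _TYPE_="COV" rows to the outp= dataset, and these can be kept and passed to the macro in the same way. The dataset names pcov, cov, and mycov are placeholders.

```sas
/* COV adds covariance rows (_TYPE_="COV") to the outp= dataset */
proc corr data = auto cov outp=pcov nosimple noprob;
  var price mpg weight;
run;

/* Keep only the rows that hold the covariance matrix */
data cov;
  set pcov;
  if _type_="COV";
run;

/* corr='F' marks the input as a covariance matrix, per the macro's header comment */
%corr2data(mycov, cov, 200, FULL='T', corr='F');

/* Inspect the covariances of the generated data */
proc corr data = mycov cov nosimple noprob;
run;
```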

These pages are Copyrighted (c) by UCLA Academic Technology Services

