Stata FAQ
How can I create all pairs within groups?

Consider the following dataset.
clear
input year str3 b bx
2001 abc  32
2001 nbc  14
2001 cbs  23
2001 nbc  14
2002 abc  11
2002 nbc  21
2002 wb   19
2002 pbs  30
2003 abc  25
2003 nbc  16
2003 nbc  16
2003 cbs  19
2003 pbs  28
2003 abc  25
end
In this example year is the group variable. What we want to do is to create a dataset that contains all pairs of the variable b within each year. For example, for the year 2001, we want observations that pair nbc with abc and cbs. However, we do not want nbc to be paired with itself. Further, the pair nbc with cbs is considered to be a duplicate permutation of cbs with nbc. We want to keep only one of each duplicate permutation. Hopefully, the data for 2001 will look something like this,
2001 abc cbs 32 23
2001 abc nbc 32 14
2001 cbs nbc 23 14
There is a Stata FAQ page written by Nick Cox (http://www.stata.com/support/faqs/data/pairs.html) that takes a different approach to solving this problem.

Our approach will be to make use of the joinby command. To accomplish this we will have to save the data into a temporary file and change the variable names before using the joinby command.

Here is the code fragment that will create all the possible pairs without duplicate permutations.

/* find and remove duplicates within year */
by year b, sort: gen i=_n
keep if i==1
drop i
tempfile temp1
save `temp1', replace

/* create all possible pairs using joinby */
rename b c                  /* rename prior to joinby          */
rename bx cx                /* rename prior to joinby          */
joinby year using `temp1'
drop if b==c                /* drop if b and c are the same    */
egen d1=concat(b c)         /* create joint pairs in order 1   */
egen d2=concat(c b)         /* create joint pairs in order 2   */
replace d1=d2 if b>c        /* d1 has pairs in ascending order */

/* find and remove duplicate permutations within year */
by year d1, sort: gen i=_n  
keep if i==1

/* clean up variables an display */
drop d2
drop i
order year b c bx cx
clist
Running the above code yields the following output.
          year          b          c         bx         cx         d1
  1.      2001        cbs        abc         23         32     abccbs
  2.      2001        nbc        abc         14         32     abcnbc
  3.      2001        cbs        nbc         23         14     cbsnbc
  4.      2002        abc        nbc         11         21     abcnbc
  5.      2002        pbs        abc         30         11     abcpbs
  6.      2002        abc         wb         11         19      abcwb
  7.      2002        nbc        pbs         21         30     nbcpbs
  8.      2002         wb        nbc         19         21      nbcwb
  9.      2002        pbs         wb         30         19      pbswb
 10.      2003        abc        cbs         25         19     abccbs
 11.      2003        abc        nbc         25         16     abcnbc
 12.      2003        pbs        abc         28         25     abcpbs
 13.      2003        nbc        cbs         16         19     cbsnbc
 14.      2003        cbs        pbs         19         28     cbspbs
 15.      2003        nbc        pbs         16         28     nbcpbs
In the above output, bx is the x value associated with b and cx is the x value associated with c. Variable d1 contains the concatenated pair in asscending order and was used to find duplicate permutations. This approach can be extended to any number of variables associated with each of the paired values.

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.