R FAQ: How can I generate a Venn diagram in R?

Venn diagrams are a commonly used graphing technique for illustrating the overlap between groups in data. They can be created in R using code written as part of the Bioconductor Project. We follow the Bioconductor directions for installing limma, a package for linear models for microarray data. To obtain the installer, we use the source command in R to run the Bioconductor installation script.


source("http://www.bioconductor.org/biocLite.R")

After running this, we can examine what has been added to our workspace and investigate the nature of these objects.

ls()

[1] "biocinstall"             "biocinstallPkgGroups"   
[3] "biocinstallRepos"        "biocLite"               
[5] "getBioC"                 "sourceBiocinstallScript"

class(biocLite)

[1] "function"

The next step in the installation is a call to the biocLite function:


biocLite("limma")

The output from these calls indicates the installation of the limma package. Finally, we need to load this package.


library(limma)
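
The biocLite.R script used above has since been retired by the Bioconductor project; current releases install packages through the BiocManager package from CRAN. The following is a minimal sketch of the equivalent steps under that assumption. The helper name ensure_limma is hypothetical and is not part of limma or Bioconductor.

```r
# Hypothetical helper (not part of limma or Bioconductor): installs limma
# through BiocManager only when it is missing, then loads it.
ensure_limma <- function() {
  if (!requireNamespace("limma", quietly = TRUE)) {
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")  # BiocManager itself is on CRAN
    }
    BiocManager::install("limma")
  }
  library(limma)
}
```

Calling ensure_limma() once at the start of a session should leave vennCounts and vennDiagram available.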

We can now use the commands in this package for generating Venn diagrams. The data needed for a Venn diagram consists of a set of binary variables indicating membership. We will be using the hsb2 dataset consisting of data from 200 students including scores from writing, reading, and math tests. We will create indicators for "high" values in each of these variables and generate Venn diagrams that tell us about the degree of overlap in high math, writing, and reading scores.


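# Read the hsb2 data (200 students) and make its columns available by name.
hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
attach(hsb2)

# Logical indicators for "high" scores (60 or above) on each test.
hw <- (write >= 60)
hm <- (math >= 60)
hr <- (read >= 60)
# Combine the indicators into one matrix with a column per group.
c3 <- cbind(hw, hm, hr)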
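
The same indicator matrix can be built without attach(), which some users prefer because the column names are then looked up in the data frame explicitly. The sketch below uses a small made-up data frame (six hypothetical students, not the hsb2 data) so that it runs without a download; the 60-point cutoff matches the one used above.

```r
# Toy scores for six hypothetical students (not the hsb2 data).
scores <- data.frame(
  write = c(65, 52, 61, 40, 70, 59),
  math  = c(62, 45, 58, 66, 71, 60),
  read  = c(55, 63, 60, 47, 68, 44)
)

# with() evaluates the expression inside the data frame, so `write` and
# `read` refer to its columns rather than to the base R functions.
high <- with(scores, cbind(hw = write >= 60, hm = math >= 60, hr = read >= 60))

# Number of students flagged as "high" in each group.
colSums(high)
```

Checking the column sums before plotting is a quick way to confirm that the cutoff produced sensible group sizes.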

Next, we can use the vennCounts command to impose the structure needed to generate the Venn diagram.


a <- vennCounts(c3)
a


     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22
attr(,"class")
[1] "VennCounts"
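
Each row of this table is one membership pattern, and Counts is the number of observations with that pattern; the counts above sum to 200, the number of students. To make the tabulation concrete, here is a rough base R sketch that produces the same kind of table for a small made-up membership matrix. It is an illustration of the idea only and is not how limma implements vennCounts; the names membership, patterns and key are hypothetical.

```r
# Made-up 0/1 membership matrix for six hypothetical observations.
membership <- cbind(
  hw = c(1, 0, 1, 0, 1, 0),
  hm = c(1, 0, 0, 1, 1, 1),
  hr = c(0, 1, 1, 0, 1, 0)
)

# All 2^3 patterns, ordered 000, 001, 010, ..., 111 as in the table above
# (expand.grid varies its first argument fastest, so hr is listed first).
patterns <- expand.grid(hr = 0:1, hm = 0:1, hw = 0:1)[, c("hw", "hm", "hr")]

# Collapse each row to a string such as "101" so rows can be tabulated.
key <- function(m) apply(as.matrix(m), 1, paste, collapse = "")

# Count observations per pattern, keeping patterns that never occur.
counts <- as.vector(table(factor(key(membership), levels = key(patterns))))

cbind(patterns, Counts = counts)
```

Because every observation falls into exactly one pattern, the Counts column always sums to the number of rows in the membership matrix.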

We can now generate our Venn diagram with the vennDiagram command:


vennDiagram(a)
[Figure: Venn diagram showing overlapping and unique areas]

While some of the options for the vennDiagram command are specific to tests run on microarray data, we can change some of the formatting. Below, we add names to the groups, we change the relative size of the labels and counts, and we opt for the counts to appear in red.


vennDiagram(a, include = "both", 
  names = c("High Writing", "High Math", "High Reading"), 
  cex = 1, counts.col = "red")
[Figure: Venn diagram showing overlapping and unique areas with different labels and colours]

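We could opt to present just two groups in this way. With the version of limma used here it is not possible to add a fourth, although more recent releases of vennDiagram can draw up to five sets. Note that the sizes of the areas of overlap are not proportional to the counts. It is also worth noting that the areas in these Venn diagrams may suggest overlap where, in fact, there is none. The example below illustrates this: the two groups share no members, yet the circles are still drawn overlapping.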


g <- cbind(
  g1 = c(rep(0, 6), rep(1, 3)), 
  g2 = c(rep(1, 6), rep(0, 3)))
d <- vennCounts(g)
vennDiagram(d)
[Figure: another Venn diagram, showing just two groups]
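
Since the picture alone cannot show that the two groups are disjoint, it can help to check the intersection directly. The following base R sketch rebuilds the same matrix g and counts the rows that belong to both groups; the variable name both is introduced here for illustration.

```r
# Same two-group membership matrix as in the example above.
g <- cbind(
  g1 = c(rep(0, 6), rep(1, 3)),
  g2 = c(rep(1, 6), rep(0, 3)))

# Rows that are members of both groups at once.
both <- sum(g[, "g1"] == 1 & g[, "g2"] == 1)
both

# Cross-tabulation of the two indicators.
table(g1 = g[, "g1"], g2 = g[, "g2"])
```

A value of zero for both confirms that the overlapping region in the diagram is empty, which is also what the count printed inside that region reports.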
