Venn diagrams are a commonly used graphing technique for
illustrating the overlap between groups in data. They can be
created in R using code written as part of the Bioconductor
Project. We follow the Bioconductor installation directions for limma, a package
for linear models for microarray data. To obtain this
code, we first use the source command in R to load the Bioconductor
installation script.
source("http://www.bioconductor.org/biocLite.R")
After running this, we can examine what has been added to our workspace and investigate the nature of these objects.
ls()
[1] "biocinstall"             "biocinstallPkgGroups"
[3] "biocinstallRepos"        "biocLite"
[5] "getBioC"                 "sourceBiocinstallScript"
class(biocLite)
[1] "function"
The next step in the installation is a call to
the biocLite function:
biocLite("limma")
The output from these calls indicates the installation of
the limma package. Finally, we need to load this
package.
library(limma)
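On recent versions of R and Bioconductor, the biocLite.R script has been replaced by the BiocManager package, so the source call above may fail there. A minimal sketch of the equivalent installation, assuming an internet connection and a configured CRAN mirror; the helper name install_limma is made up for illustration and is not part of limma or Bioconductor:

```r
# Hypothetical helper (not part of limma or Bioconductor) wrapping the
# BiocManager-based installation that superseded biocLite().
install_limma <- function() {
  # Install BiocManager from CRAN only if it is not already available.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Install limma from Bioconductor only if it is missing.
  if (!requireNamespace("limma", quietly = TRUE)) {
    BiocManager::install("limma")
  }
  # Report whether limma can now be loaded.
  requireNamespace("limma", quietly = TRUE)
}

# install_limma()   # uncomment to run; needs network access
# library(limma)
```

Wrapping the steps in a function keeps the sketch from downloading anything until it is called explicitly.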
We can now use the commands in this package for generating Venn diagrams. The
data needed for a Venn diagram consists of a set of binary variables indicating
membership. We will be using the hsb2 dataset consisting of data
from 200 students including scores from writing, reading, and math tests.
We will create indicators for "high" values in each of these variables and
generate Venn diagrams that tell us about the degree of overlap in high math,
writing, and reading scores.
hsb2 <- read.csv("http://www.ats.ucla.edu/stat/data/hsb2.csv")
attach(hsb2)
hw <- (write >= 60)
hm <- (math >= 60)
hr <- (read >= 60)
c3 <- cbind(hw, hm, hr)
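The same indicators can be built without attach(), which avoids adding the data frame's columns to the search path. The sketch below uses a small made-up data frame (toy, with invented scores) in place of hsb2 so that it runs without downloading anything; the names toy and toy_c3 are introduced here for illustration:

```r
# Toy stand-in for hsb2; these scores are invented for illustration.
toy <- data.frame(
  write = c(65, 52, 60, 41, 59),
  math  = c(70, 61, 45, 39, 60),
  read  = c(63, 47, 68, 50, 57)
)

# with() evaluates the expressions inside the data frame,
# so no attach() is needed.
toy_c3 <- with(toy, cbind(hw = write >= 60,
                          hm = math  >= 60,
                          hr = read  >= 60))

# A logical matrix with one row per student and one column per indicator.
toy_c3
```

A logical matrix like this has the same one-column-per-group layout as the c3 matrix built above.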
Next, we can use the vennCounts command to impose the
structure needed to generate the Venn diagram.
a <- vennCounts(c3)
a
     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22
attr(,"class")
[1] "VennCounts"
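Each row of this table is one combination of memberships, and Counts is the number of students with exactly that combination. As a quick check, the sketch below re-enters the printed counts by hand and recovers the sample size and the size of each group; the object names counts, total, and group_totals are introduced here for illustration:

```r
# Counts copied from the vennCounts output above, in the same row order.
counts <- cbind(
  hw = c(0, 0, 0, 0, 1, 1, 1, 1),
  hm = c(0, 0, 1, 1, 0, 0, 1, 1),
  hr = c(0, 1, 0, 1, 0, 1, 0, 1),
  Counts = c(113, 18, 8, 8, 12, 8, 11, 22)
)

# Every student falls in exactly one row, so the counts sum to the sample size.
total <- sum(counts[, "Counts"])

# Students in each group: add the counts of every row where that indicator is 1.
group_totals <- colSums(counts[, c("hw", "hm", "hr")] * counts[, "Counts"])

total         # 200
group_totals  # hw = 53, hm = 49, hr = 56
```

The total matches the 200 students in hsb2, and the group totals give the number of students with high writing, math, and reading scores respectively.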
We can now generate our Venn diagram with
the vennDiagram command:
vennDiagram(a)
While some of the options for the vennDiagram command
are specific to tests run on microarray data, we can change some of
the formatting. Below, we add names to the groups, we change the
relative size of the labels and counts, and we opt for the counts to
appear in red.
vennDiagram(a, include = "both",
names = c("High Writing", "High Math", "High Reading"),
cex = 1, counts.col = "red")
We could present just two groups in this way, although the version of limma used here does not support adding a fourth. Note that the sizes of the areas of overlap do not reflect the relative counts. It is also worth noting that the areas in these Venn diagrams may suggest overlap where, in fact, there is none. The example below illustrates this.
g <- cbind(
g1 = c(rep(0, 6), rep(1, 3)),
g2 = c(rep(1, 6), rep(0, 3)))
d <- vennCounts(g)
vennDiagram(d)
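In this example no observation belongs to both groups, yet the two circles are still drawn overlapping. A minimal base R check of the underlying counts, which does not require limma; the names tab and n_both are introduced here for illustration:

```r
# Same membership matrix as above: six observations in g2 only, three in g1 only.
g <- cbind(
  g1 = c(rep(0, 6), rep(1, 3)),
  g2 = c(rep(1, 6), rep(0, 3)))

# Cross-tabulate membership: rows are g1 (0/1), columns are g2 (0/1).
tab <- table(g1 = g[, "g1"], g2 = g[, "g2"])
tab

# Number of observations belonging to both groups.
n_both <- sum(g[, "g1"] == 1 & g[, "g2"] == 1)
n_both  # 0
```

The cross-tabulation makes the empty intersection explicit, which the geometry of the diagram alone does not.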