
Stata Code Fragment: 
Creating and extending boxplots using twoway graphs

Standard boxplots, as well as a variety of "boxplot-like" graphs, can be created using combinations of Stata's twoway graph commands. If you are trying to create a relatively standard boxplot, you probably want to use Stata's graph box command. However, if you wish to create a boxplot with a non-standard attribute (e.g., a boxplot that includes a marker at the mean), you can do this using Stata's graph twoway commands. The examples below are based on those shown in the Stata Journal article:

Cox, N.J. (2009). Speaking Stata: Creating and varying box plots. The Stata Journal, 9(3), 478-496.

The first example shows how to re-create a boxplot using a twoway graph, as well as how to add a marker at the mean of the distribution. The second example shows how to create a boxplot that displays the individual data points down the center of the box (instead of whiskers). The examples show the steps of building up the graph to make it easier to understand what is going on.
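
For comparison, the relatively standard boxplot mentioned above can be drawn directly with graph box. The following is a minimal sketch, assuming the same hsb2 data file and the read and prog variables used in the examples; it is offered only as a baseline against which to compare the twoway versions.

```stata
* Baseline for comparison: Stata's built-in boxplot of read, one box per prog
use http://www.ats.ucla.edu/stat/data/hsb2, clear
graph box read, over(prog)
```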

Example 1: Boxplot by group with a marker for the mean.

Step 1: Data management. Compute all the relevant values (medians, quartiles, etc.) of read by prog.

use http://www.ats.ucla.edu/stat/data/hsb2, clear
sort prog

* Use egen to generate the median, quartiles, interquartile range (IQR), and mean. 
by prog: egen med = median(read)
by prog: egen lqt = pctile(read), p(25)
by prog: egen uqt = pctile(read), p(75)
by prog: egen iqr = iqr(read)
by prog: egen mean = mean(read)

* Find the lowest value of read (egen min) that is no lower than lqt - 1.5*iqr;
* cond() sets values below that limit to missing, and egen's min() ignores missing values.
* This is used to form the lower "whisker" of the boxplot.
by prog: egen ls = min(cond(read >= lqt-1.5*iqr, read, .))

* Find the highest value of read (egen max) that is no higher than uqt + 1.5*iqr.
* This is used to form the upper "whisker" of the boxplot.
by prog: egen us = max(cond(read <= uqt+1.5*iqr, read, .))

* Find any outliers (i.e. values more than 1.5 IQRs beyond the lower or upper quartile).
gen outliers = read if (read < lqt-1.5*iqr | read > uqt+1.5*iqr)
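
Before graphing, it can help to confirm that the computed values look sensible. The following is a rough sketch, assuming the Step 1 variables are already in memory; tagprog is a hypothetical helper variable introduced here only for illustration. The quartiles, IQR, and mean reported by tabstat should agree with the egen results, and each whisker end should lie between the group minimum and maximum.

```stata
* Assumes med, lqt, uqt, iqr, mean, ls, and us were created as in Step 1.
* tagprog (hypothetical name) marks exactly one observation per prog group.
egen tagprog = tag(prog)
list prog lqt med uqt iqr mean ls us if tagprog, noobs

* Independent cross-check of the summary statistics by group.
tabstat read, by(prog) statistics(p25 median p75 iqr mean min max)
```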

Step 2: Use rbar to graph the portion of the box below the median.

twoway rbar lqt med prog

Step 3: Add a second rbar command for the portion of the box above the median.

twoway rbar lqt med prog || ///
       rbar med uqt prog

Step 4: Use rspike to add the upper and lower "whiskers" and suppress the display of the legend.

twoway rbar lqt med prog || ///
       rbar med uqt prog || ///
       rspike lqt ls prog || ///
       rspike uqt us prog, legend(off)

Step 5: Use rcap to add the upper and lower ends to the whiskers.

twoway rbar lqt med prog || ///
       rbar med uqt prog || ///
       rspike lqt ls prog || ///
       rspike uqt us prog || ///
       rcap ls ls prog, msize(*6) || ///
       rcap us us prog, msize(*6) legend(off)

Step 6: Use scatter to show any outliers.

twoway rbar lqt med prog || ///
       rbar med uqt prog || ///
       rspike lqt ls prog || ///
       rspike uqt us prog || ///
       rcap ls ls prog, msize(*6) || ///
       rcap us us prog, msize(*6) || ///
       scatter outliers prog, legend(off)

Step 7: Use scatter to place the marker for the mean.

twoway rbar lqt med prog || ///
       rbar med uqt prog || ///
       rspike lqt ls prog || ///
       rspike uqt us prog || ///
       rcap ls ls prog, msize(*6) || ///
       rcap us us prog, msize(*6) || ///
       scatter outliers prog || ///
       scatter mean prog, msymbol(Oh) msize(*2) legend(off)

Step 8: Force all of the boxes, lines, etc. to be the same color (pstyle(...) option), make boxes narrower (barw(#)), and add axis labels.

twoway rbar lqt med prog, pstyle(p1) barw(.5) || ///
       rbar med uqt prog, pstyle(p1) barw(.5) || ///
       rspike lqt ls prog, pstyle(p1) || ///
       rspike uqt us prog, pstyle(p1) || ///
       rcap ls ls prog, msize(*6) pstyle(p1) || ///
       rcap us us prog, msize(*6) pstyle(p1) || ///
       scatter outliers prog, pstyle(p1) || ///
       scatter mean prog, pstyle(p1) msymbol(Oh) msize(*2) legend(off) ///
       xlabel( 1 "general" 2 "academic" 3 "vocational") ytitle(reading score)

Step 9: Convert to grayscale using the fcolor(...), mcolor(...), and lcolor(...) options.

twoway rbar lqt med prog, fcolor(gs12) lcolor(black) barw(.5) || ///
       rbar med uqt prog, fcolor(gs12) lcolor(black) barw(.5) || ///
       rspike lqt ls prog, lcolor(black) || ///
       rspike uqt us prog, lcolor(black) || ///
       rcap ls ls prog, msize(*6) lcolor(black) || ///
       rcap us us prog, msize(*6) lcolor(black) || ///
       scatter outliers prog, mcolor(black) || ///
       scatter mean prog, msymbol(Oh) msize(*2) mcolor(black) ///
       legend(off)  xlabel( 1 "general" 2 "academic" 3 "vocational") ///
       ytitle(reading score) graphregion(fcolor(gs15))


Example 2: Boxplot by group with data points.

Step 1: Use egen to create variables containing the median, upper, and lower quartiles of read, by prog.

use http://www.ats.ucla.edu/stat/data/hsb2, clear

sort prog
by prog: egen med = median(read)
by prog: egen lqt = pctile(read), p(25)
by prog: egen uqt = pctile(read), p(75)

Step 2: Use rbar to graph the portion of the box below the median.

twoway rbar lqt med prog

Step 3: Add a second rbar command for the portion of the box above the median.

twoway rbar lqt med prog || ///
       rbar med uqt prog

Step 4: Add a scatterplot of read with prog.

twoway rbar lqt med prog || ///
	rbar med uqt prog || ///
	scatter read prog

Step 5: Force all portions of the graph to be the same color using pstyle(...).

twoway rbar lqt med prog, pstyle(p1) || ///
	rbar med uqt prog, pstyle(p1) || ///
	scatter read prog, pstyle(p1)

Step 6: Make the boxes narrower using the barw(#) option, suppress the legend, and add axis labels.

twoway rbar lqt med prog, pstyle(p1) barw(.5) || ///
	rbar med uqt prog, pstyle(p1) barw(.5) || ///
	scatter read prog, pstyle(p1) ///
	legend(off) xlabel( 1 "general" 2 "academic" 3 "vocational") ///
	ytitle(reading score)

Step 7: Convert to grayscale using fcolor(...), lcolor(...), and mcolor(...) if desired.

twoway rbar lqt med prog, barw(.5) fcolor(gs12) lcolor(black) || ///
	rbar med uqt prog, barw(.5) fcolor(gs12) lcolor(black) || ///
	scatter read prog, graphregion(fcolor(gs15)) mcolor(black) msymbol(Oh) ///
	legend(off) xlabel( 1 "general" 2 "academic" 3 "vocational") ///
	ytitle(reading score) 
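
Because many students share the same reading score, points with identical coordinates are drawn on top of one another in the graph above. One possible variation, which is not part of the original example, is to add scatter's jitter() option so that tied points become visible. The sketch below assumes the Step 1 variables are in memory; the values in jitter(3) and jitterseed(12345) are illustrative choices only. Note that jitter() adds random noise to the plotted positions in both directions, so the points no longer sit exactly at their data values.

```stata
* Sketch: the Step 7 graph with jittered points (illustrative values).
* jitterseed() fixes the random noise so that the graph is reproducible.
twoway rbar lqt med prog, barw(.5) fcolor(gs12) lcolor(black) || ///
       rbar med uqt prog, barw(.5) fcolor(gs12) lcolor(black) || ///
       scatter read prog, mcolor(black) msymbol(Oh) ///
       jitter(3) jitterseed(12345) ///
       legend(off) xlabel(1 "general" 2 "academic" 3 "vocational") ///
       ytitle(reading score) graphregion(fcolor(gs15))
```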
 

These pages are Copyrighted (c) by UCLA Academic Technology Services

