SAS FAQ
How do I make a histogram with percentage on top of each bar?

Consider this simple data set with a single variable called mynum. We are going to create a histogram of mynum with the percentage displayed on top of each bar.

data test;
  input mynum @@;
  cards;
  2.3 -.6
  3.3 3.5
  2.4 5.6
  7.8 2.4
  2.8 4.5
  6.3 1.2
  0.5  .8
  .9  1.2
  1.4 1.5
  2.3 2.5
  2.7 3.5
  3.1 4.6
  5.5 5.8
  5.3 7.6
  7.3 7.8
;
run;
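
Before choosing bins, it can help to confirm the range of the variable. The following is a minimal sketch using proc means on the test data set created above; for these 30 observations the minimum is -0.6 and the maximum is 7.8.

```sas
/* Sketch: check the number of observations and the range of mynum */
proc means data=test n min max;
  var mynum;
run;
```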

Let's say the midpoints that we are going to use for our histogram are -.5, .5, 1.5, ..., 7.5, since the variable ranges between -1 and 8. The width of each bin is therefore 1. In the data step below, a variable called ind is created as a grouping variable that holds the midpoint of the bin (bar) each observation falls into. Then we use proc freq with an ods output statement to save the frequencies and percentages to a data set called temp3.

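data temp2;
  set test;
  /* loop over the bin midpoints; each bin is [i-.5, i+.5) */
  do i = -.5 to 7.5 by 1;
    if i-.5 <= mynum < i+.5 then ind = i;
  end;
  drop i; /* the loop counter is not needed afterwards */
run;
proc freq data=temp2;
  tables ind;
  ods output OneWayFreqs=temp3; /* save the frequency table as a data set */
run;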
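
Because the bins here have width 1 and start on integers, the midpoint can also be computed directly, without a loop. The following is a minimal sketch under that assumption only; the data set name temp2b and the variables ind2 and same are hypothetical names introduced for the comparison.

```sas
/* Sketch: assumes unit-width bins whose edges fall on integers */
data temp2b;
  set temp2;
  ind2 = floor(mynum) + .5;  /* midpoint of the bin [k, k+1) that contains mynum */
  same = (ind2 = ind);       /* 1 when both methods agree, 0 otherwise */
run;
proc freq data=temp2b;
  tables same;
run;
```

If the two methods agree for every observation, the frequency table for same shows only the value 1. For bins of a different width or offset, the expression would need to be adjusted.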

Let's print out the ods output data set temp3 from proc freq.

proc print data=temp3;
run;
                                               
                                                                     Cum         Cum
         Obs    Table    F_ind     ind    Frequency    Percent    Frequency    Percent

          1      ind     -0.5     -0.5           1       3.33            1       3.33
          2      ind      0.5      0.5           3      10.00            4      13.33
          3      ind      1.5      1.5           4      13.33            8      26.67
          4      ind      2.5      2.5           7      23.33           15      50.00
          5      ind      3.5      3.5           4      13.33           19      63.33
          6      ind      4.5      4.5           2       6.67           21      70.00
          7      ind      5.5      5.5           4      13.33           25      83.33
          8      ind      6.5      6.5           1       3.33           26      86.67
          9      ind      7.5      7.5           4      13.33           30     100.00

We can now use this data set to create an annotate data set and use it with proc univariate to create the histogram with percentage on top of each bar. 

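data anno;
   set temp3;
   length function color text $8;

   function = 'label';   /* draw a text label */
   color    = 'blue';
   size     =  1;
   xsys     = '2';       /* x is given in data values */
   ysys     = '2';       /* y is given in data values */
   when     = 'a';       /* draw the label after the histogram */
   x = ind;              /* the x-coordinate for the text: the bin midpoint */
   y = percent + .5;     /* the y-coordinate: just above the top of the bar */
   text = left(put(percent, 6.2)); /* 6.2 is wide enough for values such as 13.33 */
run;
proc univariate data=temp2 noprint;
   histogram mynum / anno=anno cfill=pink midpoints=-.5 to 7.5 by 1;
run;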
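
As an alternative to the annotate approach, the binned percentages in temp3 can be plotted as a labeled bar chart. The following is a minimal sketch, assuming a SAS release that includes proc sgplot (9.2 or later); it draws a bar chart of the pre-binned data rather than a true histogram, so by default the bars are separated by gaps.

```sas
/* Sketch: bar chart of the binned percentages with a label on each bar */
proc sgplot data=temp3;
  vbar ind / response=percent datalabel; /* one row per bin, so each bar height is that bin's percent */
  xaxis label='mynum (bin midpoint)';
  yaxis label='Percent';
run;
```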

Using the same idea, we can also label each bar of the histogram with its cumulative percentage. This time we lower the y-coordinate so that the label sits just inside the top of the bar.

data anno;
   set temp3;
   length function color text $8;

   function = 'label';
   color    = 'blue';
   size     =  1;
   xsys     = '2';
   ysys     = '2';
   when     = 'a';
   x = ind;
   y = percent - .5;     /* just below the top of the bar */
   text = left(put(cumpercent, 6.2)); /* 6.2 is wide enough for 100.00 */
run;
proc univariate data=temp2 noprint;
   histogram mynum / anno=anno cfill=pink midpoints=-.5 to 7.5 by 1;
run;
