|
|
|
||||
|
Help the Stat Consulting Group by
giving a gift
| |||||
|
Loading
|
|||||
Proc tabulate is predominately used to make nice looking tables of summary statistics. This procedure is often used to create tables to be used in publications because it allows for a great deal of manipulation and control over almost every aspect of the table. It is a very versatile procedure that allows you to make simple two by two summary tables or customized multipage reports and everything in between. A few features include customizing the number, type and position of summary statistics. You can also customize the appearance of tables by specifying text position, size, font and color.
This page will show you how to use proc tabulate to make a wide variety of tables. We will use the hsb2 dataset which contains demographic and academic data about 200 high school students.
Lets say that we need to present a table of mean writing scores by socioeconomic status and gender. Below we use proc means to create such a table.
proc means data=hsb2 mean; class female ses; var write; run;N female ses Obs Mean --------------------------------------------------- 0 1 15 46.6000000 2 47 49.5531915 3 29 52.8620690 1 1 32 52.5000000 2 48 54.2500000 3 29 58.9655172 ---------------------------------------------------
The table above displays the information that we need and we could even make it easier to read by using value labels. However, the only control over the layout of the table is the ordering of the classification variables. For example we could change the position of female and ses in the output by switching their order in the class statement. On the other had we could use proc tabulate which will give us the same information with the added benefit of being able to arrange the table in the most appropriate way. Now lets try using proc tabulate to create a table with the same information as the one above. The following table is a basic example that specifies one row variable (female), one column variable (ses), one analysis variable (write) and one summary statistic (mean). Note that if we don't specify the mean as the summary statistic our table will contain the sum because it is the default summary statistic.
proc tabulate data =hsb2; class ses female; var write; table female*mean, write*ses; run;------------------------------------------------------------------------ | | write | | |--------------------------------------| | | ses | | |--------------------------------------| | | 1 | 2 | 3 | |-------------------------------+------------+------------+------------| |female | | | | | |---------------+---------------| | | | |0 |Mean | 46.60| 49.55| 52.86| |---------------+---------------+------------+------------+------------| |1 |Mean | 52.50| 54.25| 58.97| ------------------------------------------------------------------------
The table above presents the information in a layout that is easier to read than the proc means output. Now we have a table with a convenient, easy to read layout that displays the mean. Lets say that we would prefer to move gender and the mean to the columns and move only socioeconomic status to the rows. As you can see this is a much different layout than the table above with all the same information.
proc tabulate data =hsb2; class ses female; var write; table ses*mean, write*female; run;----------------------------------------------------------- | | Mean | | |-------------------------| | | write | | |-------------------------| | | female | | |-------------------------| | | 0 | 1 | |-------------------------------+------------+------------| |ses | | | |-------------------------------| | | |1 | 46.60| 52.50| |-------------------------------+------------+------------| |2 | 49.55| 54.25| |-------------------------------+------------+------------| |3 | 52.86| 58.97| -----------------------------------------------------------
From the two examples above, we see how the table statement works. It has four basic parts:
Very important programming note: The ordering of the variables and statistics in the table statement can drastically change the layout and appearance of your table. In the table statement the comma separates the rows from the columns, anything before the comma will be part of the rows and anything after the comma will be part of the columns.
Now that we have seen the basics lets use a few options to refine the presentation of the table. We will start by making the headings more compact with easier to read titles. The example below shows how to change heading labels to make the table more reader friendly. Instead of letting SAS choose the variable name as the default label we can add more informative headings by setting each variable equal to a new title. We have changed ses to 'Socioeconomic Status' and write to 'Mean writing score'. Also notice that we can delete the heading 'Mean' by using a blank label.
proc tabulate data = hsb2; class ses female; var write; table ses='Socioeconomic Status'*mean, write='Mean writing score'*female; run;----------------------------------------------------------- | | Mean writing score | | |-------------------------| | | female | | |-------------------------| | | 0 | 1 | |-------------------------------+------------+------------| |Socioeconomic Status | | | |-------------------------------| | | |1 | 46.60| 52.50| |-------------------------------+------------+------------| |2 | 49.55| 54.25| |-------------------------------+------------+------------| |3 | 52.86| 58.97| -----------------------------------------------------------
The table above looks good but could still use some improvement. It is difficult to understand what we are looking at because we don't know what ses = 1 means. It would be much more convenient to use formats to change the numeric values to value labels.
proc format; value fm 1='Female' 0='Male' ; value ses 1='Low' 2='Middle' 3='High'; run; proc tabulate data=hsb2; class ses female; var write; table ses='Socioeconomic Status', mean=' '*write='Mean writing score'*female=''; format female fm. ses ses.; run;----------------------------------------------------------- | | Mean writing score | | |-------------------------| | | Gender | | |-------------------------| | | Male | Female | |-------------------------------+------------+------------| |Socioeconomic Status | | | |-------------------------------| | | |Low | 46.60| 52.50| |-------------------------------+------------+------------| |Middle | 49.55| 54.25| |-------------------------------+------------+------------| |High | 52.86| 58.97| -----------------------------------------------------------
The table above looks great and is very easy to read. We can easily see that the average reading score for males from a low socioeconomic status is 46.60. Additionally, we can make the same table in a more compact fashion by removing all of the headings and adding one table heading in the upper left hand corner. Notice that this example also uses the rtspace= option. This option allows you to specify the number of spaces for row titles to ensure that the whole title is printed in the table.
proc tabulate data = hsb2; class ses female; var write; table ses=''*mean, write=''*female='' / box=[label="Mean of writing Score by ses and female" style=[font_style=italic]] rtspace=42; format female fm. ses ses.; run;-------------------------------------------------------------------- |Mean of writing Score by ses and female | Male | Female | |----------------------------------------+------------+------------| |Low | 46.60| 52.50| |----------------------------------------+------------+------------| |Middle | 49.55| 54.25| |----------------------------------------+------------+------------| |High | 52.86| 58.97| --------------------------------------------------------------------
Now that we have seen the basics of formatting lets try making tables that display more information. Lets say that we want to present a table with both the mean math and the mean writing scores. The following is one of many ways to present this information. We decided on this particular layout because it is most important to compare males and females for each socioeconomic groups by subject.
proc tabulate data = hsb2; class ses female; var write math; table ses*mean*(write math), female; format female fm. ses ses.; run;----------------------------------------------------------- | | female | | |-------------------------| | | Male | Female | |-------------------------------+------------+------------| |ses | | | | | |---------+---------+-----------| | | |Low |Mean |write | 46.60| 52.50| | | |-----------+------------+------------| | | |math | 47.60| 49.91| |---------+---------+-----------+------------+------------| |Middle |Mean |write | 49.55| 54.25| | | |-----------+------------+------------| | | |math | 53.47| 50.98| |---------+---------+-----------+------------+------------| |High |Mean |write | 52.86| 58.97| | | |-----------+------------+------------| | | |math | 54.86| 57.48| -----------------------------------------------------------
Lets say that we would like to present both the mean and standard deviation of write in a single table.
proc tabulate data = hsb2; class ses female; var write math; table ses*(mean std), write*female / box=[label="mean and standard deviation by ses and female"]; format female fm. ses ses.; run;----------------------------------------------------------- |mean and standard deviation by | write | |ses and female |-------------------------| | | female | | |-------------------------| | | Male | Female | |-------------------------------+------------+------------| |ses | | | | |---------------+---------------| | | |Low |Mean | 46.60| 52.50| | |---------------+------------+------------| | |Std | 9.03| 9.24| |---------------+---------------+------------+------------| |Middle |Mean | 49.55| 54.25| | |---------------+------------+------------| | |Std | 10.16| 7.33| |---------------+---------------+------------+------------| |High |Mean | 52.86| 58.97| | |---------------+------------+------------| | |Std | 10.78| 6.79| -----------------------------------------------------------
Now lets combine the two tables above and present the mean and standard deviation of both math and write.
proc tabulate data = hsb2; class ses female; var write math; table ses*(mean std), (write math)*female / box=[label="mean and standard deviation by ses and female"]; format female fm. ses ses.; run;------------------------------------------------------------------------------------- |mean and standard deviation by | write | math | |ses and female |-------------------------+-------------------------| | | female | female | | |-------------------------+-------------------------| | | Male | Female | Male | Female | |-------------------------------+------------+------------+------------+------------| |ses | | | | | | |---------------+---------------| | | | | |Low |Mean | 46.60| 52.50| 47.60| 49.91| | |---------------+------------+------------+------------+------------| | |Std | 9.03| 9.24| 6.78| 9.72| |---------------+---------------+------------+------------+------------+------------| |Middle |Mean | 49.55| 54.25| 53.47| 50.98| | |---------------+------------+------------+------------+------------| | |Std | 10.16| 7.33| 10.57| 7.92| |---------------+---------------+------------+------------+------------+------------| |High |Mean | 52.86| 58.97| 54.86| 57.48| | |---------------+------------+------------+------------+------------| | |Std | 10.78| 6.79| 8.62| 8.72| -------------------------------------------------------------------------------------
Proc tabulate is capable of producing sophisticated multi-level tables. Lets say that we would like to present the mean writing and math scores by socioeconomic status and also by program type. The following table shows how to stack variables ses and prog. Notice in the following program that ses and prog are connected individually to write and math. This ensures that the variables will appear consecutively or stacked in the table.
proc format; value fm 1='Female' 0='Male' ; value ses 1='Low' 2='Middle' 3='High'; value prog 1='General' 2='Academic' 3='Vocational'; run; proc tabulate data = tab.hsb2; class ses female prog; var write math; table ses='Socioeconomic Status'*(write='Write' math='Math') prog='Program Type'*(write='Write' math='Math'), mean*female=''; format female fm. ses ses. prog prog.; run;----------------------------------------------------------- | | Mean | | |-------------------------| | | Male | Female | |-------------------------------+------------+------------| |Socioeconomic | | | | |Status | | | | |---------------+---------------| | | |Low |Write | 46.60| 52.50| | |---------------+------------+------------| | |Math | 47.60| 49.91| |---------------+---------------+------------+------------| |Middle |Write | 49.55| 54.25| | |---------------+------------+------------| | |Math | 53.47| 50.98| |---------------+---------------+------------+------------| |High |Write | 52.86| 58.97| | |---------------+------------+------------| | |Math | 54.86| 57.48| |---------------+---------------+------------+------------| |Program Type | | | | |---------------+---------------| | | |General |Write | 49.14| 53.25| | |---------------+------------+------------| | |Math | 50.19| 49.88| |---------------+---------------+------------+------------| |Academic |Write | 54.62| 57.59| | |---------------+------------+------------| | |Math | 57.13| 56.41| |---------------+---------------+------------+------------| |Vocational |Write | 41.83| 50.96| | |---------------+------------+------------| | |Math | 46.91| 46.00| -----------------------------------------------------------
We still want to present the mean writing and math scores by socioeconomic status and program type. The following table shows program type nested in socioeconomic status. Notice that in the program below ses and prog are connected by an *. The connection ensures that these two variables will appear nested in the table and that the cells of the table will represent more specific groups. The flexibility of proc tabulate allows complex nesting structures and wide variety of layouts.
proc tabulate data = tab.hsb2; class ses female prog; var write math; table ses='Socioeconomic Status'*prog='Program Type'*(mean std), (write='Write' math='Math')*female=''/ rts=43 ; format female fm. ses ses. prog prog.; run;----------------------------------------------------------------------------------------------- | | Write | Math | | |-------------------------+-------------------------| | | Male | Female | Male | Female | |-----------------------------------------+------------+------------+------------+------------| |Socioeconomic|Program Type | | | | | | |Status | | | | | | | |-------------+-------------+-------------| | | | | |Low |General |Mean | 48.29| 53.89| 46.71| 48.22| | | |-------------+------------+------------+------------+------------| | | |Std | 6.05| 6.92| 8.12| 7.55| | |-------------+-------------+------------+------------+------------+------------| | |Academic |Mean | 51.25| 54.27| 50.00| 55.00| | | |-------------+------------+------------+------------+------------| | | |Std | 11.79| 9.50| 6.68| 10.28| | |-------------+-------------+------------+------------+------------+------------| | |Vocational |Mean | 39.00| 47.63| 46.75| 42.25| | | |-------------+------------+------------+------------+------------| | | |Std | 7.48| 10.32| 5.25| 3.92| |-------------+-------------+-------------+------------+------------+------------+------------| |Middle |General |Mean | 47.20| 52.00| 51.10| 51.10| | | |-------------+------------+------------+------------+------------| | | |Std | 10.67| 8.25| 9.27| 6.03| | |-------------+-------------+------------+------------+------------+------------| | |Academic |Mean | 55.59| 57.14| 58.14| 54.50| | | |-------------+------------+------------+------------+------------| | | |Std | 6.65| 6.44| 10.09| 7.26| | |-------------+-------------+------------+------------+------------+------------| | |Vocational |Mean | 42.27| 51.69| 48.20| 46.06| | | |-------------+------------+------------+------------+------------| | | |Std | 9.02| 6.85| 9.54| 7.54| |-------------+-------------+-------------+------------+------------+------------+------------| |High |General |Mean | 55.50| 54.60| 54.00| 50.40| | | |-------------+------------+------------+------------+------------| | | |Std | 15.26| 11.46| 4.32| 7.70| | |-------------+-------------+------------+------------+------------+------------| | |Academic |Mean | 54.24| 60.43| 57.43| 59.43| | | |-------------+------------+------------+------------+------------| | | |Std | 10.08| 4.55| 7.74| 8.15| | |-------------+-------------+------------+------------+------------+------------| | |Vocational |Mean | 43.00| 56.00| 42.25| 55.67| | | |-------------+------------+------------+------------+------------| | | |Std | 4.55| 9.64| 3.95| 10.50| -----------------------------------------------------------------------------------------------
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services