SAS Learning Module
Descriptive statistics

1. Introduction

This module illustrates how to obtain basic descriptive statistics using SAS. We illustrate this using a data file about 26 automobiles that records each car's make, price, mileage (mpg), repair record (rep78), and whether the car was foreign (foreign = 1) or domestic (foreign = 0). The data file is shown below.

MAKE PRICE MPG REP78 FOREIGN
AMC    4099 22 3 0
AMC    4749 17 3 0
AMC    3799 22 3 0
Audi   9690 17 5 1
Audi   6295 23 3 1
BMW    9735 25 4 1
Buick  4816 20 3 0
Buick  7827 15 4 0
Buick  5788 18 3 0
Buick  4453 26 3 0
Buick  5189 20 3 0
Buick 10372 16 3 0
Buick  4082 19 3 0
Cad.  11385 14 3 0
Cad.  14500 14 2 0
Cad.  15906 21 3 0
Chev.  3299 29 3 0
Chev.  5705 16 4 0
Chev.  4504 22 3 0
Chev.  5104 22 2 0
Chev.  3667 24 2 0
Chev.  3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1 

The program below reads the data and creates a temporary data file called auto. All of the descriptive statistics shown in this module are computed from this data file.

DATA auto ;
  input MAKE $ PRICE MPG REP78 FOREIGN ;
DATALINES;
AMC    4099 22 3 0
AMC    4749 17 3 0
AMC    3799 22 3 0
Audi   9690 17 5 1
Audi   6295 23 3 1
BMW    9735 25 4 1
Buick  4816 20 3 0
Buick  7827 15 4 0
Buick  5788 18 3 0
Buick  4453 26 3 0
Buick  5189 20 3 0
Buick 10372 16 3 0
Buick  4082 19 3 0
Cad.  11385 14 3 0
Cad.  14500 14 2 0
Cad.  15906 21 3 0
Chev.  3299 29 3 0
Chev.  5705 16 4 0
Chev.  4504 22 3 0
Chev.  5104 22 2 0
Chev.  3667 24 2 0
Chev.  3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
;
RUN;

PROC PRINT DATA=auto(obs=10);
RUN; 

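The output of the proc print is shown below. Because of the obs=10 data set option, only the first 10 observations are printed, and you can compare them with the first 10 data lines in the program above.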

OBS    MAKE     PRICE    MPG    REP78    FOREIGN

  1    AMC       4099     22      3         0
  2    AMC       4749     17      3         0
  3    AMC       3799     22      3         0
  4    Audi      9690     17      5         1
  5    Audi      6295     23      3         1
  6    BMW       9735     25      4         1
  7    Buick     4816     20      3         0
  8    Buick     7827     15      4         0
  9    Buick     5788     18      3         0
 10    Buick     4453     26      3         0

2. Using proc freq for frequencies

We can use proc freq to produce frequency tables.   Below, we use it to make frequency tables for make, rep78 and foreign.

PROC FREQ DATA=auto;
  TABLES make ;
RUN;

PROC FREQ DATA=auto;
  TABLES rep78 ;
RUN;

PROC FREQ DATA=auto;
  TABLES foreign ;
RUN; 

Here is the output produced by the proc freq statements above.

                                Cumulative  Cumulative
MAKE     Frequency   Percent   Frequency    Percent
----------------------------------------------------
AMC             3      11.5           3       11.5
Audi            2       7.7           5       19.2
BMW             1       3.8           6       23.1
Buick           7      26.9          13       50.0
Cad.            3      11.5          16       61.5
Chev.           6      23.1          22       84.6
Datsun          4      15.4          26      100.0


                              Cumulative  Cumulative
REP78   Frequency   Percent   Frequency    Percent
---------------------------------------------------
    2          3      11.5           3       11.5
    3         15      57.7          18       69.2
    4          6      23.1          24       92.3
    5          2       7.7          26      100.0


                                Cumulative  Cumulative
FOREIGN   Frequency   Percent   Frequency    Percent
-----------------------------------------------------
      0         19      73.1          19       73.1
      1          7      26.9          26      100.0

Instead of having three separate proc freqs, we could have done this all in one proc freq step as illustrated below.  The output will be the same as shown above.

PROC FREQ DATA=auto;
  TABLES make rep78 foreign ;
RUN; 
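Two options that are often useful with one-way tables are sketched below (output not shown): ORDER=FREQ on the proc freq statement lists each variable's values from most to least frequent, and NOCUM on the tables statement drops the cumulative frequency and cumulative percent columns. This is a minimal sketch on the same auto data file.

```sas
* One-way tables for the same three variables;
* ORDER=FREQ sorts the rows of each table by descending frequency;
* NOCUM suppresses the two cumulative columns;
PROC FREQ DATA=auto ORDER=FREQ;
  TABLES make rep78 foreign / NOCUM ;
RUN;
```

With ORDER=FREQ, the table for make would start with Buick (7 cars) followed by Chev. (6 cars), instead of following alphabetical order.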

Let's use proc freq to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign).  The proc freq statements for this are shown below. Note the asterisk (*) between the variables rep78 and foreign on the tables statement.

PROC FREQ DATA=auto;
  TABLES rep78*foreign ;
RUN; 

This is the output produced.

TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
       2 |      3 |      0 |      3
         |  11.54 |   0.00 |  11.54
         | 100.00 |   0.00 |
         |  15.79 |   0.00 |
---------+--------+--------+
       3 |     14 |      1 |     15
         |  53.85 |   3.85 |  57.69
         |  93.33 |   6.67 |
         |  73.68 |  14.29 |
---------+--------+--------+
       4 |      2 |      4 |      6
         |   7.69 |  15.38 |  23.08
         |  33.33 |  66.67 |
         |  10.53 |  57.14 |
---------+--------+--------+
       5 |      0 |      2 |      2
         |   0.00 |   7.69 |   7.69
         |   0.00 | 100.00 |
         |   0.00 |  28.57 |
---------+--------+--------+
Total          19        7       26
            73.08    26.92   100.00 
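If you want to use the cell counts in a later step rather than only read them, the OUT= option on the tables statement writes the table to a SAS data set, and OUTPCT adds the row and column percentages to it. This is a sketch; the data set name rep_for is a hypothetical choice.

```sas
* Write the rep78 by foreign crosstab to a data set;
* OUT= names the output data set (rep_for is a hypothetical name);
* OUTPCT adds the row and column percentages (PCT_ROW and PCT_COL);
* NOPRINT suppresses the printed table;
PROC FREQ DATA=auto;
  TABLES rep78*foreign / OUT=rep_for OUTPCT NOPRINT ;
RUN;

* Look at the saved counts and percentages;
PROC PRINT DATA=rep_for;
RUN;
```

By default the output data set has one observation for each combination of rep78 and foreign that occurs in the data, with the cell count in the variable COUNT and the cell percentage in PERCENT.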

We can show just the cell percentages to make the table easier to read by using the norow, nocol and nofreq options on the tables statement to suppress the printing of the row percentages, column percentages and frequencies (leaving just the cell percentages).  Note that the options come after the forward slash ( / ) on the tables statement.

PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOROW NOCOL NOFREQ ;
RUN; 

The output is shown below.

TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Percent |       0|       1|  Total
--------+--------+--------+
      2 |  11.54 |   0.00 |  11.54
--------+--------+--------+
      3 |  53.85 |   3.85 |  57.69
--------+--------+--------+
      4 |   7.69 |  15.38 |  23.08
--------+--------+--------+
      5 |   0.00 |   7.69 |   7.69
--------+--------+--------+
Total         19        7       26
           73.08    26.92   100.00 

The order of the options does not matter. We would have gotten the same output had we written the tables statement like this.

PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOROW NOCOL ;
RUN; 
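These options can be combined in other ways. As a sketch, suppressing the cell frequencies (NOFREQ), the cell percentages (NOPERCENT) and the column percentages (NOCOL) should leave only the row percentages, which answer the question "among cars with a given repair record, what percent are foreign?"

```sas
* Keep only the row percentages in each cell;
* NOFREQ drops counts, NOPERCENT drops cell percentages and NOCOL drops column percentages;
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOPERCENT NOCOL ;
RUN;
```

In the full crosstab earlier in this section, for example, the row percentages for rep78 = 4 are 33.33 for domestic cars and 66.67 for foreign cars.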

3. Using proc means for summary statistics

Proc means can be used to produce summary statistics. Below, proc means is used to get descriptive statistics for the variable mpg.

PROC MEANS DATA=auto;
  VAR mpg;
RUN;

The results of the proc means are shown below.

Analysis Variable : MPG

 N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------
26    20.9230769     4.7575042    14.0000000    35.0000000
---------------------------------------------------------- 
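By default, proc means prints N, Mean, Std Dev, Minimum and Maximum. The sketch below requests a different set of statistics: statistic keywords listed on the proc means statement replace the defaults, and MAXDEC= limits the number of decimal places printed. Here price is also added to the var statement.

```sas
* Request specific statistics instead of the default five;
* MAXDEC=2 prints at most two decimal places;
PROC MEANS DATA=auto N MEAN MEDIAN STD MIN MAX MAXDEC=2;
  VAR mpg price;
RUN;
```

The median of mpg should come out as 21, the same value that proc univariate reports as the 50% quantile in the next section.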

Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign).   We can use the class statement (shown below) to get separate results for the different values of foreign.

PROC MEANS DATA=auto;
  CLASS foreign ;
  VAR mpg;
RUN;

As you can see below, proc means presents the results separately for the 19 domestic cars (when foreign equals 0) and the seven foreign cars (when foreign equals 1).

Analysis Variable : MPG

FOREIGN  N Obs   N          Mean       Std Dev       Minimum       Maximum
--------------------------------------------------------------------------
      0     19  19    19.7894737     4.0356598    14.0000000    29.0000000
      1      7   7    24.0000000     5.5075705    17.0000000    35.0000000
--------------------------------------------------------------------------
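To save these group statistics as a data set (for example, to print or merge them later), you can add an output statement. In the sketch below, mpgstats, mean_mpg and sd_mpg are hypothetical names; NWAY keeps only the rows for each value of foreign and drops the overall summary row.

```sas
* Save the mean and standard deviation of mpg for each value of foreign;
* NWAY keeps only the class-level rows (no overall summary row);
* NOPRINT suppresses the printed table;
PROC MEANS DATA=auto NWAY NOPRINT;
  CLASS foreign ;
  VAR mpg;
  OUTPUT OUT=mpgstats MEAN=mean_mpg STD=sd_mpg ;
RUN;

PROC PRINT DATA=mpgstats;
RUN;
```

Proc means also adds the automatic variables _TYPE_ and _FREQ_ to the output data set; _FREQ_ holds the number of observations in each group (19 and 7 here).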

4. Using proc univariate for detailed summary statistics

You can use proc univariate to get more detailed summary statistics, as shown below.

PROC UNIVARIATE DATA=auto;
  VAR mpg;
RUN; 

And here are the results of proc univariate.

Univariate Procedure

Variable=MPG

                 Moments
 N                26  Sum Wgts         26
 Mean       20.92308  Sum             544
 Std Dev    4.757504  Variance   22.63385
 Skewness   0.935473  Kurtosis     1.7927
 USS           11948  CSS        565.8462
 CV         22.73807  Std Mean   0.933023
 T:Mean=0   22.42503  Pr>|T|       0.0001
 Num ^= 0         26  Num > 0          26
 M(Sign)          13  Pr>=|M|      0.0001
 Sgn Rank      175.5  Pr>=|S|      0.0001

            Quantiles(Def=5)
 100% Max        35       99%        35
  75% Q3         23       95%        29
  50% Med        21       90%        26
  25% Q1         17       10%        15
   0% Min        14        5%        14
                           1%        14
 Range           21
 Q3-Q1            6
 Mode            22

                 Extremes
    Lowest    Obs     Highest    Obs
        14(      15)       24(      25)
        14(      14)       25(       6)
        15(       8)       26(      10)
        16(      18)       29(      17)
        16(      12)       35(      24) 
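Proc univariate can also save selected percentiles in a data set through an output statement. In this sketch, PCTLPTS= lists the percentiles to compute and PCTLPRE= supplies a prefix for the new variable names; the data set name mpg_pct and the prefix p_ are hypothetical choices.

```sas
* Save the 10th, 25th, 50th, 75th and 90th percentiles of mpg;
* The new variables are named p_10 p_25 p_50 p_75 p_90 (prefix followed by percentile);
* NOPRINT suppresses the usual printed output;
PROC UNIVARIATE DATA=auto NOPRINT;
  VAR mpg;
  OUTPUT OUT=mpg_pct PCTLPTS=10 25 50 75 90 PCTLPRE=p_ ;
RUN;

PROC PRINT DATA=mpg_pct;
RUN;
```

Because the same percentile definition is used by default, the saved values should match the quantiles printed above (for example, 17 for the 25th percentile and 23 for the 75th).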

We can use the class statement to obtain separate univariate results for foreign and domestic cars.

PROC UNIVARIATE DATA=auto;
  CLASS foreign;
  VAR mpg;
RUN; 

As you see in the output below, you get a complete set of output for the case when foreign equals 0 and then another set of output when foreign equals 1. 

The UNIVARIATE Procedure
Variable:  MPG
FOREIGN = 0

                            Moments

N                          19    Sum Weights                 19
Mean               19.7894737    Sum Observations           376
Std Deviation      4.03565976    Variance            16.2865497
Skewness             0.477379    Kurtosis            0.04119835
Uncorrected SS           7734    Corrected SS        293.157895
Coeff Variation    20.3929616    Std Error Mean      0.92584385

              Basic Statistical Measures

    Location                    Variability

Mean     19.78947     Std Deviation            4.03566
Median   20.00000     Variance                16.28655
Mode     22.00000     Range                   15.00000
                      Interquartile Range      6.00000

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  21.37453    Pr > |t|    <.0001
Sign           M       9.5    Pr >= |M|   <.0001
Signed Rank    S        95    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile      Estimate

100% Max            29
99%                 29
95%                 29
90%                 26
75% Q3              22
50% Median          20
25% Q1              16
10%                 14
5%                  14
1%                  14
0% Min              14

Variable:  MPG
FOREIGN = 0

        Extreme Observations

----Lowest----        ----Highest---

Value      Obs        Value      Obs

   14       15           22       19
   14       14           22       20
   15        8           24       21
   16       18           26       10
   16       12           29       17

Variable:  MPG
FOREIGN = 1
                            Moments

N                           7    Sum Weights                  7
Mean                       24    Sum Observations           168
Std Deviation      5.50757055    Variance            30.3333333
Skewness           1.34081176    Kurtosis            3.28605241
Uncorrected SS           4214    Corrected SS               182
Coeff Variation    22.9482106    Std Error Mean        2.081666

              Basic Statistical Measures

    Location                    Variability

Mean     24.00000     Std Deviation            5.50757
Median   23.00000     Variance                30.33333
Mode     23.00000     Range                   18.00000
                      Interquartile Range      4.00000

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  11.52923    Pr > |t|    <.0001
Sign           M       3.5    Pr >= |M|   0.0156
Signed Rank    S        14    Pr >= |S|   0.0156

Quantiles (Definition 5)

Quantile      Estimate

100% Max            35
99%                 35
95%                 35
90%                 35
75% Q3              25
50% Median          23
25% Q1              21
10%                 17
5%                  17
1%                  17
0% Min              17

Variable:  MPG
FOREIGN = 1

        Extreme Observations

----Lowest----        ----Highest---

Value      Obs        Value      Obs

   17        4           23        5
   21       26           23       23
   23       23           24       25
   23        5           25        6
   24       25           35       24
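Because the class statement repeats every table for each group, it can help to print only the tables you need. The sketch below uses ODS SELECT with BasicMeasures and Quantiles, which are, to our knowledge, the ODS names proc univariate gives to the "Basic Statistical Measures" and "Quantiles" tables; ODS TRACE ON writes the name of every table to the log, so you can confirm the names on your own system.

```sas
* List the names of the tables proc univariate produces (written to the log);
ODS TRACE ON;

* Print only the basic measures and the quantiles for each value of foreign;
ODS SELECT BasicMeasures Quantiles;
PROC UNIVARIATE DATA=auto;
  CLASS foreign;
  VAR mpg;
RUN;

ODS TRACE OFF;
```

The ODS SELECT statement applies only to the next procedure step, so later procedures print their full output again.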

5. Problems to look out for

If you make a crosstab with proc freq and one of the variables has a large number of values (say 10 or more), the crosstab can be very hard to read. In such cases, try using the list option on the tables statement, which prints each combination of values as a row of a list rather than as a cell of a grid.

  TABLES rep78*foreign / LIST ;   

When you use a class (or by) statement in proc univariate with a variable that has a large number of values (say 5, 10, or more), proc univariate prints a complete set of output for each value, which produces a very large amount of output. In such cases, consider using proc means with a class statement instead of proc univariate.
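For example, a single proc means step with a class statement gives one line per group and can still include the quartiles. This is a sketch using standard proc means statistic keywords on the auto data file.

```sas
* One compact line per value of foreign instead of pages of univariate output;
* Q1, MEDIAN and Q3 request the quartiles;
PROC MEANS DATA=auto N MEAN STD MIN Q1 MEDIAN Q3 MAX MAXDEC=2;
  CLASS foreign ;
  VAR mpg;
RUN;
```

The quartiles should agree with the proc univariate results above: 16, 20 and 22 for domestic cars, and 21, 23 and 25 for foreign cars.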

6. For more information

For information on Statistical Tests in SAS, see the SAS Learning Module An Overview of Statistical Tests in SAS.



The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.