
SAS Learning Module
Descriptive statistics

1. Introduction

This module illustrates how to obtain basic descriptive statistics using SAS.  We use a data file about 26 automobiles that records each car's make, price, mpg, repair record (rep78), and whether the car is foreign or domestic (foreign). The data are shown below.

MAKE PRICE MPG REP78 FOREIGN
AMC    4099 22 3 0
AMC    4749 17 3 0
AMC    3799 22 3 0
Audi   9690 17 5 1
Audi   6295 23 3 1
BMW    9735 25 4 1
Buick  4816 20 3 0
Buick  7827 15 4 0
Buick  5788 18 3 0
Buick  4453 26 3 0
Buick  5189 20 3 0
Buick 10372 16 3 0
Buick  4082 19 3 0
Cad.  11385 14 3 0
Cad.  14500 14 2 0
Cad.  15906 21 3 0
Chev.  3299 29 3 0
Chev.  5705 16 4 0
Chev.  4504 22 3 0
Chev.  5104 22 2 0
Chev.  3667 24 2 0
Chev.  3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1 

The program below reads the data and creates a temporary data file called auto.  All of the descriptive statistics shown in this module are computed from this data file.  The proc print at the end of the program lists the first 10 observations.

DATA auto ;
  /* the $ after MAKE reads it as a character variable */
  INPUT MAKE $ PRICE MPG REP78 FOREIGN ;
DATALINES;
AMC    4099 22 3 0
AMC    4749 17 3 0
AMC    3799 22 3 0
Audi   9690 17 5 1
Audi   6295 23 3 1
BMW    9735 25 4 1
Buick  4816 20 3 0
Buick  7827 15 4 0
Buick  5788 18 3 0
Buick  4453 26 3 0
Buick  5189 20 3 0
Buick 10372 16 3 0
Buick  4082 19 3 0
Cad.  11385 14 3 0
Cad.  14500 14 2 0
Cad.  15906 21 3 0
Chev.  3299 29 3 0
Chev.  5705 16 4 0
Chev.  4504 22 3 0
Chev.  5104 22 2 0
Chev.  3667 24 2 0
Chev.  3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
;
RUN;

PROC PRINT DATA=auto(obs=10);
RUN; 

The output of the proc print is shown below.  Because of the obs=10 data set option, only the first 10 of the 26 observations are printed.  You can compare the program to the output below.

OBS    MAKE     PRICE    MPG    REP78    FOREIGN

  1    AMC       4099     22      3         0
  2    AMC       4749     17      3         0
  3    AMC       3799     22      3         0
  4    Audi      9690     17      5         1
  5    Audi      6295     23      3         1
  6    BMW       9735     25      4         1
  7    Buick     4816     20      3         0
  8    Buick     7827     15      4         0
  9    Buick     5788     18      3         0
 10    Buick     4453     26      3         0

2. Using proc freq for frequencies

We can use proc freq to produce frequency tables.   Below, we use it to make frequency tables for make, rep78 and foreign.

 PROC FREQ DATA=auto;
  TABLES make ;
RUN;

PROC FREQ DATA=auto;
  TABLES rep78 ;
RUN;

PROC FREQ DATA=auto;
  TABLES foreign ;
RUN; 

Here is the output produced by the proc freq statements above.

                               Cumulative  Cumulative
MAKE     Frequency   Percent   Frequency    Percent
----------------------------------------------------
AMC             3      11.5           3       11.5
Audi            2       7.7           5       19.2
BMW             1       3.8           6       23.1
Buick           7      26.9          13       50.0
Cad.            3      11.5          16       61.5
Chev.           6      23.1          22       84.6
Datsun          4      15.4          26      100.0


                             Cumulative  Cumulative
REP78   Frequency   Percent   Frequency    Percent
---------------------------------------------------
    2          3      11.5           3       11.5
    3         15      57.7          18       69.2
    4          6      23.1          24       92.3
    5          2       7.7          26      100.0


                             Cumulative  Cumulative
FOREIGN   Frequency   Percent   Frequency    Percent
-----------------------------------------------------
      0         19      73.1          19       73.1
      1          7      26.9          26      100.0

Instead of running three separate proc freq steps, we could have requested all three tables in a single proc freq step, as illustrated below.

PROC FREQ DATA=auto;
  TABLES make rep78 foreign ;
RUN; 

Let's use proc freq to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign).  The proc freq statements for this are shown below.

 PROC FREQ DATA=auto;
  TABLES rep78*foreign ;
RUN; 

This is the output produced.

TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
       2 |      3 |      0 |      3
         |  11.54 |   0.00 |  11.54
         | 100.00 |   0.00 |
         |  15.79 |   0.00 |
---------+--------+--------+
       3 |     14 |      1 |     15
         |  53.85 |   3.85 |  57.69
         |  93.33 |   6.67 |
         |  73.68 |  14.29 |
---------+--------+--------+
       4 |      2 |      4 |      6
         |   7.69 |  15.38 |  23.08
         |  33.33 |  66.67 |
         |  10.53 |  57.14 |
---------+--------+--------+
       5 |      0 |      2 |      2
         |   0.00 |   7.69 |   7.69
         |   0.00 | 100.00 |
         |   0.00 |  28.57 |
---------+--------+--------+
Total          19        7       26
            73.08    26.92   100.00 
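
To see where the numbers in each cell come from, take the cell for rep78 = 3 and foreign = 0, which holds 14 of the 26 cars. The data step below is only a sketch for checking the arithmetic (the variable names cell, row_total, col_total and n are made up for this illustration). It writes the three percentages to the SAS log, and they should agree with the 53.85, 93.33 and 73.68 shown in the table above.

DATA _NULL_;
  cell      = 14;   /* cars with rep78 = 3 and foreign = 0 */
  row_total = 15;   /* all cars with rep78 = 3             */
  col_total = 19;   /* all cars with foreign = 0           */
  n         = 26;   /* all cars in the table               */
  percent = 100 * cell / n;           /* Percent */
  row_pct = 100 * cell / row_total;   /* Row Pct */
  col_pct = 100 * cell / col_total;   /* Col Pct */
  PUT percent= 6.2 row_pct= 6.2 col_pct= 6.2;
RUN;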

We can show just the cell percentages to make the table easier to read by using the norow, nocol and nofreq options on the tables statement to suppress the printing of the row percentages, column percentages and frequencies (leaving just the cell percentages).  Note that the options come after the / on the tables statement.

PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOROW NOCOL NOFREQ ;
RUN; 

The output is shown below.

TABLE OF REP78 BY FOREIGN

REP78     FOREIGN

Percent |       0|       1|  Total
--------+--------+--------+
      2 |  11.54 |   0.00 |  11.54
--------+--------+--------+
      3 |  53.85 |   3.85 |  57.69
--------+--------+--------+
      4 |   7.69 |  15.38 |  23.08
--------+--------+--------+
      5 |   0.00 |   7.69 |   7.69
--------+--------+--------+
Total         19        7       26
           73.08    26.92   100.00 

The order of the options does not matter.  We would have gotten the same output had we written the command like this.

 PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOROW NOCOL ;
RUN; 
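
The same approach works for other combinations of options. As a sketch, and assuming your release of SAS supports the nopercent option on the tables statement, the step below suppresses the frequencies, the cell percentages and the column percentages, so that only the row percentages remain (the percentage of cars at each repair rating that are domestic or foreign).

PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOPERCENT NOCOL ;
RUN;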

3. Using proc means for summary statistics

To produce summary statistics, proc means can be used.   Below, proc means is used to get descriptive statistics for the variable mpg.

 PROC MEANS DATA=auto;
  VAR mpg;
RUN;

The results of the proc means are shown below.

Analysis Variable : MPG

 N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------
26    20.9230769     4.7575042    14.0000000    35.0000000
---------------------------------------------------------- 

Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign).   We can use the class statement as shown below to get separate results for the different values of foreign.

PROC MEANS DATA=auto;
  CLASS foreign ;
  VAR mpg;
RUN;

As you see below, the results are presented separately for the 19 domestic cars (foreign equals 0) and the seven foreign cars (foreign equals 1).

Analysis Variable : MPG

FOREIGN  N Obs   N    Mean      Std Dev     Minimum   Maximum
-------------------------------------------------------------
      0     19  19    19.78     4.0356598   14.0000   29.00
      1      7   7    24.00     5.5075705   17.0000   35.00
-------------------------------------------------------------- 
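
By default proc means reports N, the mean, the standard deviation, the minimum and the maximum. If you want a different set of statistics, you can list statistic keywords on the proc means statement. The step below is a sketch, assuming a release of SAS in which proc means accepts the median keyword. The maxdec=2 option limits the display to two decimal places, and price is added to the var statement purely for illustration.

PROC MEANS DATA=auto N MEAN MEDIAN STD MIN MAX MAXDEC=2;
  CLASS foreign ;
  VAR price mpg ;
RUN;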

4. Using proc univariate for detailed summary statistics

You can use proc univariate to get more detailed summary statistics, as shown below.

 PROC UNIVARIATE DATA=auto;
  VAR mpg;
RUN; 

And here are the results of the proc univariate.

Univariate Procedure

Variable=MPG

                 Moments
 N                26  Sum Wgts         26
 Mean       20.92308  Sum             544
 Std Dev    4.757504  Variance   22.63385
 Skewness   0.935473  Kurtosis     1.7927
 USS           11948  CSS        565.8462
 CV         22.73807  Std Mean   0.933023
 T:Mean=0   22.42503  Pr>|T|       0.0001
 Num ^= 0         26  Num > 0          26
 M(Sign)          13  Pr>=|M|      0.0001
 Sgn Rank      175.5  Pr>=|S|      0.0001

            Quantiles(Def=5)
 100% Max        35       99%        35
  75% Q3         23       95%        29
  50% Med        21       90%        26
  25% Q1         17       10%        15
   0% Min        14        5%        14
                           1%        14
 Range           21
 Q3-Q1            6
 Mode            22

                 Extremes
    Lowest    Obs     Highest    Obs
        14(      15)       24(      25)
        14(      14)       25(       6)
        15(       8)       26(      10)
        16(      18)       29(      17)
        16(      12)       35(      24) 

To obtain separate univariate results for foreign and domestic cars, you would naturally think about the class statement that we used with proc means.  In older releases of SAS, proc univariate does not permit the class statement (newer releases do accept it).  An approach that works in either case is to use proc sort to sort the data by foreign and then use the by statement with proc univariate, as illustrated below.

PROC SORT DATA=auto;
  BY foreign;
RUN;

PROC UNIVARIATE DATA=auto;
  BY foreign;
  VAR mpg;
RUN; 

As you see in the output below, you get a complete set of output for the case where foreign is 0 and then another set of output when foreign is 1. 

FOREIGN=0

Univariate Procedure

Variable=MPG

                 Moments
 N                19  Sum Wgts         19
 Mean       19.78947  Sum             376
 Std Dev     4.03566  Variance   16.28655
 Skewness   0.477379  Kurtosis   0.041198
 USS            7734  CSS        293.1579
 CV         20.39296  Std Mean   0.925844
 T:Mean=0   21.37453  Pr>|T|       0.0001
 Num ^= 0         19  Num > 0          19
 M(Sign)         9.5  Pr>=|M|      0.0001
 Sgn Rank         95  Pr>=|S|      0.0001

            Quantiles(Def=5)
 100% Max        29       99%        29
  75% Q3         22       95%        29
  50% Med        20       90%        26
  25% Q1         16       10%        14
   0% Min        14        5%        14
                           1%        14
 Range           15
 Q3-Q1            6
 Mode            22

                 Extremes
    Lowest    Obs     Highest    Obs
        14(      12)       22(      16)
        14(      11)       22(      17)
        15(       5)       24(      18)
        16(      15)       26(       7)
        16(       9)       29(      14) 

FOREIGN=1

Univariate Procedure

Variable=MPG

                 Moments
 N                 7  Sum Wgts          7
 Mean             24  Sum             168
 Std Dev    5.507571  Variance   30.33333
 Skewness   1.340812  Kurtosis   3.286052
 USS            4214  CSS             182
 CV         22.94821  Std Mean   2.081666
 T:Mean=0   11.52923  Pr>|T|       0.0001
 Num ^= 0          7  Num > 0           7
 M(Sign)         3.5  Pr>=|M|      0.0156
 Sgn Rank         14  Pr>=|S|      0.0156

            Quantiles(Def=5)
 100% Max        35       99%        35
  75% Q3         25       95%        35
  50% Med        23       90%        35
  25% Q1         21       10%        17
   0% Min        17        5%        17
                           1%        17
 Range           18
 Q3-Q1            4
 Mode            23

                 Extremes
    Lowest    Obs     Highest    Obs
        17(       1)       23(       2)
        21(       7)       23(       4)
        23(       4)       24(       6)
        23(       2)       25(       3)
        24(       6)       35(       5) 
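
Newer releases of SAS do accept a class statement in proc univariate. If yours does, the sketch below should give the same kind of per-group results without the proc sort step. If the class statement produces an error, fall back to the proc sort and by approach shown above.

PROC UNIVARIATE DATA=auto;
  CLASS foreign ;
  VAR mpg ;
RUN;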

5. Problems to look out for

If you make a crosstab with proc freq and one of the variables has a large number of values (say 10 or more), the crosstab table could be very hard to read.  In such cases, try using the list option on the tables statement, e.g.,
 
  TABLES rep78*foreign / LIST ; 
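
Written out as a complete step, this might look like the sketch below. We cross make with rep78 here only because make has more values than foreign, so the effect of the list option is easier to see. The list format shows one line per combination of values that occurs in the data instead of a grid.

PROC FREQ DATA=auto;
  TABLES make*rep78 / LIST ;
RUN;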

When using the by statement in proc univariate, if you choose a by variable with a large number of values (say 5, 10, or more), it will produce a very large amount of output. In such cases, consider using proc means with a class statement instead of proc univariate.
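
For example, make has seven values in this data file, so a proc univariate with by make would print seven full sets of output. The sketch below (an illustration, not part of the original program) gets a compact one-line summary per make instead.

PROC MEANS DATA=auto;
  CLASS make ;
  VAR mpg ;
RUN;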

6. For more information

For information on Statistical Tests in SAS, see the SAS Learning Module An Overview of Statistical Tests in SAS.

7. Web Notes

You can view the SAS program associated with this module by clicking descript.sas.  While viewing the file, you can save it by choosing File then Save As from the pull-down menu of your web browser.  In the Save As dialog box, change the file name to descript.sas, choose the directory where you want to save the file, and then click Save.


These pages are Copyrighted (c) by UCLA Academic Technology Services

