SAS Learning Module
Using proc sort and by statements

1. Introduction

This module will examine the use of proc sort and use of the by statement with SAS procedures.  The program below creates a data file called auto that we will use in our examples. Note that this file has a duplicate record for the BMW.

DATA auto ;
  INPUT make $  mpg rep78 weight foreign ;
CARDS ;
AMC     22 3 2930 0
AMC     17 3 3350 0
AMC     22 . 2640 0
Audi    17 5 2830 1
Audi    23 3 2070 1
BMW     25 4 2650 1
BMW     25 4 2650 1
Buick   20 3 3250 0
Buick   15 4 4080 0
Buick   18 3 3670 0
Buick   26 . 2230 0
Buick   20 3 3280 0
Buick   16 3 3880 0
Buick   19 3 3400 0
Cad.    14 3 4330 0
Cad.    14 2 3900 0
Cad.    21 3 4290 0
Chev.   29 3 2110 0
Chev.   16 4 3690 0
Chev.   22 3 3180 0
Chev.   22 2 3220 0
Chev.   24 2 2750 0
Chev.   19 3 3430 0
Datsun  23 4 2370 1
Datsun  35 5 2020 1
Datsun  24 4 2280 1
Datsun  21 4 2750 1
;
RUN ;
 
PROC PRINT DATA=auto ;
RUN ;  

The output from the program is shown below. The proc print shows that the data file has been successfully created.

OBS    MAKE      MPG    REP78    WEIGHT    FOREIGN

  1    AMC        22      3       2930        0
  2    AMC        17      3       3350        0
  3    AMC        22      .       2640        0
  4    Audi       17      5       2830        1
  5    Audi       23      3       2070        1
  6    BMW        25      4       2650        1
  7    BMW        25      4       2650        1
  8    Buick      20      3       3250        0
  9    Buick      15      4       4080        0
 10    Buick      18      3       3670        0
 11    Buick      26      .       2230        0
 12    Buick      20      3       3280        0
 13    Buick      16      3       3880        0
 14    Buick      19      3       3400        0
 15    Cad.       14      3       4330        0
 16    Cad.       14      2       3900        0
 17    Cad.       21      3       4290        0
 18    Chev.      29      3       2110        0
 19    Chev.      16      4       3690        0
 20    Chev.      22      3       3180        0
 21    Chev.      22      2       3220        0
 22    Chev.      24      2       2750        0
 23    Chev.      19      3       3430        0
 24    Datsun     23      4       2370        1
 25    Datsun     35      5       2020        1
 26    Datsun     24      4       2280        1
 27    Datsun     21      4       2750        1

2. Sorting data with proc sort

We can use proc sort to sort this data file. The program below sorts the auto data file on the variable foreign (1=foreign car, 0=domestic car) and saves the sorted file as auto2. The original file remains unchanged since we used out=auto2 to specify that the sorted data should be placed in  auto2.

PROC SORT DATA=auto OUT=auto2 ;
  BY foreign ;
RUN ;
 
PROC PRINT DATA=auto2 ;
RUN ;  

From the proc print below, you can see that auto2 is indeed sorted on foreign. The observations where foreign is 0 precede all of the observations where foreign is 1.  Note that the order of the observations within each group remain unchanged, (i.e., the observations where foreign is 0 remain in the same order).

OBS    MAKE      MPG    REP78    WEIGHT    FOREIGN

  1    AMC        22      3       2930        0
  2    AMC        17      3       3350        0
  3    AMC        22      .       2640        0
  4    Buick      20      3       3250        0
  5    Buick      15      4       4080        0
  6    Buick      18      3       3670        0
  7    Buick      26      .       2230        0
  8    Buick      20      3       3280        0
  9    Buick      16      3       3880        0
 10    Buick      19      3       3400        0
 11    Cad.       14      3       4330        0
 12    Cad.       14      2       3900        0
 13    Cad.       21      3       4290        0
 14    Chev.      29      3       2110        0
 15    Chev.      16      4       3690        0
 16    Chev.      22      3       3180        0
 17    Chev.      22      2       3220        0
 18    Chev.      24      2       2750        0
 19    Chev.      19      3       3430        0
 20    Audi       17      5       2830        1
 21    Audi       23      3       2070        1
 22    BMW        25      4       2650        1
 23    BMW        25      4       2650        1
 24    Datsun     23      4       2370        1
 25    Datsun     35      5       2020        1
 26    Datsun     24      4       2280        1
 27    Datsun     21      4       2750        1

Suppose you wanted the data sorted, but with the foreign cars (foreign=1) first and the domestic cars (foreign=0) second. The example below shows the use of the descending keyword to tell SAS that you want to sort by foreign, but you want the sort order reversed (i.e., largest to smallest).

PROC SORT DATA=auto OUT=auto3 ;
  BY DESCENDING foreign ;
RUN ;
 
PROC PRINT DATA=auto3 ;
RUN ;  

You can see in the proc print below that the data are now ordered by foreign, but highest to lowest.

OBS    MAKE      MPG    REP78    WEIGHT    FOREIGN

  1    Audi       17      5       2830        1
  2    Audi       23      3       2070        1
  3    BMW        25      4       2650        1
  4    BMW        25      4       2650        1
  5    Datsun     23      4       2370        1
  6    Datsun     35      5       2020        1
  7    Datsun     24      4       2280        1
  8    Datsun     21      4       2750        1
  9    AMC        22      3       2930        0
 10    AMC        17      3       3350        0
 11    AMC        22      .       2640        0
 12    Buick      20      3       3250        0
 13    Buick      15      4       4080        0
 14    Buick      18      3       3670        0
 15    Buick      26      .       2230        0
 16    Buick      20      3       3280        0
 17    Buick      16      3       3880        0
 18    Buick      19      3       3400        0
 19    Cad.       14      3       4330        0
 20    Cad.       14      2       3900        0
 21    Cad.       21      3       4290        0
 22    Chev.      29      3       2110        0
 23    Chev.      16      4       3690        0
 24    Chev.      22      3       3180        0
 25    Chev.      22      2       3220        0
 26    Chev.      24      2       2750        0
 27    Chev.      19      3       3430        0

It is also possible to sort on more than one variable at a time.  Perhaps you would like the data sorted on foreign (this time we will go back to the normal sort order for foreign) and then sorted by rep78 within each level of foreign.  The example below shows how this can be done.

PROC SORT DATA=auto OUT=auto4 ;
  BY foreign rep78 ;
RUN ;
 
PROC PRINT DATA=auto4 ;
RUN ;  

You can see in the proc print below that the data are now ordered by foreign, domestic cars (foreign=0) followed by foreign (foreign=1) cars. Within the domestic cars, the data are sorted by rep78 and within foreign cars the data are also sorted by rep78.

OBS    MAKE      MPG    REP78    WEIGHT    FOREIGN

  1    AMC        22      .       2640        0
  2    Buick      26      .       2230        0
  3    Cad.       14      2       3900        0
  4    Chev.      22      2       3220        0
  5    Chev.      24      2       2750        0
  6    AMC        22      3       2930        0
  7    AMC        17      3       3350        0
  8    Buick      20      3       3250        0
  9    Buick      18      3       3670        0
 10    Buick      20      3       3280        0
 11    Buick      16      3       3880        0
 12    Buick      19      3       3400        0
 13    Cad.       14      3       4330        0
 14    Cad.       21      3       4290        0
 15    Chev.      29      3       2110        0
 16    Chev.      22      3       3180        0
 17    Chev.      19      3       3430        0
 18    Buick      15      4       4080        0
 19    Chev.      16      4       3690        0
 20    Audi       23      3       2070        1
 21    BMW        25      4       2650        1
 22    BMW        25      4       2650        1
 23    Datsun     23      4       2370        1
 24    Datsun     24      4       2280        1
 25    Datsun     21      4       2750        1
 26    Audi       17      5       2830        1
 27    Datsun     35      5       2020        1

In the output above, note how the missing values of rep78 were treated.  Since a missing value is treated as the lowest value possible (e.g., negative infinity), the missing values come before all other values of rep78.

3. Removing duplicates with proc sort

At the beginning of this page, we noted that there was a duplicate observation in auto, that there were two identical records for BMW.  We can use proc sort to remove the duplicate observations from our data file using the noduplicates option, as long as the duplicate observations are next to each other.  The example below sorts the data by foreign and removes the duplicates at the same time.  Note that it did not matter what variable we chose for sorting the data.  As you see in the output below, the extra observation for BMW was deleted.

Obs    make      mpg    rep78    weight    foreign

  1    AMC        22      3       2930        0
  2    AMC        17      3       3350        0
  3    AMC        22      .       2640        0
  4    Buick      20      3       3250        0
  5    Buick      15      4       4080        0
  6    Buick      18      3       3670        0
  7    Buick      26      .       2230        0
  8    Buick      20      3       3280        0
  9    Buick      16      3       3880        0
 10    Buick      19      3       3400        0
 11    Cad.       14      3       4330        0
 12    Cad.       14      2       3900        0
 13    Cad.       21      3       4290        0
 14    Chev.      29      3       2110        0
 15    Chev.      16      4       3690        0
 16    Chev.      22      3       3180        0
 17    Chev.      22      2       3220        0
 18    Chev.      24      2       2750        0
 19    Chev.      19      3       3430        0
 20    Audi       17      5       2830        1
 21    Audi       23      3       2070        1
 22    BMW        25      4       2650        1
 23    Datsun     23      4       2370        1
 24    Datsun     35      5       2020        1
 25    Datsun     24      4       2280        1
 26    Datsun     21      4       2750        1

When you use the noduplicates option, the SAS Log displays a note telling you how many duplicates were removed.  As you see below, SAS informs us that 1 duplicate observation was deleted.

PROC SORT DATA=auto OUT=auto5 NODUPLICATES ;
  BY foreign ;
RUN ;
 
NOTE: 1 duplicate observations were deleted.
NOTE: The data set WORK.AUTO3 has 26 observations and 5 variables.

It is common for duplicate observations to be next to each other in the same file, but if the duplicate observations are not next to each other, there is another strategy you can use to remove the duplicates.  You can sort the data file by all of the variables (which can be indicated with the special keyword _ALL_), which would force the duplicate observations to be next to each other.  This is illustrated below.

PROC SORT DATA=auto OUT=auto6 NODUPLICATES ;
  BY _all_ ;
RUN ;

4. Obtaining separate analyses with sorted data

Sometimes you would like to obtain results separately for different groups.  For example, you might want to get the mean mpg and weight separately for foreign and domestic cars.  As you see below, it is possible to use proc means with the class statement to get these results.

PROC MEANS DATA=auto ;
  CLASS foreign ;
  VAR mpg weight ;
RUN ;  

However, what if you wanted to obtain the correlation of weight and mpg separately for foreign and domestic cars?  Proc corr does not support a class statement like proc means does, but you can use the by statement as in the example below.

PROC SORT DATA=auto OUT=auto6 ;
  BY foreign ;
RUN ;
 
PROC CORR DATA=auto6 ;
  BY foreign ;
  VAR weight mpg ;
RUN ;  

As you see in the output below, using the by statement resulted in getting a proc corr for the domestic cars and a proc corr for the foreign cars.  In general, using the by statement requests that the proc be performed for every level of the by variable (in this case, for every level of foreign).

FOREIGN=0
Correlation Analysis
   2 'VAR' Variables:  WEIGHT   MPG

                                     Simple Statistics

Variable   N          Mean       Std Dev           Sum       Minimum       Maximum
WEIGHT    19   3347.894737    627.176911         63610   2110.000000   4330.000000
MPG       19     19.789474      4.035660    376.000000     14.000000     29.000000

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 19

                  WEIGHT               MPG
WEIGHT           1.00000          -0.86236
                  0.0               0.0001

MPG             -0.86236           1.00000
                  0.0001            0.0

FOREIGN=1

Correlation Analysis

   2 'VAR' Variables:  WEIGHT   MPG
                                     Simple Statistics

Variable   N          Mean       Std Dev           Sum       Minimum       Maximum
WEIGHT     8   2452.500000    311.436763         19620   2020.000000   2830.000000
MPG        8     24.125000      5.111262    193.000000     17.000000     35.000000

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 8

                  WEIGHT               MPG
WEIGHT           1.00000          -0.66702
                  0.0               0.0708

MPG             -0.66702           1.00000
                  0.0708            0.0

Here are other examples of where you might use a by statement with the auto data file.  (Note that some of these analyses are not very practical because of the small size of the auto data file, so please imagine that we would be analyzing a larger version of the auto data file.)

5. Problems to look out for

6. For more information

  • For more information about proc sort see the chapter on PROC SORT in the SAS Procedures Guide .
  • 7. Web notes

    How to cite this page

    Report an error on this page or leave a comment

    The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.