UCLA Academic Technology Services

SPSS Learning Module
Using SORT and SPLIT FILE

1. Introduction

This module examines the SORT CASES command and the SPLIT FILE command, which together let other SPSS procedures produce separate results for each group of cases. We will illustrate both with the data file shown below.

DATA LIST / make 1-7 (A) mpg 9-10 rep78 12 weight 14-17 foreign 19 .
BEGIN DATA.
AMC     22 3 2930 0
AMC     17 3 3350 0
AMC     22   2640 0
Audi    17 5 2830 1
Audi    23 3 2070 1
BMW     25 4 2650 1
Buick   20 3 3250 0
Buick   15 4 4080 0
Buick   18 3 3670 0
Buick   26   2230 0
Buick   20 3 3280 0
Buick   16 3 3880 0
Buick   19 3 3400 0
Cad.    14 3 4330 0
Cad.    14 2 3900 0
Cad.    21 3 4290 0
Chev.   29 3 2110 0
Chev.   16 4 3690 0
Chev.   22 3 3180 0
Chev.   22 2 3220 0
Chev.   24 2 2750 0
Chev.   19 3 3430 0
Datsun  23 4 2370 1
Datsun  35 5 2020 1
Datsun  24 4 2280 1
Datsun  21 4 2750 1
END DATA.

LIST.

The output from the LIST command is shown below.

MAKE    MPG REP78 WEIGHT FOREIGN
AMC      22   3    2930     0
AMC      17   3    3350     0
AMC      22   .    2640     0
Audi     17   5    2830     1
Audi     23   3    2070     1
BMW      25   4    2650     1
Buick    20   3    3250     0
Buick    15   4    4080     0
Buick    18   3    3670     0
Buick    26   .    2230     0
Buick    20   3    3280     0
Buick    16   3    3880     0
Buick    19   3    3400     0
Cad.     14   3    4330     0
Cad.     14   2    3900     0
Cad.     21   3    4290     0
Chev.    29   3    2110     0
Chev.    16   4    3690     0
Chev.    22   3    3180     0
Chev.    22   2    3220     0
Chev.    24   2    2750     0
Chev.    19   3    3430     0
Datsun   23   4    2370     1
Datsun   35   5    2020     1
Datsun   24   4    2280     1
Datsun   21   4    2750     1
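The column numbers in DATA LIST tell SPSS where each variable sits on the line. As an illustration only (this is Python, not SPSS syntax), the sketch below slices three of the data lines at the same 1-based column positions. As an assumption matching the "." that LIST prints, a blank numeric field is turned into a missing value, represented here by None. The names COLUMNS and parse_line are hypothetical.

```python
# Column positions copied from the DATA LIST command (1-based, inclusive).
COLUMNS = {
    "make": (1, 7),
    "mpg": (9, 10),
    "rep78": (12, 12),
    "weight": (14, 17),
    "foreign": (19, 19),
}


def parse_line(line):
    """Split one fixed-column record into a dict; a blank numeric field becomes None."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        field = line[start - 1:end].strip()  # 1-based columns -> Python slice
        if name == "make":
            record[name] = field  # string variable, declared with (A)
        else:
            record[name] = int(field) if field else None  # blank -> missing
    return record


raw_lines = [
    "AMC     22 3 2930 0",
    "AMC     22   2640 0",
    "Audi    17 5 2830 1",
]
records = [parse_line(line) for line in raw_lines]
for rec in records:
    print(rec)
```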

2. Sorting data with SORT

We can use SORT to sort this data file. The program below sorts the file on the variable foreign (1=foreign car, 0=domestic car).

SORT CASES BY foreign.
LIST.

From the LIST below, you can see that the data are indeed sorted on foreign: the observations where foreign is 0 precede all of the observations where foreign is 1. Note that the order of the observations within each group remains unchanged (i.e., the observations where foreign is 0 stay in the same order as in the original file).

MAKE    MPG REP78 WEIGHT FOREIGN
AMC      22   3    2930     0
AMC      17   3    3350     0
AMC      22   .    2640     0
Buick    20   3    3250     0
Buick    15   4    4080     0
Buick    18   3    3670     0
Buick    26   .    2230     0
Buick    20   3    3280     0
Buick    16   3    3880     0
Buick    19   3    3400     0
Cad.     14   3    4330     0
Cad.     14   2    3900     0
Cad.     21   3    4290     0
Chev.    29   3    2110     0
Chev.    16   4    3690     0
Chev.    22   3    3180     0
Chev.    22   2    3220     0
Chev.    24   2    2750     0
Chev.    19   3    3430     0
Audi     17   5    2830     1
Audi     23   3    2070     1
BMW      25   4    2650     1
Datsun   23   4    2370     1
Datsun   35   5    2020     1
Datsun   24   4    2280     1
Datsun   21   4    2750     1
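The within-group order shown above is what a stable sort produces. The Python sketch below is an illustration outside SPSS that uses only the first eight records. It gets the same arrangement with the built-in sorted(), which Python guarantees to be stable.

```python
# (make, weight, foreign) for the first eight records, in file order.
cars = [
    ("AMC", 2930, 0), ("AMC", 3350, 0), ("AMC", 2640, 0),
    ("Audi", 2830, 1), ("Audi", 2070, 1), ("BMW", 2650, 1),
    ("Buick", 3250, 0), ("Buick", 4080, 0),
]

# sorted() is stable: cars with the same value of foreign keep their file order.
by_foreign = sorted(cars, key=lambda car: car[2])
for car in by_foreign:
    print(car)
```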

Suppose you wanted the data sorted with the foreign cars (foreign=1) first and the domestic cars (foreign=0) second. The (D) option tells SPSS to sort in descending order on the variable listed before it. In the example below, the data are sorted on foreign, but the order is reversed, with values going from largest to smallest.

SORT CASES BY foreign (D).
LIST .

You can see from the output of the LIST command below that the data are again ordered by foreign, but now from highest to lowest.

MAKE    MPG REP78 WEIGHT FOREIGN
Audi     17   5    2830     1
Audi     23   3    2070     1
BMW      25   4    2650     1
Datsun   23   4    2370     1
Datsun   35   5    2020     1
Datsun   24   4    2280     1
Datsun   21   4    2750     1
AMC      22   3    2930     0
AMC      17   3    3350     0
AMC      22   .    2640     0
Buick    20   3    3250     0
Buick    15   4    4080     0
Buick    18   3    3670     0
Buick    26   .    2230     0
Buick    20   3    3280     0
Buick    16   3    3880     0
Buick    19   3    3400     0
Cad.     14   3    4330     0
Cad.     14   2    3900     0
Cad.     21   3    4290     0
Chev.    29   3    2110     0
Chev.    16   4    3690     0
Chev.    22   3    3180     0
Chev.    22   2    3220     0
Chev.    24   2    2750     0
Chev.    19   3    3430     0
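In the same Python setting (again an illustration, not SPSS), a descending sort can be written with reverse=True. Python documents that reverse=True still keeps records with equal keys in their original order, and this matches the listing above for the first eight records used here.

```python
# (make, weight, foreign) for the first eight records, in file order.
cars = [
    ("AMC", 2930, 0), ("AMC", 3350, 0), ("AMC", 2640, 0),
    ("Audi", 2830, 1), ("Audi", 2070, 1), ("BMW", 2650, 1),
    ("Buick", 3250, 0), ("Buick", 4080, 0),
]

# reverse=True orders foreign from largest to smallest; ties still keep file order.
by_foreign_desc = sorted(cars, key=lambda car: car[2], reverse=True)
for car in by_foreign_desc:
    print(car)
```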

It is also possible to sort on more than one variable at a time. Perhaps you would like the data sorted on foreign (this time we will go back to the normal sort order for foreign) and then sorted by rep78 within each level of foreign. The example below shows how this can be done.

SORT CASES BY foreign rep78.
LIST .

The output of the LIST command below shows that the data are now ordered by foreign, with domestic cars (foreign=0) followed by foreign cars (foreign=1). Within each of these two groups, the cases are further sorted by rep78.

MAKE    MPG REP78 WEIGHT FOREIGN
AMC      22   .    2640     0
Buick    26   .    2230     0
Cad.     14   2    3900     0
Chev.    22   2    3220     0
Chev.    24   2    2750     0
AMC      22   3    2930     0
AMC      17   3    3350     0
Buick    20   3    3250     0
Buick    18   3    3670     0
Buick    20   3    3280     0
Buick    16   3    3880     0
Buick    19   3    3400     0
Cad.     14   3    4330     0
Cad.     21   3    4290     0
Chev.    29   3    2110     0
Chev.    22   3    3180     0
Chev.    19   3    3430     0
Buick    15   4    4080     0
Chev.    16   4    3690     0
Audi     23   3    2070     1
BMW      25   4    2650     1
Datsun   23   4    2370     1
Datsun   24   4    2280     1
Datsun   21   4    2750     1
Audi     17   5    2830     1
Datsun   35   5    2020     1

In the output above, note how the missing values of rep78 were treated. When sorting, SPSS treats a system-missing value as the lowest possible value (in effect, negative infinity), so the cases with missing rep78 come before all other values of rep78.
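Outside SPSS, the placement of missing values has to be spelled out. The sketch below is Python and uses a subset of the records. It assumes that a missing rep78 is stored as None. The hypothetical sort_key helper sorts on foreign and then on rep78, mapping None to negative infinity so that missing values come first, as described above.

```python
import math

# (make, rep78, weight, foreign) for a subset of the records, in file order.
# None marks a missing rep78, shown as "." by LIST.
cars = [
    ("AMC", 3, 2930, 0), ("AMC", 3, 3350, 0), ("AMC", None, 2640, 0),
    ("Audi", 5, 2830, 1), ("Audi", 3, 2070, 1), ("BMW", 4, 2650, 1),
    ("Buick", 3, 3250, 0), ("Buick", 4, 4080, 0), ("Buick", None, 2230, 0),
    ("Cad.", 2, 3900, 0),
]


def sort_key(car):
    """Sort on foreign, then rep78; a missing rep78 ranks below every real value."""
    _, rep78, _, foreign = car
    return (foreign, -math.inf if rep78 is None else rep78)


sorted_cars = sorted(cars, key=sort_key)
for car in sorted_cars:
    print(car)
```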

3. Obtaining separate analyses with sorted data

Sometimes you would like to obtain results separately for different groups. For example, you might want the mean weight separately for foreign and domestic cars, as illustrated below.

MEANS weight BY foreign.

As you see below, it is possible to use MEANS with the BY option to get means separately for the foreign and domestic cars.

                 - - Description of Subpopulations - -
Summaries of     WEIGHT
By levels of     FOREIGN

Variable      Value  Label    Mean        Std Dev     Cases
For Entire Population         3099.2308   695.0794    26

FOREIGN           0           3347.8947   627.1769    19
FOREIGN           1           2424.2857   325.1593     7

Total Cases = 26
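To check what MEANS reports, the Python sketch below (not SPSS) groups the 26 weights by foreign. For each group and for the whole file, it computes the mean, the sample standard deviation, and the number of cases. The standard deviation uses statistics.stdev, which divides by n - 1. The variable and function names are illustrative.

```python
import statistics

# (weight, foreign) for all 26 cars, in file order.
cars = [
    (2930, 0), (3350, 0), (2640, 0), (2830, 1), (2070, 1), (2650, 1),
    (3250, 0), (4080, 0), (3670, 0), (2230, 0), (3280, 0), (3880, 0),
    (3400, 0), (4330, 0), (3900, 0), (4290, 0), (2110, 0), (3690, 0),
    (3180, 0), (3220, 0), (2750, 0), (3430, 0), (2370, 1), (2020, 1),
    (2280, 1), (2750, 1),
]


def summarize(weights):
    """Mean, sample standard deviation (n - 1 denominator) and number of cases."""
    return statistics.mean(weights), statistics.stdev(weights), len(weights)


# Collect the weights for each level of foreign.
groups = {}
for weight, foreign in cars:
    groups.setdefault(foreign, []).append(weight)

overall = summarize([w for w, _ in cars])
by_level = {level: summarize(ws) for level, ws in groups.items()}

print(f"Entire population  {overall[0]:.4f}  {overall[1]:.4f}  {overall[2]}")
for level in sorted(by_level):
    mean, sd, n = by_level[level]
    print(f"FOREIGN {level}          {mean:.4f}  {sd:.4f}  {n}")
```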

However, what if you wanted the correlation of weight and mpg separately for foreign and domestic cars? The CORRELATIONS command does not support a BY option like MEANS does. In such cases, you can SORT the data and then use SPLIT FILE BY to obtain separate analyses, as illustrated below.

SORT CASES BY foreign.
SPLIT FILE BY foreign.
CORRELATIONS weight mpg.

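As you see in the output below, SPLIT FILE BY foreign produced one set of correlations for the domestic cars and another for the foreign cars. In general, SORT CASES followed by SPLIT FILE BY makes each subsequent procedure run separately for every level of the BY variable (here, every level of foreign). The sort matters because SPLIT FILE starts a new group each time the value of the split variable changes from one case to the next. If the data are unsorted, they are broken into more groups than there are levels.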

FOREIGN:  0
      - -  Correlation Coefficients  - -

             WEIGHT     MPG
WEIGHT       1.0000     -.8624
            (   19)    (   19)
            P= .       P= .000

MPG          -.8624     1.0000
            (   19)    (   19)
            P= .000    P= .

(Coefficient / (Cases) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed

FOREIGN:  1

             WEIGHT     MPG
WEIGHT       1.0000     -.7101
            (    7)    (    7)
            P= .       P= .074

MPG          -.7101     1.0000
            (    7)    (    7)
            P= .074    P= .

(Coefficient / (Cases) / 2-tailed Significance)
" . " is printed if a coefficient cannot be computed

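The reason for sorting before SPLIT FILE can be sketched in Python (an illustration, not SPSS). Like SPLIT FILE, itertools.groupby starts a new group each time the split value changes from one record to the next. With the file in its original order, foreign changes value three times, so four groups result. After sorting, there is one group per level, and the Pearson correlations computed within each group should agree with the CORRELATIONS output above. The helpers split_file and pearson are hypothetical names.

```python
import math
import statistics
from itertools import groupby

# (mpg, weight, foreign) for all 26 cars, in the order of the original data file.
cars = [
    (22, 2930, 0), (17, 3350, 0), (22, 2640, 0), (17, 2830, 1), (23, 2070, 1),
    (25, 2650, 1), (20, 3250, 0), (15, 4080, 0), (18, 3670, 0), (26, 2230, 0),
    (20, 3280, 0), (16, 3880, 0), (19, 3400, 0), (14, 4330, 0), (14, 3900, 0),
    (21, 4290, 0), (29, 2110, 0), (16, 3690, 0), (22, 3180, 0), (22, 3220, 0),
    (24, 2750, 0), (19, 3430, 0), (23, 2370, 1), (35, 2020, 1), (24, 2280, 1),
    (21, 2750, 1),
]


def foreign_of(car):
    return car[2]


def split_file(records):
    """Start a new subgroup each time foreign changes value, as SPLIT FILE does."""
    return [(level, list(rows)) for level, rows in groupby(records, key=foreign_of)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Unsorted: foreign changes value three times, so the split yields four fragments.
unsorted_groups = split_file(cars)
print([(level, len(rows)) for level, rows in unsorted_groups])  # [(0, 3), (1, 3), (0, 16), (1, 4)]

# Sorted first: one subgroup per level of foreign.
correlations = {}
for level, rows in split_file(sorted(cars, key=foreign_of)):
    coef = pearson([row[1] for row in rows], [row[0] for row in rows])
    correlations[level] = (coef, len(rows))
    print(f"FOREIGN: {level}  r(weight, mpg) = {coef:.4f}  N = {len(rows)}")
```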
SPLIT FILE stays in effect until you turn it off with SPLIT FILE OFF. To illustrate, let's run DESCRIPTIVES on weight and mpg.

DESCRIPTIVES weight mpg.

As expected, DESCRIPTIVES reports results separately for domestic and foreign cars.

FOREIGN:  0
Number of valid observations (listwise) =        19.00
                                                   Valid
Variable      Mean    Std Dev   Minimum   Maximum      N  Label
WEIGHT     3347.89     627.18      2110      4330     19
MPG          19.79       4.04        14        29     19

FOREIGN:  1
Number of valid observations (listwise) =         7.00
                                                   Valid
Variable      Mean    Std Dev   Minimum   Maximum      N  Label
WEIGHT     2424.29     325.16      2020      2830      7
MPG          24.00       5.51        17        35      7

Now, let's enter SPLIT FILE OFF and repeat the DESCRIPTIVES on weight and mpg to confirm that this ends the SPLIT FILE.

SPLIT FILE OFF.
DESCRIPTIVES weight mpg.

As expected, DESCRIPTIVES now reports results for the whole sample.

Number of valid observations (listwise) =        26.00
                                                   Valid
Variable      Mean    Std Dev   Minimum   Maximum      N  Label
WEIGHT     3099.23     695.08      2020      4330     26
MPG          20.92       4.76        14        35     26
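The on/off behavior can be mimicked with a small hypothetical Python helper (not SPSS). Passing a key function plays the role of SPLIT FILE BY, and passing none plays the role of SPLIT FILE OFF. The dictionary collects all records with the same key, which gives the same groups that SPLIT FILE produces on sorted data. The statistics correspond to the Mean, Std Dev, Minimum, Maximum and Valid N columns that DESCRIPTIVES prints.

```python
import statistics

# (mpg, weight, foreign) for all 26 cars.
cars = [
    (22, 2930, 0), (17, 3350, 0), (22, 2640, 0), (17, 2830, 1), (23, 2070, 1),
    (25, 2650, 1), (20, 3250, 0), (15, 4080, 0), (18, 3670, 0), (26, 2230, 0),
    (20, 3280, 0), (16, 3880, 0), (19, 3400, 0), (14, 4330, 0), (14, 3900, 0),
    (21, 4290, 0), (29, 2110, 0), (16, 3690, 0), (22, 3180, 0), (22, 3220, 0),
    (24, 2750, 0), (19, 3430, 0), (23, 2370, 1), (35, 2020, 1), (24, 2280, 1),
    (21, 2750, 1),
]


def describe(values):
    """Mean, sample standard deviation, minimum, maximum and valid N."""
    return (statistics.mean(values), statistics.stdev(values),
            min(values), max(values), len(values))


def descriptives(records, split=None):
    """Hypothetical helper: split=None mimics SPLIT FILE OFF (one group);
    a key function mimics SPLIT FILE BY (one group per key value)."""
    groups = {}
    for rec in records:
        key = "all" if split is None else split(rec)
        groups.setdefault(key, []).append(rec)
    return {
        key: {
            "WEIGHT": describe([rec[1] for rec in rows]),
            "MPG": describe([rec[0] for rec in rows]),
        }
        for key, rows in groups.items()
    }


split_results = descriptives(cars, split=lambda rec: rec[2])  # like SPLIT FILE BY foreign
overall_results = descriptives(cars)                           # like SPLIT FILE OFF
for key, stats in list(split_results.items()) + list(overall_results.items()):
    for name, (mean, sd, lo, hi, n) in stats.items():
        print(f"{key!s:>4} {name:<7} mean={mean:.2f} sd={sd:.2f} min={lo} max={hi} N={n}")
```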

4. Problems to look out for

5. For more information


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California