### SPSS Learning Module: Descriptive statistics

NOTE:  The output below was produced using SPSS version 20.

NOTE:  Although commands are shown in ALL CAPS, this is not necessary.  We follow the SPSS convention of doing this to make it clear which parts of the syntax are SPSS commands, subcommands or keywords, and which parts are variable names (shown in lower case letters).  SPSS is not case sensitive, so use whichever case is easiest for you.

#### 1. Introduction

This module demonstrates how to obtain basic descriptive statistics using SPSS.  We will use a data file containing data on 26 automobiles with their make, price, mpg, repair record, and whether the car was foreign or domestic. The data file is presented below.

MAKE PRICE MPG REP78 FOREIGN
AMC    4099 22 3 0
AMC    4749 17 3 0
AMC    3799 22 3 0
Audi   9690 17 5 1
Audi   6295 23 3 1
BMW    9735 25 4 1
Buick  4816 20 3 0
Buick  7827 15 4 0
Buick  5788 18 3 0
Buick  4453 26 3 0
Buick  5189 20 3 0
Buick 10372 16 3 0
Buick  4082 19 3 0
Chev.  3299 29 3 0
Chev.  5705 16 4 0
Chev.  4504 22 3 0
Chev.  5104 22 2 0
Chev.  3667 24 2 0
Chev.  3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1

The program below reads the data into a temporary SPSS data file (the active dataset).  The descriptive statistics shown in this module are all computed on this data file.  The list of variables on the data list command is make (A8) price mpg rep78 foreign.  The (A8) following make indicates that make is a character (string) variable up to 8 characters wide.  The word free indicates "free field" input, in which values are separated by spaces rather than placed in fixed columns.

DATA LIST FREE/
make (A8) price mpg  rep78 foreign .
BEGIN DATA.
AMC    4099 22 3 0
AMC    4749 17 3 0
AMC    3799 22 3 0
Audi   9690 17 5 1
Audi   6295 23 3 1
BMW    9735 25 4 1
Buick  4816 20 3 0
Buick  7827 15 4 0
Buick  5788 18 3 0
Buick  4453 26 3 0
Buick  5189 20 3 0
Buick 10372 16 3 0
Buick  4082 19 3 0
Chev.  3299 29 3 0
Chev.  5705 16 4 0
Chev.  4504 22 3 0
Chev.  5104 22 2 0
Chev.  3667 24 2 0
Chev.  3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
END DATA.
EXECUTE.

LIST
/CASES=10.
EXECUTE.

The output of the list command is shown below.  You can compare the program to the output below.

make        price      mpg    rep78  foreign

AMC       4099.00    22.00     3.00      .00
AMC       4749.00    17.00     3.00      .00
AMC       3799.00    22.00     3.00      .00
Audi      9690.00    17.00     5.00     1.00
Audi      6295.00    23.00     3.00     1.00
BMW       9735.00    25.00     4.00     1.00
Buick     4816.00    20.00     3.00      .00
Buick     7827.00    15.00     4.00      .00
Buick     5788.00    18.00     3.00      .00
Buick     4453.00    26.00     3.00      .00

Number of cases read:  10    Number of cases listed:  10
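
The tables in the rest of this module are easier to read if the 0 and 1 codes of foreign carry labels.  As a minimal sketch, the value labels command below attaches them; the label text is our own assumption, based on the coding used in this module (0 for domestic, 1 for foreign).

```spss
* Label text is assumed for illustration and is not part of the original data file.
VALUE LABELS foreign 0 'Domestic' 1 'Foreign'.
```

Once labels are attached, output tables show the labels in place of the bare codes.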

#### 2. Using the frequencies or crosstabs command for counts

Both of these commands are used for obtaining information on the number of cases that have a certain characteristic.

Frequencies

This command is used to obtain counts on a single variable's values.

Crosstabs

This command is used to obtain counts on combinations of values of two or more variables.  For example, it can count the foreign cars with a good repair record and the domestic cars with a poor repair record.

We can use frequencies to produce tables of counts for individual variables.  Below, we use it to make frequency tables for make, rep78 and foreign.  Because a command name can be abbreviated to its first few characters (at least three), as long as the abbreviation is unique to that command, frequencies can be abbreviated to freq.  In the same way, the variables subcommand can be abbreviated to var, as in the third command below.  Each subcommand here is placed on a separate line and preceded by a forward slash ( / ), although subcommands may also be placed on the same line as the command name.  The first subcommand does not have to be preceded by a slash, but including it is a good habit.

FREQ
/VARIABLES= make.

FREQ
/VARIABLES= rep78.

FREQ
/VAR= foreign.

Here is the output produced by the frequencies commands above.

Instead of running three separate frequencies commands, we could have done this all in one step, as illustrated below.

FREQ
/VARIABLES= make rep78 foreign.
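
To make the rules about abbreviation, case and slashes concrete, here is a small sketch of two ways of writing the same request.  Both forms should produce the same three frequency tables as the command above; the layout is a matter of taste.

```spss
* Full command and subcommand names on one line, with no leading slash.
FREQUENCIES VARIABLES=make rep78 foreign.

* Abbreviated names in lower case, with the subcommand on its own line.
freq
  /var=make rep78 foreign.
```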

Let's use crosstabs to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign).  The crosstabs command for this is shown below.

CROSSTABS
/TABLES=rep78 BY foreign.

This is the output produced.

We can also show more information by using the count, row, column and total keywords on the cells subcommand to request the row percentages, column percentages and total percentages along with the counts.  Note that these keywords come after the = on the cells subcommand.  Generally, the form is "subcommand=specifications list".  Subcommands are preceded by a forward slash ( / ).

CROSSTABS
/TABLES=rep78 BY foreign
/CELLS= COUNT ROW COLUMN TOTAL.

The output is shown below.

Note: The order of the options does not matter.  We would have gotten the same output had we written the command like this:

CROSSTABS
/TABLES=rep78 BY foreign
/CELLS= TOTAL COUNT ROW COLUMN.
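
You do not have to request every percentage.  As a sketch, the command below asks only for counts and column percentages.  Because foreign follows by, it is the column variable, so each column percentage is computed within domestic cars or within foreign cars, which makes the two repair-record distributions directly comparable.

```spss
* Counts plus column percentages only (percentages within each value of foreign).
CROSSTABS
  /TABLES=rep78 BY foreign
  /CELLS=COUNT COLUMN.
```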

#### 3. Using the descriptives or means command for summary statistics

Both of these procedures are used for obtaining descriptive statistics like means and standard deviations.

Descriptives

This command is used to obtain descriptive statistics on a single variable.

Means

This command is used to obtain descriptive statistics on a variable at different levels of another variable.  For example, to obtain mean mpg separately for foreign cars and domestic cars.

To produce summary statistics, descriptives can be used.  Below, descriptives is used to get descriptive statistics for the variable mpg.

DESCRIPTIVES
/VARIABLES=mpg.

The results of the descriptives are shown below.

Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign).   We can use the means command and list foreign after the keyword by on the tables subcommand.  The example below will produce separate results for the different values of foreign.


MEANS
/TABLES=mpg BY foreign.

The results are presented separately for the 7 foreign cars (when foreign equals 1) and the 19 domestic cars (when foreign equals 0).
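
Both commands also let you choose which statistics are reported.  The sketch below is one possible variation, not the only way to do it: descriptives takes a statistics subcommand, and means takes a cells subcommand.  The particular variables and statistics are picked purely for illustration.

```spss
* Several variables and an explicit list of statistics in one descriptives command.
DESCRIPTIVES
  /VARIABLES=price mpg
  /STATISTICS=MEAN STDDEV VARIANCE RANGE.

* Statistics reported for each value of foreign are chosen on the cells subcommand.
MEANS
  /TABLES=price mpg BY foreign
  /CELLS=MEAN COUNT STDDEV MEDIAN.
```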

#### 4. Using the examine command for detailed summary statistics

You can use examine to get more detailed summary statistics including median, variance and interquartile range, as well as descriptive plots.

EXAMINE
/VARIABLES=mpg.

Below are the results of the examine command.

#### 5. Problems to look out for

• If you make a cross tabulation table with crosstabs and one of the variables has a large number of values (say 10 or more), the crosstabs table could be very hard to read.
• When using the keyword by in examine, if you choose a by variable with a large number of values (say 5, 10, or more), it will produce a very large amount of output.  In such cases, try using the means command with a by keyword on the tables subcommand instead.
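
The second point can be sketched as follows, using foreign and rep78 only as stand-ins for whatever by variable you have.  The plot subcommand with the keyword none is one way to cut down the examine output, and means gives a much more compact table of group statistics.

```spss
* Detailed statistics for each value of foreign, with plots suppressed.
EXAMINE
  /VARIABLES=mpg BY foreign
  /PLOT=NONE.

* A more compact alternative when the by variable has many values.
MEANS
  /TABLES=mpg BY rep78.
```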