This module demonstrates how to obtain basic descriptive statistics using SPSS. The output shown is what you would get by running the program in batch mode in SPSS 6.1 on an AIX machine. If you run the same program on a PC under Windows with SPSS 7.5 or higher, the tables will be formatted as publication-quality output; they will look different but contain the same information. We will use a data file containing data on 26 automobiles: their make, price, mpg, repair record, and whether the car is foreign or domestic. The data file is presented below.
MAKE     PRICE  MPG  REP78  FOREIGN
AMC       4099   22      3        0
AMC       4749   17      3        0
AMC       3799   22      3        0
Audi      9690   17      5        1
Audi      6295   23      3        1
BMW       9735   25      4        1
Buick     4816   20      3        0
Buick     7827   15      4        0
Buick     5788   18      3        0
Buick     4453   26      3        0
Buick     5189   20      3        0
Buick    10372   16      3        0
Buick     4082   19      3        0
Cad.     11385   14      3        0
Cad.     14500   14      2        0
Cad.     15906   21      3        0
Chev.     3299   29      3        0
Chev.     5705   16      4        0
Chev.     4504   22      3        0
Chev.     5104   22      2        0
Chev.     3667   24      2        0
Chev.     3955   19      3        0
Datsun    6229   23      4        1
Datsun    4589   35      5        1
Datsun    5079   24      4        1
Datsun    8129   21      4        1
The program below reads the data and creates a temporary SPSS .sav file. The descriptive statistics shown in this module are all performed on this file. The variable list on the DATA LIST command is make (A8) price * mpg * rep78 * foreign *. The (A8) following make indicates that make is a character variable with a width of 8. The * following each of the other variables means that they are numeric variables. These are used with FREE, which indicates "free field" input.
DATA LIST FREE/ make (A8) price * mpg * rep78 * foreign * .
BEGIN DATA.
AMC 4099 22 3 0
AMC 4749 17 3 0
AMC 3799 22 3 0
Audi 9690 17 5 1
Audi 6295 23 3 1
BMW 9735 25 4 1
Buick 4816 20 3 0
Buick 7827 15 4 0
Buick 5788 18 3 0
Buick 4453 26 3 0
Buick 5189 20 3 0
Buick 10372 16 3 0
Buick 4082 19 3 0
Cad. 11385 14 3 0
Cad. 14500 14 2 0
Cad. 15906 21 3 0
Chev. 3299 29 3 0
Chev. 5705 16 4 0
Chev. 4504 22 3 0
Chev. 5104 22 2 0
Chev. 3667 24 2 0
Chev. 3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
END DATA.
EXECUTE.
LIST /CASES=10.
EXECUTE.
The output of the LIST command is shown below. Because of the /CASES=10 specification, only the first 10 cases are listed. You can compare the program to the output.
MAKE        PRICE      MPG    REP78  FOREIGN

AMC       4099.00    22.00     3.00      .00
AMC       4749.00    17.00     3.00      .00
AMC       3799.00    22.00     3.00      .00
Audi      9690.00    17.00     5.00     1.00
Audi      6295.00    23.00     3.00     1.00
BMW       9735.00    25.00     4.00     1.00
Buick     4816.00    20.00     3.00      .00
Buick     7827.00    15.00     4.00      .00
Buick     5788.00    18.00     3.00      .00
Buick     4453.00    26.00     3.00      .00
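LIST can also be limited to particular variables and to a range of cases. As a minimal sketch, assuming the VARIABLES subcommand and the FROM ... TO form of CASES are available in your SPSS release, the command below would list only make and mpg for the first five cases. The choice of variables and cases is ours, for illustration.

```spss
* List two variables for cases 1 through 5.
LIST /VARIABLES=make mpg
  /CASES=FROM 1 TO 5.
```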
Both FREQUENCIES and CROSSTABS are used for obtaining information on the number of cases that have a certain characteristic.
FREQUENCIES
This command is used to obtain counts on a single variable's values.
CROSSTABS
This command is used to obtain counts on combinations of values of two or more variables, for example, the number of foreign cars with good repair records and the number of domestic cars with poor repair records.
We can use FREQUENCIES to produce tables of counts for individual variables. Below, we use it to make frequency tables for make, rep78 and foreign. A command name can be abbreviated to as few as its first three characters, as long as the abbreviation is unique to that command; here FREQUENCIES is abbreviated to FREQ. Subcommands are preceded by a slash ( / ). In the commands below, the VAR subcommand is on a separate line, but subcommands may also be placed on the same line as the command name. The first subcommand does not have to be preceded by a slash, but doing so is a good habit.
FREQ
  /VAR= make.
FREQ
  /VAR= rep78.
FREQ
  /VAR= foreign.
Here is the output produced by the FREQUENCIES commands above.
MAKE
                                                 Valid      Cum
Value Label         Value   Frequency  Percent  Percent  Percent

                    AMC             3     11.5     11.5     11.5
                    Audi            2      7.7      7.7     19.2
                    BMW             1      3.8      3.8     23.1
                    Buick           7     26.9     26.9     50.0
                    Cad.            3     11.5     11.5     61.5
                    Chev.           6     23.1     23.1     84.6
                    Datsun          4     15.4     15.4    100.0
                              -------  -------  -------
                    Total          26    100.0    100.0

Valid cases      26      Missing cases      0

REP78
                                                 Valid      Cum
Value Label         Value   Frequency  Percent  Percent  Percent

                     2.00           3     11.5     11.5     11.5
                     3.00          15     57.7     57.7     69.2
                     4.00           6     23.1     23.1     92.3
                     5.00           2      7.7      7.7    100.0
                              -------  -------  -------
                    Total          26    100.0    100.0

Valid cases      26      Missing cases      0

FOREIGN
                                                 Valid      Cum
Value Label         Value   Frequency  Percent  Percent  Percent

                      .00          19     73.1     73.1     73.1
                     1.00           7     26.9     26.9    100.0
                              -------  -------  -------
                    Total          26    100.0    100.0

Valid cases      26      Missing cases      0
Instead of having three separate FREQUENCIES, we could have done this all in one FREQUENCIES step as illustrated below.
FREQ /VAR= make rep78 foreign .
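The Value Label column in the frequency tables above is blank because the program defines no value labels. As a minimal sketch, assuming we want to label the two values of foreign, a VALUE LABELS command could be run before FREQUENCIES. The label text ('Domestic', 'Foreign') is our own choice for illustration and is not part of the original program. The second command spells FREQUENCIES and VARIABLES out in full, which is equivalent to the abbreviated form used above.

```spss
* Hypothetical labels for the values of foreign (0 = domestic, 1 = foreign).
VALUE LABELS foreign 0 'Domestic' 1 'Foreign'.

* Same request as FREQ /VAR= foreign, written without abbreviations.
FREQUENCIES /VARIABLES= foreign.
```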
Let's use CROSSTABS to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign). The CROSSTABS command for this is shown below.
CROSSTABS /TABLES=rep78 BY foreign.
This is the output produced.
REP78  by  FOREIGN

                    FOREIGN        Page 1 of 1
            Count  |
                   |
                   |                   Row
                   |     .00|    1.00| Total
REP78      --------+--------+--------+
              2.00 |     3  |        |     3
                   |        |        |  11.5
                   +--------+--------+
              3.00 |    14  |     1  |    15
                   |        |        |  57.7
                   +--------+--------+
              4.00 |     2  |     4  |     6
                   |        |        |  23.1
                   +--------+--------+
              5.00 |        |     2  |     2
                   |        |        |   7.7
                   +--------+--------+
            Column      19        7       26
             Total    73.1     26.9    100.0

Number of Missing Observations:  0
We can also show cell percentages by using the COUNT, ROW, COLUMN and TOTAL specifications on the CELLS subcommand, which request the count, row percentages, column percentages and total percentages. Note that the specifications come after the = on the CELLS subcommand. In general, the form is "subcommand=specifications list", and subcommands are preceded by a / (slash).
CROSSTABS /TABLES=rep78 BY foreign /CELLS= COUNT ROW COLUMN TOTAL .
The output is shown below.
REP78  by  FOREIGN

                    FOREIGN        Page 1 of 1
            Count  |
           Row Pct |
           Col Pct |                   Row
           Tot Pct |     .00|    1.00| Total
REP78      --------+--------+--------+
              2.00 |     3  |        |     3
                   | 100.0  |        |  11.5
                   |  15.8  |        |
                   |  11.5  |        |
                   +--------+--------+
              3.00 |    14  |     1  |    15
                   |  93.3  |   6.7  |  57.7
                   |  73.7  |  14.3  |
                   |  53.8  |   3.8  |
                   +--------+--------+
              4.00 |     2  |     4  |     6
                   |  33.3  |  66.7  |  23.1
                   |  10.5  |  57.1  |
                   |   7.7  |  15.4  |
                   +--------+--------+
              5.00 |        |     2  |     2
                   |        | 100.0  |   7.7
                   |        |  28.6  |
                   |        |   7.7  |
                   +--------+--------+
            Column      19        7       26
             Total    73.1     26.9    100.0

Number of Missing Observations:  0
The order of the options does not matter. We would have gotten the same output had we written the command like this...
CROSSTABS /TABLES=rep78 BY foreign /CELLS= TOTAL COUNT ROW COLUMN .
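To see how the four numbers in each cell relate to one another, take the cell for rep78 = 3.00 and foreign = .00 in the table with cell percentages above, which holds 14 of the 26 cars. The arithmetic below is a worked check using the row total (15), column total (19) and grand total (26) printed in that table.

```latex
\begin{align*}
\text{Row Pct} &= \frac{14}{15} \times 100 \approx 93.3 \\
\text{Col Pct} &= \frac{14}{19} \times 100 \approx 73.7 \\
\text{Tot Pct} &= \frac{14}{26} \times 100 \approx 53.8
\end{align*}
```

These match the three percentages printed under the count of 14 in that cell.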
Both DESCRIPTIVES and MEANS are used for obtaining descriptive statistics like means and standard deviations.
DESCRIPTIVES
This command is used to obtain descriptive statistics on a single variable.
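MEANS
This command is used to obtain descriptive statistics on a variable separately for each value of another variable.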
To produce summary statistics, DESCRIPTIVES can be used. Below, DESCRIPTIVES is used to get descriptive statistics for the variable mpg.
DESCRIPTIVES /VAR=mpg .
The results of the DESCRIPTIVES are shown below.
Number of valid observations (listwise) =        26.00

                                                      Valid
Variable      Mean    Std Dev    Minimum    Maximum       N  Label

MPG          20.92       4.76      14.00      35.00      26
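DESCRIPTIVES accepts more than one variable, and a STATISTICS subcommand can name the statistics to print. The sketch below assumes the keyword form of the STATISTICS subcommand (MEAN, STDDEV, MIN, MAX) is available in your SPSS release. It requests the same four statistics shown in the output above, for both price and mpg; adding price is our choice for illustration.

```spss
* Summary statistics for two variables in one command.
DESCRIPTIVES /VAR=price mpg
  /STATISTICS=MEAN STDDEV MIN MAX.
```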
Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign). We can use the MEANS command and list foreign after the keyword BY on the TABLES subcommand. The example below will produce separate results for the different values of foreign.
MEANS /TABLES=mpg BY foreign .
As you see below, the results are presented for all 26 cars and then separately for the 19 domestic cars (foreign equals 0) and the 7 foreign cars (foreign equals 1).
- - Description of Subpopulations - -

Summaries of     MPG
By levels of     FOREIGN

Variable      Value  Label           Mean    Std Dev    Cases

For Entire Population             20.9231     4.7575       26

FOREIGN         .00               19.7895     4.0357       19
FOREIGN        1.00               24.0000     5.5076        7

Total Cases = 26
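The TABLES subcommand can list more than one variable before BY. As a sketch, the command below would produce the same kind of summary for price as well as mpg, each broken down by foreign. Including price is our choice for illustration and is not part of the original program.

```spss
* Means of mpg and price, each broken down by foreign.
MEANS /TABLES=mpg price BY foreign.
```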
You can use EXAMINE to get more detailed summary statistics. I assume that beginners using SPSS syntax mode will most likely be connecting by telnet to a UNIX machine. Without access to X-Windows, you will likely be restricted to low-resolution graphics, so the output from the program below is presented as low-resolution graphics. If you are running in batch on UNIX, you should include the following command in your program.
SET HIGHRES=OFF .
DO NOT include this command if you are running on the PC or have access to SPSS Dialog Box mode on UNIX.
EXAMINE /VARIABLES=mpg .
And here are the results of the EXAMINE.
MPG

Valid cases:        26.0   Missing cases:        .0   Percent missing:        .0

Mean        20.9231   Std Err       .9330   Min        14.00   Skewness    .9355
Median      21.0000   Variance    22.6338   Max        35.00   S E Skew    .4556
5% Trim     20.6026   Std Dev      4.7575   Range      21.00   Kurtosis   1.7927
95% CI for Mean (19.0015, 22.8447)          IQR         6.25   S E Kurt    .8865

Frequency    Stem &  Leaf

     .00        1 t
    3.00        1 f  445
    4.00        1 s  6677
    3.00        1 .  899
    4.00        2 *  0011
    6.00        2 t  222233
    3.00        2 f  445
    1.00        2 s  6
    1.00        2 .  9
    1.00 Extremes    (35)

Stem width:    10.00
Each leaf:     1 case(s)
       |
       |             (O)  Case: 24
       |
       |
       |
       |
       |
    30 +
       |            --+--
       |              |
       |              |
       |              |
       |              |
       |              |
       |              |
       |            +-+-+
       |            |   |
       |            | * |
    20 +            |   |
       |            |   |
       |            |   |
       |            +-+-+
       |              |
       |              |
       |              |
       |            --+--
       |
    10 +
       |
       |
       +------------------------------------------------
        Variable      MPG
        N of Cases    26.00
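Several of the statistics in the EXAMINE output above are simple functions of one another. As a worked check, the arithmetic below uses N = 26 and the printed mean and standard deviation. It assumes the confidence interval is the usual t-based interval with 25 degrees of freedom (critical value about 2.0595); that assumption is ours, although it reproduces the printed limits.

```latex
\begin{align*}
\text{Variance} &= (\text{Std Dev})^2 = 4.7575^2 \approx 22.634 \\
\text{Std Err} &= \frac{\text{Std Dev}}{\sqrt{N}} = \frac{4.7575}{\sqrt{26}} \approx 0.9330 \\
\text{Range} &= \text{Max} - \text{Min} = 35 - 14 = 21 \\
\text{95\% CI} &= 20.9231 \pm 2.0595 \times 0.9330 \approx (19.00,\ 22.84)
\end{align*}
```

These agree with the printed values up to rounding.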
To obtain separate EXAMINE results for foreign and domestic cars, all you have to do is add BY foreign to the VARIABLES subcommand. This will work with some but not all SPSS commands.
EXAMINE /VARIABLES=mpg BY foreign.
As you see in the output below, you get a complete set of output for overall mpg. This is followed by complete output for the case where foreign is 0, and then another set of output when foreign is 1. Finally you get side-by-side Box Plots for each level of foreign.
MPG

Valid cases:        26.0   Missing cases:        .0   Percent missing:        .0

Mean        20.9231   Std Err       .9330   Min        14.00   Skewness    .9355
Median      21.0000   Variance    22.6338   Max        35.00   S E Skew    .4556
5% Trim     20.6026   Std Dev      4.7575   Range      21.00   Kurtosis   1.7927
95% CI for Mean (19.0015, 22.8447)          IQR         6.25   S E Kurt    .8865

Frequency    Stem &  Leaf

     .00        1 t
    3.00        1 f  445
    4.00        1 s  6677
    3.00        1 .  899
    4.00        2 *  0011
    6.00        2 t  222233
    3.00        2 f  445
    1.00        2 s  6
    1.00        2 .  9
    1.00 Extremes    (35)

Stem width:    10.00
Each leaf:     1 case(s)
       |
       |
       |             (O)  Case: 24
       |
       |
       |
       |
       |
    30 +
       |            --+--
       |              |
       |              |
       |              |
       |              |
       |              |
       |              |
       |            +-+-+
       |            |   |
       |            | * |
    20 +            |   |
       |            |   |
       |            |   |
       |            +-+-+
       |              |
       |              |
       |              |
       |            --+--
       |
    10 +
       |
       |
       +--------------------------------------------------------------------
        Variable      MPG
        N of Cases    26.00

Symbol Key:    * - Median    (O) - Outlier    (E) - Extreme
MPG
By FOREIGN     .00

Valid cases:        19.0   Missing cases:        .0   Percent missing:        .0

Mean        19.7895   Std Err       .9258   Min        14.00   Skewness    .4774
Median      20.0000   Variance    16.2865   Max        29.00   S E Skew    .5238
5% Trim     19.5994   Std Dev      4.0357   Range      15.00   Kurtosis    .0412
95% CI for Mean (17.8443, 21.7346)          IQR         6.00   S E Kurt   1.0143

Frequency    Stem &  Leaf

    2.00        1 *  44
    7.00        1 .  5667899
    8.00        2 *  00122224
    2.00        2 .  69

Stem width:    10.00
Each leaf:     1 case(s)

MPG
By FOREIGN    1.00

Valid cases:         7.0   Missing cases:        .0   Percent missing:        .0

Mean        24.0000   Std Err      2.0817   Min       17.000   Skewness   1.3408
Median      23.0000   Variance    30.3333   Max       35.000   S E Skew    .7937
5% Trim     23.7778   Std Dev      5.5076   Range     18.000   Kurtosis   3.2861
95% CI for Mean (18.9063, 29.0937)          IQR        4.000   S E Kurt   1.5875

Frequency    Stem &  Leaf

    1.00 Extremes    (17)
    4.00        2 *  1334
    1.00        2 .  5
    1.00 Extremes    (35)

Stem width:    10.00
Each leaf:     1 case(s)
       |
       |                                  (E)  Case: 24
       |
       |
       |
       |
       |
    30 +
       |            --+--
  M    |              |
  P    |              |
  G    |              |
       |              |                  --+--
       |              |                  +-+-+
       |              |                  |   |
       |              |                  | * |
       |            +-+-+                +-+-+
       |            |   |                --+--
    20 +            | * |
       |            |   |
       |            |   |
       |            |   |                 (O)  Case: 4
       |            +-+-+
       |              |
       |              |
       |            --+--
       |
    10 +
       |
       |
       +--------------------------------------------------------------------
        FOREIGN      .00                  1.00
        N of Cases   19.00                7.00

Symbol Key:    * - Median    (O) - Outlier    (E) - Extreme
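For commands that do not accept BY on the VARIABLES subcommand, a common alternative is to split the file into groups first. The sketch below is our own illustration of how this might be done for DESCRIPTIVES and is not part of the original program. SORT CASES puts the cases in order of foreign, SPLIT FILE makes later procedures run once for each value of foreign, and SPLIT FILE OFF restores normal processing.

```spss
* Run DESCRIPTIVES separately for domestic and foreign cars.
SORT CASES BY foreign.
SPLIT FILE BY foreign.
DESCRIPTIVES /VAR=mpg.
SPLIT FILE OFF.
```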
These pages are Copyrighted (c) by UCLA Academic Technology Services