SPSS FAQ
How can I analyze my data by categories?
Sometimes you may want to analyze your data separately for each level of a
categorical or grouping variable. One way to do this is to split the data file into separate data files and
run the same analyses on each of them. However, that is cumbersome and error-prone. Several
commands in SPSS allow you to do separate analyses by category, and we will consider them below.
Let's use the example data set below. Notice that one
of the independent variables, iv1, is a string variable. We will use it as our grouping variable to show that
a string variable works in this role. All of the techniques shown below can also be used with a numeric
categorical variable.
data list list / sub * iv1 (A) iv2 * dv1 dv2.
begin data
1 "1" 1 48 25
2 "1" 1 49 37
3 "1" 1 50 55
4 "2" 1 17 19
5 "2" 1 20 38
6 "2" 2 23 48
7 "2" 2 28 44
8 "3" 2 28 68
9 "3" 2 30 30
10 "3" 2 32 37
end data.
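If you would rather work with a numeric grouping variable, one possible sketch is to derive one from iv1 with autorecode. The name iv1num is a hypothetical choice for this illustration and is not part of the original data set.
autorecode variables = iv1
  /into iv1num.
The new variable could then stand in for iv1 in any of the commands shown below.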
To begin with, suppose we wanted to find the mean and standard
deviation for dv1 for groups one, two and three in iv1. We can use
the means command to obtain simple descriptive statistics.
means tables= dv1 by iv1.
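Case Processing Summary
|           | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
|-----------|------------|------------------|------------|------------------|---------|---------------|
| DV1 * IV1 | 10         | 100.0%           | 0          | .0%              | 10      | 100.0%        |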
Report
DV1
| IV1   | Mean    | N  | Std. Deviation |
|-------|---------|----|----------------|
| 1     | 49.0000 | 3  | 1.00000        |
| 2     | 22.0000 | 4  | 4.69042        |
| 3     | 30.0000 | 3  | 2.00000        |
| Total | 32.5000 | 10 | 12.25878       |
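As a variation, the sketch below requests the same statistics for both dependent variables in a single command. The /cells subcommand is optional here, since mean, count and standard deviation are what means reports by default; it is listed only to show where other statistics could be requested.
means tables = dv1 dv2 by iv1
  /cells = mean count stddev.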
You could also use the examine command, as shown below. We will use the
/plot = none subcommand to suppress the stem-and-leaf plots and boxplots. By default, examine reports
statistics for the whole sample first and then for each group.
examine dv1 by iv1
/plot = none.
Case Processing Summary
|     | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|-----|---------|---------------|-----------|-----------------|---------|---------------|
| DV1 | 10      | 100.0%        | 0         | .0%             | 10      | 100.0%        |
Descriptives
| DV1                                           | Statistic | Std. Error |
|-----------------------------------------------|-----------|------------|
| Mean                                          | 32.5000   | 3.87657    |
| 95% Confidence Interval for Mean, Lower Bound | 23.7306   |            |
| 95% Confidence Interval for Mean, Upper Bound | 41.2694   |            |
| 5% Trimmed Mean                               | 32.3889   |            |
| Median                                        | 29.0000   |            |
| Variance                                      | 150.278   |            |
| Std. Deviation                                | 12.25878  |            |
| Minimum                                       | 17.00     |            |
| Maximum                                       | 50.00     |            |
| Range                                         | 33.00     |            |
| Interquartile Range                           | 26.0000   |            |
| Skewness                                      | .516      | .687       |
| Kurtosis                                      | -1.278    | 1.334      |
Case Processing Summary
DV1
| IV1 | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|-----|---------|---------------|-----------|-----------------|---------|---------------|
| 1   | 3       | 100.0%        | 0         | .0%             | 3       | 100.0%        |
| 2   | 4       | 100.0%        | 0         | .0%             | 4       | 100.0%        |
| 3   | 3       | 100.0%        | 0         | .0%             | 3       | 100.0%        |
Descriptives
| IV1 | DV1                                           | Statistic | Std. Error |
|-----|-----------------------------------------------|-----------|------------|
| 1   | Mean                                          | 49.0000   | .57735     |
| 1   | 95% Confidence Interval for Mean, Lower Bound | 46.5159   |            |
| 1   | 95% Confidence Interval for Mean, Upper Bound | 51.4841   |            |
| 1   | 5% Trimmed Mean                               | .         |            |
| 1   | Median                                        | 49.0000   |            |
| 1   | Variance                                      | 1.000     |            |
| 1   | Std. Deviation                                | 1.00000   |            |
| 1   | Minimum                                       | 48.00     |            |
| 1   | Maximum                                       | 50.00     |            |
| 1   | Range                                         | 2.00      |            |
| 1   | Interquartile Range                           | .         |            |
| 1   | Skewness                                      | .000      | 1.225      |
| 1   | Kurtosis                                      | .         | .          |
| 2   | Mean                                          | 22.0000   | 2.34521    |
| 2   | 95% Confidence Interval for Mean, Lower Bound | 14.5365   |            |
| 2   | 95% Confidence Interval for Mean, Upper Bound | 29.4635   |            |
| 2   | 5% Trimmed Mean                               | 21.9444   |            |
| 2   | Median                                        | 21.5000   |            |
| 2   | Variance                                      | 22.000    |            |
| 2   | Std. Deviation                                | 4.69042   |            |
| 2   | Minimum                                       | 17.00     |            |
| 2   | Maximum                                       | 28.00     |            |
| 2   | Range                                         | 11.00     |            |
| 2   | Interquartile Range                           | 9.0000    |            |
| 2   | Skewness                                      | .543      | 1.014      |
| 2   | Kurtosis                                      | -.153     | 2.619      |
| 3   | Mean                                          | 30.0000   | 1.15470    |
| 3   | 95% Confidence Interval for Mean, Lower Bound | 25.0317   |            |
| 3   | 95% Confidence Interval for Mean, Upper Bound | 34.9683   |            |
| 3   | 5% Trimmed Mean                               | .         |            |
| 3   | Median                                        | 30.0000   |            |
| 3   | Variance                                      | 4.000     |            |
| 3   | Std. Deviation                                | 2.00000   |            |
| 3   | Minimum                                       | 28.00     |            |
| 3   | Maximum                                       | 32.00     |            |
| 3   | Range                                         | 4.00      |            |
| 3   | Interquartile Range                           | .         |            |
| 3   | Skewness                                      | .000      | 1.225      |
| 3   | Kurtosis                                      | .         | .          |
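If only the by-group results are of interest, the overall tables can be left out. A minimal sketch, assuming the /nototal subcommand behaves as documented for examine:
examine dv1 by iv1
  /plot = none
  /nototal.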
Now let's look at a technique that is more general and that can be
used with any type of analysis. First, we need to sort the data by our grouping variable, in this case
iv1. Then we split the file by the same variable. The split file command temporarily splits the file by the variable
specified. All analyses will be grouped by this variable until the split file off command is issued, or until the data are re-sorted. Note that the
split file command can be used with numeric, short string and long string variables. (Many SPSS commands will not work
with long string variables, but split file will.) Next, list the commands for the analyses that you would like. Finally, issue the
split file off command.
sort cases by iv1.
split file by iv1.
correlations var = dv1 with dv2.
Correlations
| IV1 |     |                     | DV2  |
|-----|-----|---------------------|------|
| 1   | DV1 | Pearson Correlation | .993 |
| 1   | DV1 | Sig. (2-tailed)     | .073 |
| 1   | DV1 | N                   | 3    |
| 2   | DV1 | Pearson Correlation | .780 |
| 2   | DV1 | Sig. (2-tailed)     | .220 |
| 2   | DV1 | N                   | 4    |
| 3   | DV1 | Pearson Correlation | -.766 |
| 3   | DV1 | Sig. (2-tailed)     | .444 |
| 3   | DV1 | N                   | 3    |
split file off.
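The same pattern should work with other procedures. The sketch below is an illustration rather than part of the original example: it runs descriptives on both dependent variables and uses the separate keyword, which asks SPSS to print one table per group instead of the layered table that split file produces by default.
sort cases by iv1.
split file separate by iv1.
descriptives variables = dv1 dv2
  /statistics = mean stddev min max.
split file off.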
Note that you can use more than one variable to categorize your
analysis. To do so, list all of the variables by which you want the analysis categorized in the
sort cases command and in the split file command.
sort cases by iv1 iv2.
split file by iv1 iv2.
correlations var = dv1 with dv2.
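Correlations
| IV1 | IV2  |     |                     | DV2    |
|-----|------|-----|---------------------|--------|
| 1   | 1.00 | DV1 | Pearson Correlation | .993   |
| 1   | 1.00 | DV1 | Sig. (2-tailed)     | .073   |
| 1   | 1.00 | DV1 | N                   | 3      |
| 2   | 1.00 | DV1 | Pearson Correlation | 1.000  |
| 2   | 1.00 | DV1 | Sig. (2-tailed)     | .      |
| 2   | 1.00 | DV1 | N                   | 2      |
| 2   | 2.00 | DV1 | Pearson Correlation | -1.000 |
| 2   | 2.00 | DV1 | Sig. (2-tailed)     | .      |
| 2   | 2.00 | DV1 | N                   | 2      |
| 3   | 2.00 | DV1 | Pearson Correlation | -.766  |
| 3   | 2.00 | DV1 | Sig. (2-tailed)     | .444   |
| 3   | 2.00 | DV1 | N                   | 3      |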
split file off.
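For simple descriptive statistics, a similar two-way breakdown can probably be had without split file by adding a second by keyword to means, which nests iv2 within iv1. This is offered as a sketch, and its output is not shown here.
means tables = dv1 by iv1 by iv2.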
These pages are Copyrighted (c) by UCLA Academic Technology Services
The content of this web site should not be
construed as an endorsement of any particular web site, book, or software
product by the University of California