SPSS FAQ
How can I analyze my data by categories?

Sometimes you may want to analyze your data based on categories or a grouping variable.  One way to do this is to split the data file into separate data files and run the same analyses on each of the two (or more) data sets.  However, that approach is cumbersome and error-prone.  Several SPSS commands let you run separate analyses by category, and we consider them below.

Let's use the example data set below.  Notice that one of the independent variables, iv1, is a string variable.  We will use it as our grouping variable to show how a string variable can serve as the grouping variable.  All of the techniques shown here can also be used with a numeric categorical variable.

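* Read the example data: iv1 is a one-character string; sub, iv2, dv1 and dv2 use the default numeric format.
data list list / sub * iv1 (A1) iv2 * dv1 dv2.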
begin data
1 "1" 1 48 25
2 "1" 1 49 37
3 "1" 1 50 55
4 "2" 1 17 19
5 "2" 1 20 38
6 "2" 2 23 48
7 "2" 2 28 44
8 "3" 2 28 68
9 "3" 2 30 30
10 "3" 2 32 37
end data.

To begin with, suppose we want the mean and standard deviation of dv1 for each of the three groups of iv1 (1, 2 and 3).  We can use the means command to obtain these simple descriptive statistics.

means tables= dv1 by iv1.
Case Processing Summary

                             Cases
             Included        Excluded        Total
             N   Percent     N   Percent     N   Percent
DV1 * IV1    10  100.0%      0   .0%         10  100.0%

Report
DV1
IV1      Mean       N    Std. Deviation
1        49.0000    3    1.00000
2        22.0000    4    4.69042
3        30.0000    3    2.00000
Total    32.5000    10   12.25878
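
As an independent cross-check, the grouped means and standard deviations in the Report table can be recomputed outside SPSS. The sketch below uses Python's standard library and is not part of the original SPSS example. The names DATA and means_by_group are hypothetical. Like SPSS, it uses the n - 1 denominator for the standard deviation.

```python
import statistics

# The ten cases from the data list block: (sub, iv1, iv2, dv1, dv2).
# iv1 is kept as a string, mirroring the SPSS example.
DATA = [
    (1, "1", 1, 48, 25), (2, "1", 1, 49, 37), (3, "1", 1, 50, 55),
    (4, "2", 1, 17, 19), (5, "2", 1, 20, 38), (6, "2", 2, 23, 48),
    (7, "2", 2, 28, 44), (8, "3", 2, 28, 68), (9, "3", 2, 30, 30),
    (10, "3", 2, 32, 37),
]


def means_by_group(rows):
    """Return {group: (mean, n, sd)} for dv1 by iv1, plus a "Total" entry.

    statistics.stdev divides by n - 1 (the sample standard deviation),
    which is the statistic SPSS reports as Std. Deviation.
    """
    groups = {}
    for _sub, iv1, _iv2, dv1, _dv2 in rows:
        groups.setdefault(iv1, []).append(dv1)
    report = {
        key: (statistics.mean(vals), len(vals), statistics.stdev(vals))
        for key, vals in sorted(groups.items())
    }
    all_dv1 = [row[3] for row in rows]
    report["Total"] = (statistics.mean(all_dv1), len(all_dv1),
                       statistics.stdev(all_dv1))
    return report


if __name__ == "__main__":
    for key, (mean, n, sd) in means_by_group(DATA).items():
        print(f"{key:<6} {mean:8.4f} {n:3d} {sd:9.5f}")
```

Collecting the dv1 values into a dictionary keyed by iv1 plays the role of the by iv1 clause. The printed means (49, 22, 30 and 32.5) and standard deviations should agree with the Report table.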

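You could also use the examine command, as shown below.  We use the /plot = none subcommand to suppress the stem-and-leaf plots and boxplots that examine produces by default.  The output shows the statistics for all cases first, followed by the statistics for each group of iv1.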

examine dv1 by iv1
 /plot = none.
Case Processing Summary

                        Cases
        Valid           Missing         Total
        N    Percent    N    Percent    N    Percent
DV1     10   100.0%     0    .0%        10   100.0%

Descriptives

                                                      Statistic   Std. Error
DV1   Mean                                            32.5000     3.87657
      95% Confidence Interval for Mean   Lower Bound  23.7306
                                         Upper Bound  41.2694
      5% Trimmed Mean                                 32.3889
      Median                                          29.0000
      Variance                                        150.278
      Std. Deviation                                  12.25878
      Minimum                                         17.00
      Maximum                                         50.00
      Range                                           33.00
      Interquartile Range                             26.0000
      Skewness                                        .516        .687
      Kurtosis                                        -1.278      1.334
Case Processing Summary

                             Cases
             Valid           Missing         Total
      IV1    N   Percent     N   Percent     N   Percent
DV1   1      3   100.0%      0   .0%         3   100.0%
      2      4   100.0%      0   .0%         4   100.0%
      3      3   100.0%      0   .0%         3   100.0%

Descriptives

      IV1                                                   Statistic   Std. Error
DV1   1     Mean                                            49.0000     .57735
            95% Confidence Interval for Mean   Lower Bound  46.5159
                                               Upper Bound  51.4841
            5% Trimmed Mean                                 .
            Median                                          49.0000
            Variance                                        1.000
            Std. Deviation                                  1.00000
            Minimum                                         48.00
            Maximum                                         50.00
            Range                                           2.00
            Interquartile Range                             .
            Skewness                                        .000        1.225
            Kurtosis                                        .           .
      2     Mean                                            22.0000     2.34521
            95% Confidence Interval for Mean   Lower Bound  14.5365
                                               Upper Bound  29.4635
            5% Trimmed Mean                                 21.9444
            Median                                          21.5000
            Variance                                        22.000
            Std. Deviation                                  4.69042
            Minimum                                         17.00
            Maximum                                         28.00
            Range                                           11.00
            Interquartile Range                             9.0000
            Skewness                                        .543        1.014
            Kurtosis                                        -.153       2.619
      3     Mean                                            30.0000     1.15470
            95% Confidence Interval for Mean   Lower Bound  25.0317
                                               Upper Bound  34.9683
            5% Trimmed Mean                                 .
            Median                                          30.0000
            Variance                                        4.000
            Std. Deviation                                  2.00000
            Minimum                                         28.00
            Maximum                                         32.00
            Range                                           4.00
            Interquartile Range                             .
            Skewness                                        .000        1.225
            Kurtosis                                        .           .
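
Several of the examine statistics are easy to verify by hand. The Python sketch below recomputes the mean, standard error of the mean, median, variance, standard deviation, minimum, maximum and range, both overall and per group of iv1. It is a minimal cross-check, not SPSS's implementation, and the names describe and describe_by_group are hypothetical. It leaves out the trimmed mean, interquartile range, skewness, kurtosis and confidence interval. Those depend on definitional choices (percentile method, small-sample adjustments) or on a t quantile that Python's statistics module does not provide.

```python
import math
import statistics

# The ten cases from the data list block: (sub, iv1, iv2, dv1, dv2).
DATA = [
    (1, "1", 1, 48, 25), (2, "1", 1, 49, 37), (3, "1", 1, 50, 55),
    (4, "2", 1, 17, 19), (5, "2", 1, 20, 38), (6, "2", 2, 23, 48),
    (7, "2", 2, 28, 44), (8, "3", 2, 28, 68), (9, "3", 2, 30, 30),
    (10, "3", 2, 32, 37),
]


def describe(values):
    """Recompute a subset of the examine Descriptives for one set of values."""
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "Mean": statistics.mean(values),
        "Std. Error": sd / math.sqrt(len(values)),  # standard error of the mean
        "Median": statistics.median(values),
        "Variance": statistics.variance(values),
        "Std. Deviation": sd,
        "Minimum": min(values),
        "Maximum": max(values),
        "Range": max(values) - min(values),
    }


def describe_by_group(rows):
    """Apply describe() to dv1 separately for each value of iv1."""
    groups = {}
    for _sub, iv1, _iv2, dv1, _dv2 in rows:
        groups.setdefault(iv1, []).append(dv1)
    return {key: describe(vals) for key, vals in sorted(groups.items())}


if __name__ == "__main__":
    print("All cases:", describe([row[3] for row in DATA]))
    for key, stats in describe_by_group(DATA).items():
        print(f"iv1 = {key}:", stats)
```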

Now let's look at a technique that is more general and can be used with any type of analysis.  First, sort the data by the grouping variable, in this case iv1.  This step matters because split file processes the file sequentially and starts a new group each time the value of the split variable changes.  Without sorting, the cases for one category could end up scattered across several groups.  Then split the file by the same variable.  The split file command temporarily splits the file by the variable specified, and all analyses will be grouped by this variable until the split file off command is issued or until the data are resorted.  Note that split file can be used with numeric variables as well as with short and long string variables.  (Many SPSS commands will not work with long string variables, but split file will.)  Next, issue the commands for the analyses you want.  Finally, issue the split file off command.

sort cases by iv1.
split file by iv1.
correlations var = dv1 with dv2. 
Correlations
IV1                                DV2
1      DV1   Pearson Correlation   .993
             Sig. (2-tailed)       .073
             N                     3
2      DV1   Pearson Correlation   .780
             Sig. (2-tailed)       .220
             N                     4
3      DV1   Pearson Correlation   -.766
             Sig. (2-tailed)       .444
             N                     3

split file off.

Note that you can use more than one variable to categorize your analysis.  To do so, list all of the grouping variables, in the same order, in both the sort cases command and the split file command.  A separate analysis is then run for each combination of values.

sort cases by iv1 iv2.
split file by iv1 iv2.
correlations var = dv1 with dv2. 
Correlations
IV1    IV2                                DV2
1      1.00   DV1   Pearson Correlation   .993
                    Sig. (2-tailed)       .073
                    N                     3
2      1.00   DV1   Pearson Correlation   1.000
                    Sig. (2-tailed)       .
                    N                     2
       2.00   DV1   Pearson Correlation   -1.000
                    Sig. (2-tailed)       .
                    N                     2
3      2.00   DV1   Pearson Correlation   -.766
                    Sig. (2-tailed)       .444
                    N                     3

split file off.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.