|
|
|
||||
|
Help the Stat Consulting Group by
giving a gift
| |||||
|
Loading
|
|||||
This module illustrates how to reshape data files in SPSS. These examples take long data files and reshape them into wide form. These show common examples of reshaping data but do not exhaustively demonstrate the different kinds of data reshaping that you could encounter.
Let's use the kidslw file for our first example.
get file 'c:\kidslw.sav'.
list famid birth age.
FAMID BIRTH AGE
1.00 1.00 9.00
1.00 2.00 6.00
1.00 3.00 3.00
2.00 1.00 8.00
2.00 2.00 6.00
2.00 3.00 2.00
3.00 1.00 6.00
3.00 2.00 4.00
3.00 3.00 2.00
Number of cases read: 9 Number of cases listed: 9
Let's make age in this file wide, making one record per family which would contain age1, age2 and age3, the ages of the kids in the family (age2 would be missing if there is only one kid, and age3 would be missing if there are only two kids). As our first step, we will create a vector with three elements, calling it age. Next, we will create a variable also called age and insert the values from the vector. We then list the data in the file to show how they have changed. In this and all of the following examples, we will list the data at intermediate steps of the process of making the file wide. This is useful in case the process does not go correctly, because it is easier to see where the error occurred.
vector age(3).
compute age(birth) = age.
list id famid age age1 age2 age3.
ID FAMID AGE AGE1 AGE2 AGE3
1.00 1.00 9.00 9.00 . .
2.00 1.00 6.00 . 6.00 .
3.00 1.00 3.00 . . 3.00
4.00 2.00 8.00 8.00 . .
5.00 2.00 6.00 . 6.00 .
6.00 2.00 2.00 . . 2.00
7.00 3.00 6.00 6.00 . .
8.00 3.00 4.00 . 4.00 .
9.00 3.00 2.00 . . 2.00
Number of cases read: 9 Number of cases listed: 9
The next step is to aggregate the data. We use the max function to capture the largest value in the variables age1 age2 and age3. Note that while we use the max function, you could also use the min function. Both functions work because SPSS does not consider a missing value to be either infinitely large nor infinitely small (as SAS and Stata do). Note that the variables not directly referenced in the aggregate command are dropped (i.e., all variables aside from famid age1, age2 and age3 are dropped).
aggregate outfile 'c:\kids1.sav'
/break famid
/age1 to age3 = max(age1 to age3).
get file 'c:\kids1.sav'.
list.
FAMID AGE1 AGE2 AGE3
1.00 9.00 6.00 3.00
2.00 8.00 6.00 2.00
3.00 6.00 4.00 2.00
Number of cases read: 3 Number of cases listed: 3
In the example above, we reshaped only one variable, age. In the example below, we reshape the variables age and wt. Note that for each variable to be reshaped, you need to add a vector and a compute command. You also need to get the max (or min) of the variable in the aggregate command. Hence, you can reshape as many variables as needed.
get file 'c:\kidslw.sav'.
list famid birth age.
ID FAMID BIRTH AGE
1.00 1.00 1.00 9.00
2.00 1.00 2.00 6.00
3.00 1.00 3.00 3.00
4.00 2.00 1.00 8.00
5.00 2.00 2.00 6.00
6.00 2.00 3.00 2.00
7.00 3.00 1.00 6.00
8.00 3.00 2.00 4.00
9.00 3.00 3.00 2.00
Number of cases read: 9 Number of cases listed: 9
vector age(3).
vector wt(3).
compute age(birth) = age.
compute wt(birth) = wt.
list.
The variables are listed in the following order:
LINE 1: ID FAMID KIDNAME BIRTH AGE WT SEX AGE1
LINE 2: AGE2 AGE3 WT1 WT2 WT3
ID: 1.00 1.00 Beth 1.00 9.00 60.00 f 9.00
AGE2: . . 60.00 . .
ID: 2.00 1.00 Bob 2.00 6.00 40.00 m .
AGE2: 6.00 . . 40.00 .
ID: 3.00 1.00 Barb 3.00 3.00 20.00 f .
AGE2: . 3.00 . . 20.00
ID: 4.00 2.00 Andy 1.00 8.00 80.00 m 8.00
AGE2: . . 80.00 . .
ID: 5.00 2.00 Al 2.00 6.00 50.00 m .
AGE2: 6.00 . . 50.00 .
ID: 6.00 2.00 Ann 3.00 2.00 20.00 f .
AGE2: . 2.00 . . 20.00
ID: 7.00 3.00 Pete 1.00 6.00 60.00 m 6.00
AGE2: . . 60.00 . .
ID: 8.00 3.00 Pam 2.00 4.00 40.00 f .
AGE2: 4.00 . . 40.00 .
ID: 9.00 3.00 Phil 3.00 2.00 20.00 m .
AGE2: . 2.00 . . 20.00
Number of cases read: 9 Number of cases listed: 9
aggregate outfile 'c:\kids2.sav'
/break famid
/age1 to age3 = max(age1 to age3)
/wt1 to wt3 = max(wt1 to wt3).
get file 'c:\kids2.sav'.
list.
FAMID AGE1 AGE2 AGE3 WT1 WT2 WT3
1.00 9.00 6.00 3.00 60.00 40.00 20.00
2.00 8.00 6.00 2.00 80.00 50.00 20.00
3.00 6.00 4.00 2.00 60.00 40.00 20.00
Number of cases read: 3 Number of cases listed: 3
The examples above showed how to reshape data using numeric variables, but sometimes you will need to reshape string variables as well. In this example, we reshape two numeric variables, age and wt, and two string variables, sex and kidname. We need to create the two string variables before creating the vectors, and we use the string command to do this. Note that while you can use the compute command to create a numeric variable, you can only use the compute command to modify a string variable.
Consider the kidslw data file shown below.
get file 'c:\kidslw.sav'.
list.
ID FAMID KIDNAME BIRTH AGE WT SEX
1.00 1.00 Beth 1.00 9.00 60.00 f
2.00 1.00 Bob 2.00 6.00 40.00 m
3.00 1.00 Barb 3.00 3.00 20.00 f
4.00 2.00 Andy 1.00 8.00 80.00 m
5.00 2.00 Al 2.00 6.00 50.00 m
6.00 2.00 Ann 3.00 2.00 20.00 f
7.00 3.00 Pete 1.00 6.00 60.00 m
8.00 3.00 Pam 2.00 4.00 40.00 f
9.00 3.00 Phil 3.00 2.00 20.00 m
Number of cases read: 9 Number of cases listed: 9
vector age(3).
vector wt(3).
string sex1 to sex3 (A4).
vector sex= sex1 to sex3.
string kidname1 to kidname3 (A4).
vector kidname= kidname1 to kidname3.
compute kidname(birth) = kidname.
compute wt(birth) = wt.
compute sex(birth) = sex.
compute age(birth) = age.
list.
The variables are listed in the following order:
LINE 1: ID FAMID KIDNAME BIRTH AGE WT SEX AGE1
LINE 2: AGE2 AGE3 WT1 WT2 WT3 SEX1 SEX2 SEX3 KIDNAME1 KIDNAME2
LINE 3: KIDNAME3
ID: 1.00 1.00 Beth 1.00 9.00 60.00 f 9.00
AGE2: . . 60.00 . . f Beth
KIDNAME3:
ID: 2.00 1.00 Bob 2.00 6.00 40.00 m .
AGE2: 6.00 . . 40.00 . m Bob
KIDNAME3:
ID: 3.00 1.00 Barb 3.00 3.00 20.00 f .
AGE2: . 3.00 . . 20.00 f
KIDNAME3: Barb
ID: 4.00 2.00 Andy 1.00 8.00 80.00 m 8.00
AGE2: . . 80.00 . . m Andy
KIDNAME3:
ID: 5.00 2.00 Al 2.00 6.00 50.00 m .
AGE2: 6.00 . . 50.00 . m Al
KIDNAME3:
ID: 6.00 2.00 Ann 3.00 2.00 20.00 f .
AGE2: . 2.00 . . 20.00 f
KIDNAME3: Ann
ID: 7.00 3.00 Pete 1.00 6.00 60.00 m 6.00
AGE2: . . 60.00 . . m Pete
KIDNAME3:
ID: 8.00 3.00 Pam 2.00 4.00 40.00 f .
AGE2: 4.00 . . 40.00 . f Pam
KIDNAME3:
ID: 9.00 3.00 Phil 3.00 2.00 20.00 m .
AGE2: . 2.00 . . 20.00 m
KIDNAME3: Phil
Number of cases read: 9 Number of cases listed: 9
aggregate outfile 'c:\kids3.sav'
/break famid
/age1 to age3 = max(age1 to age3)
/wt1 to wt3 = max(wt1 to wt3)
/sex1 to sex3 = max(sex1 to sex3)
/kidname1 to kidname3 = max(kidname1 to kidname3).
get file 'c:\kids3.sav'.
list.
The variables are listed in the following order:
LINE 1: FAMID AGE1 AGE2 AGE3 WT1 WT2 WT3 SEX1
LINE 2: SEX2 SEX3 KIDNAME1 KIDNAME2 KIDNAME3
FAMID: 1.00 9.00 6.00 3.00 60.00 40.00 20.00 f
SEX2: m f Beth Bob Barb
FAMID: 2.00 8.00 6.00 2.00 80.00 50.00 20.00 m
SEX2: m f Andy Al Ann
FAMID: 3.00 6.00 4.00 2.00 60.00 40.00 20.00 m
SEX2: f m Pete Pam Phil
Number of cases read: 3 Number of cases listed: 3
You cannot use character suffixes in SPSS when reshaping data because you cannot aggregate the data to form the wide data file. Hence, in this example, we use a numeric suffix while reshaping and rename the variables with the character suffix as the last step.
get file 'c:\dadmoml.sav'. list.ID FAMID NAME INC DADMOM 1.00 2.00 Art 22000.00 dad 2.00 1.00 Bill 30000.00 dad 3.00 3.00 Paul 25000.00 dad 4.00 1.00 Bess 15000.00 mom 5.00 3.00 Pat 50000.00 mom 6.00 2.00 Amy 18000.00 mom Number of cases read: 6 Number of cases listed: 6if dadmom = "mom" dadmomN = 1. if dadmom = "dad" dadmomN = 2. list.ID FAMID NAME INC DADMOM DADMOMN 1.00 2.00 Art 22000.00 dad 2.00 2.00 1.00 Bill 30000.00 dad 2.00 3.00 3.00 Paul 25000.00 dad 2.00 4.00 1.00 Bess 15000.00 mom 1.00 5.00 3.00 Pat 50000.00 mom 1.00 6.00 2.00 Amy 18000.00 mom 1.00 Number of cases read: 6 Number of cases listed: 6string name1 to name2 (A4). vector name = name1 to name2. compute name(dadmomN) = name. vector inc(2). compute inc(dadmomN) = inc. list.ID FAMID NAME INC DADMOM DADMOMN NAME1 NAME2 INC1 INC2 1.00 2.00 Art 22000.00 dad 2.00 Art . 22000.00 2.00 1.00 Bill 30000.00 dad 2.00 Bill . 30000.00 3.00 3.00 Paul 25000.00 dad 2.00 Paul . 25000.00 4.00 1.00 Bess 15000.00 mom 1.00 Bess 15000.00 . 5.00 3.00 Pat 50000.00 mom 1.00 Pat 50000.00 . 6.00 2.00 Amy 18000.00 mom 1.00 Amy 18000.00 . Number of cases read: 6 Number of cases listed: 6aggregate outfile 'c:\kids4.sav' /break famid /inc1 to inc2 = max(inc1 to inc2) /name1 to name2 = max(name1 to name2). get file 'c:\kids4.sav'. list.FAMID INC1 INC2 NAME1 NAME2 1.00 15000.00 30000.00 Bess Bill 2.00 18000.00 22000.00 Amy Art 3.00 50000.00 25000.00 Pat Paulrename variables (inc1 to inc2 = incmom incdad). rename variables (name1 to name2 = namemom namedad). list. Number of cases read: 3 Number of cases listed: 3FAMID INCMOM INCDAD NAMEMOM NAMEDAD 1.00 15000.00 30000.00 Bess Bill 2.00 18000.00 22000.00 Amy Art 3.00 50000.00 25000.00 Pat Paul Number of cases read: 3 Number of cases listed: 3
There may be times when your data are ordered but an index variable is not present. In this situation, you will need to create an index variable before you can reshape the data. This example illustrates how to create an index variable and then use that index variable in the reshaping. We will use the data from Example 1. In that example, birth is the index variable. Hence, we will create an index variable which will have the same values as the variable birth. We will call our index variable index.
get file 'c:\kidslw.sav'. list famid birth age.FAMID BIRTH AGE 1.00 1.00 9.00 1.00 2.00 6.00 1.00 3.00 3.00 2.00 1.00 8.00 2.00 2.00 6.00 2.00 3.00 2.00 3.00 1.00 6.00 3.00 2.00 4.00 3.00 3.00 2.00 Number of cases read: 9 Number of cases listed: 9compute index = 1. if famid = lag(famid) index = lag(index) + 1. vector age(3). compute age(index) = age. list id famid age age1 age2 age3 birth index.ID FAMID AGE AGE1 AGE2 AGE3 BIRTH INDEX 1.00 1.00 9.00 9.00 . . 1.00 1.00 2.00 1.00 6.00 . 6.00 . 2.00 2.00 3.00 1.00 3.00 . . 3.00 3.00 3.00 4.00 2.00 8.00 8.00 . . 1.00 1.00 5.00 2.00 6.00 . 6.00 . 2.00 2.00 6.00 2.00 2.00 . . 2.00 3.00 3.00 7.00 3.00 6.00 6.00 . . 1.00 1.00 8.00 3.00 4.00 . 4.00 . 2.00 2.00 9.00 3.00 2.00 . . 2.00 3.00 3.00 Number of cases read: 9 Number of cases listed: 9aggregate outfile 'c:\kids5.sav' /break famid /age1 to age3 = max(age1 to age3). get file 'c:\kids5.sav'. list.FAMID AGE1 AGE2 AGE3 1.00 9.00 6.00 3.00 2.00 8.00 6.00 2.00 3.00 6.00 4.00 2.00 Number of cases read: 3 Number of cases listed: 3
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services