SPSS Learning Modules
Reshaping data wide to long

This module illustrates how to reshape data files in SPSS. The examples take data files in wide form and reshape them into long form. They cover common cases but do not exhaustively demonstrate the kinds of data reshaping that you could encounter.

Example #1: One variable

Consider the file containing the kids and their heights at one year of age (ht1) and at two years of age (ht2).

get file 'c:\kidshtwt.sav'.
list famid birth ht1 ht2.
    FAMID     BIRTH       HT1       HT2

     1.00      1.00      2.80      3.40
     1.00      2.00      2.90      3.80
     1.00      3.00      2.20      2.90
     2.00      1.00      2.00      3.20
     2.00      2.00      1.80      2.80
     2.00      3.00      1.90      2.40
     3.00      1.00      2.20      3.30
     3.00      2.00      2.30      3.40
     3.00      3.00      2.10      2.90

 Number of cases read:  9    Number of cases listed:  9

This is called a wide format because each child's heights are stored side by side in one row, as the separate variables ht1 and ht2. We may want the data to be long, where each height is in a separate observation. First, we create a vector of the variables to be reshaped. Then we loop over an index variable (age), computing a new variable (ht) from the vector element for that index, and repeat as many times as there are variables in the vector. Inside the loop, we write the case to a new data file, dropping the wide variables, and then end the loop. Finally, we get the new data file and list it to make sure that all went as planned.

Note that we use xsave here instead of save. Unlike save, xsave is a transformation, so it can be placed inside the loop, where it writes one case per pass through the loop. It is not executed until data are read for the next procedure (here, execute), which reduces processing time by consolidating two data passes into one.

vector Aht = ht1 to ht2.
loop age = 1 to 2.
compute ht = Aht(age).
xsave outfile 'c:\longex1.sav'
  /drop ht1 ht2 wt1 wt2.
end loop.
execute.
get file 'c:\longex1.sav'.
list.
    FAMID     BIRTH      AGE       HT

     1.00      1.00       1.00     2.80
     1.00      1.00       2.00     3.40
     1.00      2.00       1.00     2.90
     1.00      2.00       2.00     3.80
     1.00      3.00       1.00     2.20
     1.00      3.00       2.00     2.90
     2.00      1.00       1.00     2.00
     2.00      1.00       2.00     3.20
     2.00      2.00       1.00     1.80
     2.00      2.00       2.00     2.80
     2.00      3.00       1.00     1.90
     2.00      3.00       2.00     2.40
     3.00      1.00       1.00     2.20
     3.00      1.00       2.00     3.30
     3.00      2.00       1.00     2.30
     3.00      2.00       2.00     3.40
     3.00      3.00       1.00     2.10
     3.00      3.00       2.00     2.90

Number of cases read:  18    Number of cases listed:  18

Example #2: Two variables

Let's use the same data file, but with all of the variables. In this example, we show how to reshape two variables at a time. Note that you can reshape as many variables as you need by adding a vector command and a compute command for each variable to be reshaped. You will also want to add the wide versions of those variables to the /drop subcommand of the xsave command so that they do not appear in the long file.

get file 'c:\kidshtwt.sav'.
list.
    FAMID     BIRTH       HT1       HT2      WT1      WT2

     1.00      1.00      2.80      3.40       19       28
     1.00      2.00      2.90      3.80       21       28
     1.00      3.00      2.20      2.90       20       23
     2.00      1.00      2.00      3.20       25       30
     2.00      2.00      1.80      2.80       20       33
     2.00      3.00      1.90      2.40       22       33
     3.00      1.00      2.20      3.30       22       28
     3.00      2.00      2.30      3.40       20       30
     3.00      3.00      2.10      2.90       22       31

Number of cases read:  9    Number of cases listed:  9

vector Aht = ht1 to ht2.
vector Awt = wt1 to wt2.
loop age = 1 to 2.
compute ht = Aht(age).
compute wt = Awt(age).
xsave outfile 'c:\longex2.sav'
  /drop ht1 ht2 wt1 wt2.
end loop.
execute.
get file 'c:\longex2.sav'.
list.
    FAMID     BIRTH      AGE       HT       WT

     1.00      1.00     1.00     2.80    19.00
     1.00      1.00     2.00     3.40    28.00
     1.00      2.00     1.00     2.90    21.00
     1.00      2.00     2.00     3.80    28.00
     1.00      3.00     1.00     2.20    20.00
     1.00      3.00     2.00     2.90    23.00
     2.00      1.00     1.00     2.00    25.00
     2.00      1.00     2.00     3.20    30.00
     2.00      2.00     1.00     1.80    20.00
     2.00      2.00     2.00     2.80    33.00
     2.00      3.00     1.00     1.90    22.00
     2.00      3.00     2.00     2.40    33.00
     3.00      1.00     1.00     2.20    22.00
     3.00      1.00     2.00     3.30    28.00
     3.00      2.00     1.00     2.30    20.00
     3.00      2.00     2.00     3.40    30.00
     3.00      3.00     1.00     2.10    22.00
     3.00      3.00     2.00     2.90    31.00

Number of cases read:  18    Number of cases listed:  18
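The vector and loop approach above is not the only way to do this. As a rough sketch, and not the method used in this module, the same reshape could probably also be written with the varstocases command, which restructures the active dataset directly instead of writing a new file. Each /make subcommand names one long variable and the wide variables it is built from, and /index creates the variable that counts from 1 to 2 within each case. The result should be checked against the listing above, since the order of the variables may differ.

```spss
* Sketch only: varstocases as an alternative to vector, loop and xsave.
get file 'c:\kidshtwt.sav'.
varstocases
  /make ht from ht1 ht2
  /make wt from wt1 wt2
  /index = age.
list.
```

To keep the long data, follow this with a save command, because varstocases changes only the active dataset.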

Example #3: Modifying numeric suffixes

This example is like the first example in that we are reshaping only one variable. However, here we want the index variable to run from 96 to 98 rather than from 1 to 3. An extra step is necessary to accomplish this, because the elements of a vector in SPSS are numbered starting from one. Therefore, we loop over i from 1 to 3 and compute year by adding 95 to each value of i.

get file 'c:\faminc.sav'.
list.
    FAMID  FAMINC96  FAMINC97  FAMINC98

     3.00  75000.00  76000.00  77000.00
     1.00  40000.00  40500.00  41000.00
     2.00  45000.00  45400.00  45800.00

Number of cases read:  3    Number of cases listed:  3

vector Ainc=faminc96 to faminc98.
loop i = 1 to 3.
compute income=Ainc(i).
compute year = 95+i.
xsave outfile 'c:\widefaminc'
 /keep famid year income.
end loop.
execute.
get file 'c:\widefaminc'.
list.
    FAMID     YEAR   INCOME

     3.00    96.00 75000.00
     3.00    97.00 76000.00
     3.00    98.00 77000.00
     1.00    96.00 40000.00
     1.00    97.00 40500.00
     1.00    98.00 41000.00
     2.00    96.00 45000.00
     2.00    97.00 45400.00
     2.00    98.00 45800.00

Number of cases read:  9    Number of cases listed:  9
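What has to run from 1 to 3 is the subscript passed to the vector, since vector elements are numbered from one. As a sketch under that assumption, the same reshape could also be written with year as the loop variable and the subscript computed from it. The output file name c:\longfaminc.sav is made up for this illustration.

```spss
* Sketch: loop over year directly and shift the subscript into the range 1 to 3.
get file 'c:\faminc.sav'.
vector Ainc = faminc96 to faminc98.
loop year = 96 to 98.
compute income = Ainc(year - 95).
xsave outfile 'c:\longfaminc.sav'
 /keep famid year income.
end loop.
execute.
get file 'c:\longfaminc.sav'.
list.
```

This version avoids the extra index variable i, at the cost of a slightly less obvious subscript.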

Example #4: String variables and character suffixes

It is also possible to reshape a wide data file to be long when the variables have character suffixes. Look at the dmorder file below. Note that we want our long data set to contain a new string variable called name. To create or modify a numeric variable, you can use the compute command. However, you can only MODIFY a string variable with the compute command. To CREATE a string variable, you need to use the string command. The syntax for this command is straightforward: STRING varname (A_), where the _ is the length of the variable in characters. For example, string name (A4) creates a string variable called name that is four characters long. An example of the use of this command is presented below.

get file 'c:\dmorder.sav'.
list.
    FAMID NAMED NAMEM      INCD      INCM

     1.00 Bill  Bess   30000.00  15000.00
     2.00 Art   Amy    22000.00  18000.00
     3.00 Paul  Pat    25000.00  50000.00

Number of cases read:  3    Number of cases listed:  3

vector Aname = named to namem.
vector Ainc = incd to incm.
string name (A4).
loop dadmom = 1 to 2.
compute name = Aname(dadmom).
compute inc = Ainc(dadmom).
xsave outfile 'c:\dm1.sav'
 /keep famid dadmom name inc.
end loop.
execute.
get file 'c:\dm1.sav'.
list.
    FAMID   DADMOM NAME      INC

     1.00     1.00 Bill 30000.00
     1.00     2.00 Bess 15000.00
     2.00     1.00 Art  22000.00
     2.00     2.00 Amy  18000.00
     3.00     1.00 Paul 25000.00
     3.00     2.00 Pat  50000.00

Number of cases read:  6    Number of cases listed:  6
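The rule that a string variable must be created before it can be given a value applies to any new string variable, not only to name. As a small sketch, the long file could be given a descriptive string variable that spells out the meaning of dadmom. The variable name parent is made up for this illustration, and the sketch assumes that dadmom = 1 marks the dad, because named and incd were listed first on the vector commands.

```spss
* Sketch: create a new string variable first, then assign values to it.
get file 'c:\dm1.sav'.
string parent (A3).
if (dadmom = 1) parent = 'Dad'.
if (dadmom = 2) parent = 'Mom'.
list.
```

Without the string command, the if commands would fail, because parent would not yet exist as a string variable.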

Example #5: Non-contiguous variables

SPSS assumes that the variables to be reshaped are contiguous (side-by-side) in your data file, because a list such as named to namem on the vector command refers to every variable located between named and namem in the file. If the variables are not contiguous, the vector picks up the variables in between. In the file below that would mix string and numeric variables, so you will get an error message when SPSS encounters the vector command. To address this problem, use a save command with the /keep subcommand, listing the variables in the correct order. The data will be saved with the variables in the order in which they were listed on the /keep subcommand. Notice in the file shown below that named and namem are not next to each other, nor are incd and incm. Variables with numeric suffixes have to be contiguous too. Finally, the variables have to be listed on the vector command in the order in which they appear in the data file. This applies to both numeric and string variables.

get file 'c:\dadmomw.sav'.
list.
    FAMID NAMED      INCD NAMEM      INCM

     1.00 Bill   30000.00 Bess   15000.00
     2.00 Art    22000.00 Amy    18000.00
     3.00 Paul   25000.00 Pat    50000.00

Number of cases read:  3    Number of cases listed:  3

save outfile = "c:\dmorder.sav"
 /keep=famid named namem incd incm.
execute.
get file 'c:\dmorder.sav'.
list.
    FAMID NAMED NAMEM      INCD      INCM

     1.00 Bill  Bess   30000.00  15000.00
     2.00 Art   Amy    22000.00  18000.00
     3.00 Paul  Pat    25000.00  50000.00

Number of cases read:  3    Number of cases listed:  3

Now that the variables are in the correct order, we can proceed with the reshaping as before.  

vector Aname = named to namem.
vector Ainc = incd to incm.
string name (A4).
loop dadmom = 1 to 2.
compute name = Aname(dadmom).
compute inc = Ainc(dadmom).
xsave outfile 'c:\dm.sav'
  /keep famid dadmom name inc.
end loop.
execute.
get file 'c:\dm.sav'.
list.
    FAMID   DADMOM NAME      INC

     1.00     1.00 Bill 30000.00
     1.00     2.00 Bess 15000.00
     2.00     1.00 Art  22000.00
     2.00     2.00 Amy  18000.00
     3.00     1.00 Paul 25000.00
     3.00     2.00 Pat  50000.00

Number of cases read:  6    Number of cases listed:  6
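Saving a reordered copy of the file is one way to make the variables contiguous. As an alternative sketch, not the approach taken above, the variables can also be reordered in the active dataset with add files, using the asterisk to refer to the active dataset and /keep to give the new order. No intermediate file is written.

```spss
* Sketch: reorder the variables in the active dataset without saving a new file.
get file 'c:\dadmomw.sav'.
add files file = *
 /keep = famid named namem incd incm.
list.
```

After this, the vector commands from the example above can be run as before. Any variable left off the /keep list is dropped from the active dataset, so the list should name every variable you need.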

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.