Stata Learning Module
Reshaping data long to wide

This module illustrates the power (and simplicity) of Stata in its ability to reshape data files. These examples take long data files and reshape them into wide form. These examples cover some common examples, but this is only part of the features and options of the Stata reshape command.

Example #1: Reshaping data long to wide

The reshape command can be used to make data from a long format to a wide format. Consider the kids file (to make things simple at first, we will drop the variables kidname, sex and wt).

use kids, clear 

drop  kidname sex wt 

list 

          famid      birth        age 
  1.         1          1          9  
  2.         1          2          6  
  3.         1          3          3  
  4.         2          1          8  
  5.         2          2          6  
  6.         2          3          2  
  7.         3          1          6  
  8.         3          2          4  
  9.         3          3          2   

Let's make age in this file wide, making one record per family which would contain age1 age2 age3, the ages of the kids in the family (age2 would be missing if there is only one kid, and age3 would be missing if there are only two kids). Let's look at the data before and after reshaping.

list 

          famid      birth        age 
  1.         1          1          9  
  2.         1          2          6  
  3.         1          3          3  
  4.         2          1          8  
  5.         2          2          6  
  6.         2          3          2  
  7.         3          1          6  
  8.         3          2          4  
  9.         3          3          2   

reshape wide age, i(famid)  j(birth)  

(note:  j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   3   ->       4
j variable (3 values)             birth   ->   (dropped)
xij variables:
                                    age   ->   age1 age2 age3
----------------------------------------------------------------------------- 

list 

          famid       age1       age2       age3 
  1.         1          9          6          3  
  2.         2          8          6          2  
  3.         3          6          4          2   

Let's look at the pieces of the reshape command.

reshape wide age, j(birth) i(famid)
wide tells reshape that we want to go from long to wide
age tells Stata that the variable to be converted from long to wide is age
i(famid) tells reshape that famid uniquely identifies observations in the wide form
j(birth) tells reshape that the suffix of age (1 2 3) should be taken from the variable birth

Example #2: Reshaping data long to wide with more than one variable

The reshape command can work on more than one variable at a time. In the example above, we just reshaped the age variable. In the example below, we reshape the variables age, wt and sex like this

reshape wide age wt sex,  i(famid) j(birth)

Let's look at the data before and after reshaping.

use kids, clear 

list 

          famid    kidname      birth        age         wt        sex 
  1.         1       Beth          1          9         60          f  
  2.         1        Bob          2          6         40          m  
  3.         1       Barb          3          3         20          f  
  4.         2       Andy          1          8         80          m  
  5.         2         Al          2          6         50          m  
  6.         2        Ann          3          2         20          f  
  7.         3       Pete          1          6         60          m  
  8.         3        Pam          2          4         40          f  
  9.         3       Phil          3          2         20          m   

reshape wide kidname age wt sex, i(famid) j(birth) 

 (note:  j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        9   ->       3
Number of variables                   6   ->      13
j variable (3 values)             birth   ->   (dropped)
xij variables:
                                kidname   ->   kidname1 kidname2 kidname3
                                    age   ->   age1 age2 age3
                                     wt   ->   wt1 wt2 wt3
                                    sex   ->   sex1 sex2 sex3
----------------------------------------------------------------------------- 

list 

 Observation 1

       famid            1    kidname1         Beth        age1            9
         wt1           60        sex1            f    kidname2          Bob
        age2            6         wt2           40        sex2            m
    kidname3         Barb        age3            3         wt3           20
        sex3            f


Observation 2

       famid            2    kidname1         Andy        age1            8
         wt1           80        sex1            m    kidname2           Al
        age2            6         wt2           50        sex2            m
    kidname3          Ann        age3            2         wt3           20
        sex3            f


Observation 3

       famid            3    kidname1         Pete        age1            6
         wt1           60        sex1            m    kidname2          Pam
        age2            4         wt2           40        sex2            f
    kidname3         Phil        age3            2         wt3           20
        sex3            m 

Example #3: Reshaping wide with character suffixes

The examples above showed how to reshape data using numeric suffixes, but reshape can handle character suffixes as well. Consider the dadmoml data file shown below.

use dadmoml, clear 

list 

          famid       name        inc     dadmom 
  1.         2        Art      22000        dad  
  2.         1       Bill      30000        dad  
  3.         3       Paul      25000        dad  
  4.         1       Bess      15000        mom  
  5.         3        Pat      50000        mom  
  6.         2        Amy      18000        mom   

Let's reshape this to be in a wide format, containing one record per family. The reshape command below uses string to tell reshape that the suffix is character.

reshape wide name inc,  i(famid) j(dadmom) string

Let's look at the data before and after reshaping.

list 

          famid       name        inc     dadmom 
  1.         2        Art      22000        dad  
  2.         1       Bill      30000        dad  
  3.         3       Paul      25000        dad  
  4.         1       Bess      15000        mom  
  5.         3        Pat      50000        mom  
  6.         2        Amy      18000        mom   

reshape wide name inc, i(famid) j(dadmom) string 

(note:  j = dad mom)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                        6   ->       3
Number of variables                   4   ->       5
j variable (2 values)            dadmom   ->   (dropped)
xij variables:
                                   name   ->   namedad namemom
                                    inc   ->   incdad incmom
----------------------------------------------------------------------------- 

list 

          famid    namedad     incdad    namemom     incmom 
  1.         1       Bill      30000       Bess      15000  
  2.         2        Art      22000        Amy      18000  
  3.         3       Paul      25000        Pat      50000   

Summary

Reshaping data long to wide

 Long format
        famid      birth         age 
  1.         1          1          9 
  2.         1          2          6 
  3.         1          3          3 
  4.         2          1          8 
  5.         2          2          6 
  6.         2          3          2 
  7.         3          1          6 
  8.         3          2          4 
  9.         3          3          2 
 
reshape wide age, j(birth) i(famid) 
 
 Wide format
         famid       age1       age2       age3 
  1.         1          9          6          3 
  2.         2          8          6          2 
  3.         3          6          4          2 

The general syntax of reshape wide can be expressed as:

reshape wide long-var(s),  i( wide-id-var ) j( var-with-suffix ) 
where
long-var(s)      is the name of the long variable(s) to be made wide e.g. age
wide-id-var      is the variable that uniquely identifies wide 
                 observations, e.g. famid
var-with-suffix  is the variable from the long file that contains 
                 the suffix for the wide variables, e.g. birth

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.