UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

SAS Learning Module
Reshaping data long to wide in SAS

Basic reshaping data long to wide

This link will take you to a SAS macro called %towide for reshaping data from long format to wide format.  This link will take you to the help file.

1. Simple reshape long to wide

Sometimes you need to reshape your data which is in a long format (like shown below)

long 

famid     year    faminc 
  1       96      40000 
  1       97      40500 
  1       98      41000 
  2       96      45000 
  2       97      45400 
  2       98      45800 
  3       96      75000 
  3       97      76000 
  3       98      77000 

into a wide format (like shown below).

wide 
famid faminc96 faminc97 faminc98 
1     40000    40500    41000 
2     45000    45400    45800 
3     75000    76000    77000 

The long record has one record per year (per family) while the wide record has one record per family. Doing this kind of reshaping in SAS can be tricky. We have written a macro (a program) to help you do this kind of standard data reshaping. In order to use this program, you need to be able to provide some key information about your data to be reshaped. Let's use the examples of the long and wide files above to illustrate.

You need to be able to supply seven pieces of information:

1. What is the name of the long data file? LONG
2. What is the name we want for the wide data file? We want to call the long data file WIDE
3. What is the name of the variable which uniquely identifies the wide observations? The wide observations are uniquely identified by the variable FAMID
4. What is the name of the variable in the long data file that will be used for the suffix of the wide variables (the 96 97 and 98 at the end of faminc)? The 96 97 and 98 come from the variable YEAR.
5. What is the lowest value of year? The lowest value is 96
6. What is the highest value of year? The highest value is 98
7. What is the name of the variable to be converted from long to wide? The variable to be converted from long to wide is FAMINC

Here is an example showing how you would do this in SAS using the %towide macro (assuming you have downloaded the macro).

* 1. This shows an example of converting a simple ;
* data file from long to wide ; 
data long1; 
  input famid year faminc ; 
cards; 
1 96 40000 
1 97 40500 
1 98 41000 
2 96 45000 
2 97 45400 
2 98 45800 
3 96 75000 
3 97 76000 
3 98 77000 
; 
run; 
%towide(long1,wide1,famid,year,96,98,faminc); 

Notice the values in the parentheses are the answers to the 7 questions
1. What is the name of the long data file? long1
2. What is the name we want for the wide data file? wide1
3. What is the name of the variable that uniquely identifies the wide observations? famid
4. What variable contains the suffix of the wide variables (96 97 98)? year
5. What is the lowest value of year? 96
6. What is the highest value of year? 98
7. What is the name of the variable to be converted from wide to long? faminc

[portion of output below]

OBS    FAMID    FAMINC96    FAMINC97    FAMINC98 
 1       1        40000       40500       41000 
 2       2        45000       45400       45800 
 3       3        75000       76000       77000 

This is actually a subset of the output from the SAS program towide1.sas, but you can see that the data were properly reshaped. You can look at the observations from long1 in the program and see that they have been properly reshaped into a wide data file in wide1.

Let's have a look at the complete output of towide1.sas. There are two parts to the output:
1. A proc print of a sample (up to 10 wide observations). First, observations from the long data file are printed (corresponding to the first 10 wide observations), then a proc print of up to 10 observations from the wide data file.
2. A proc means of the long records and a corresponding proc means of the wide records. It is important to inspect the proc print and proc means to check to see if the data were properly reshaped.

[complete output of towide1.sas]

1A. Verify reshaping of long to wide using PROC PRINT of 10 wide records 
Compare this print of the long data below (long1) with the 
PROC PRINT of the wide data (wide1) in step 1B below.

PROC PRINT of LONG data file long1

OBS    FAMID    YEAR    FAMINC 
1       1       96      40000 
2       1       97      40500 
3       1       98      41000 
4       2       96      45000 
5       2       97      45400 
6       2       98      45800 
7       3       96      75000 
8       3       97      76000 
9       3       98      77000

1B. Verify reshaping of long to wide using PROC PRINT of 10 wide records 
Compare this print of the wide data (wide1) below with the 
PROC PRINT of the long data (long1) in Step 1A above

PROC PRINT of WIDE data file wide1

OBS    FAMID    FAMINC96    FAMINC97    FAMINC98 
1       1        40000       40500       41000 
2       2        45000       45400       45800 
3       3        75000       76000       77000

You can compare the proc print from 1A (the long data) with the proc print of 1B (the wide data) to look for any problems or errors in reshaping the data. For this small dataset, you can verify every record. In more realistic situations, this will be just a small fraction of the records.

2A. Verify reshaping of long to wide using PROC MEANS 
Compare this PROC MEANS for the long data (long1) with the 
PROC MEANS of the wide data (wide1) in step 2B below

PROC MEANS of LONG data file long1

Analysis Variable : FAMINC

YEAR  N Obs  N     Mean   Std Dev   Minimum   Maximum 
------------------------------------------------------ 
  96      3  3  53333.33  18929.69  40000.00  75000.00 
  97      3  3  53966.67  19238.07  40500.00  76000.00 
  98      3  3  54600.00  19546.87  41000.00  77000.00 
------------------------------------------------------

2B. Verify reshaping of long to wide using PROC MEANS 
Compare this PROC MEANS for the wide data (wide1) below 
with the PROC MEANS of the long data (long1) in step 2A above

PROC MEANS of WIDE data file wide1

Variable  N        Mean     Std Dev     Minimum     Maximum 
----------------------------------------------------------- 
FAMINC96  3    53333.33    18929.69    40000.00    75000.00 
FAMINC97  3    53966.67    19238.07    40500.00    76000.00 
FAMINC98  3    54600.00    19546.87    41000.00    77000.00 
------------------------------------------------------------

The output from the proc means can be used to more fully compare the entire long file with the wide file. You can see that the mean, sd, min and max of faminc for the years 96, 97 and 98 (from 2A) are the same as the means of FAMINC96 FAMINC97 and FAMINC98 (from 2B).

2. Reshape two variables

Reshaping two variables is not much harder than reshaping one variable. The only difference is that you specify the two variables to be reshaped where you had just specified the one variable.

* 2. This example has 2 variables going from ;
* long to wide, faminc and spend ; 
data long2; 
  input famid year faminc spend ; 

cards; 
1 96 40000 38000 
1 97 40500 39000 
1 98 41000 40000 
2 96 45000 42000 
2 97 45400 43000 
2 98 45800 44000 
3 96 75000 70000 
3 97 76000 71000 
3 98 77000 72000 
; 
run; 

* notice that where we had just faminc is now faminc spend ; 
%towide(long2,wide2,famid,year,96,98,faminc spend);

[output below]

1A. Verify reshaping of long to wide using PROC PRINT of 10 wide records.
Compare this print of the long data below (long2) with the
PROC PRINT of the wide data (wide2) in step 1B below.

PROC PRINT of LONG data file long2

OBS    FAMID    YEAR    FAMINC    SPEND
 1       1       96      40000    38000
 2       1       97      40500    39000
 3       1       98      41000    40000
 4       2       96      45000    42000
 5       2       97      45400    43000
 6       2       98      45800    44000
 7       3       96      75000    70000
 8       3       97      76000    71000
 9       3       98      77000    72000
 
1B. Verify reshaping of long to wide using PROC PRINT of 10 wide recordsCompare this print of the wide data (wide2) below with the
PROC PRINT of the long data (long2) in Step 1A above

PROC PRINT of WIDE data file wide2

OBS FAMID FAMINC96 FAMINC97 FAMINC98 SPEND96 SPEND97 SPEND98
 1    1     40000    40500    41000   38000   39000   40000
 2    2     45000    45400    45800   42000   43000   44000
 3    3     75000    76000    77000   70000   71000   72000

2A. Verify reshaping of long to wide using PROC MEANS
    Compare this PROC MEANS for the long data (long2) with the
    PROC MEANSof the wide data (wide2) in step 2B below.

PROC MEANS of LONG data file long2

YEAR  N Obs  Variable  N        Mean     Std Dev     Minimum
------------------------------------------------------------
  96      3  FAMINC    3    53333.33    18929.69    40000.00
             SPEND     3    50000.00    17435.60    38000.00
  97      3  FAMINC    3    53966.67    19238.07    40500.00
             SPEND     3    51000.00    17435.60    39000.00
  98      3  FAMINC    3    54600.00    19546.87    41000.00
             SPEND     3    52000.00    17435.60    40000.00
------------------------------------------------------------
        YEAR  N Obs  Variable       Maximum
-------------------------------------------
          96      3  FAMINC        75000.00
                     SPEND         70000.00
          97      3  FAMINC        76000.00
                     SPEND         71000.00
          98      3  FAMINC        77000.00
                     SPEND         72000.00
-------------------------------------------

2B. Verify reshaping of long to wide using PROC MEANS
    Compare this PROC MEANS for the wide data (wide2) below
    with the PROC MEANS of the long data (long2) in step 2A above

PROC MEANS of WIDE data file wide2

Variable  N     Mean     Std Dev     Minimum     Maximum
----------------------------------------------------------
FAMINC96  3  53333.33    18929.69    40000.00    75000.00
SPEND96   3  50000.00    17435.60    38000.00    70000.00
FAMINC97  3  53966.67    19238.07    40500.00    76000.00
SPEND97   3  51000.00    17435.60    39000.00    71000.00
FAMINC98  3  54600.00    19546.87    41000.00    77000.00
SPEND98   3  52000.00    17435.60    40000.00    72000.00
---------------------------------------------------------

The proc print and proc means results indicate that the reshape was successful. The log also includes the information shown below to help you confirm that you provided the correct information about how the data should be reshaped. This summary looks correct.

- Reshape Long to Wide: Actions to taken summarized below
------------------------------------------------------------------ 
- Input  (long) data file is                        : long2 
- Output (wide) data file is                        : wide2 
- Variable that uniquely identifies wide records is : famid 
- Variable name with suffix for wide variable is    : year 
-                                and can range from : 96 to 98 
- Long variables to wide variables...
-  faminc  -> faminc96 ... faminc98 -> of type N and length 8 
-  spend  -> spend96 ... spend98 -> of type N and length 8

3. Reshape with two variables that identify the wide records

The previous examples had one variable which uniquely identified the wide records (famid). Consider the data file shown below. The data shows the heights of children in a family at age 1 and age 2. The wide version of these records would NOT be uniquely identified by famid. You need both famid and birth to uniquely identify the wide records.

     FAMID    BIRTH    AGE   HT 

      1        1       1     2.8 
      1        1       2     3.4 
      1        2       1     2.9 
      1        2       2     3.8 
      1        3       1     2.2 
      1        3       2     2.9 
      2        1       1     2.0 
      2        1       2     3.2 
      2        2       1     1.8 
      2        2       2     2.8 
      2        3       1     1.9 
      2        3       2     2.4 
      3        1       1     2.2 
      3        1       2     3.3 
      3        2       1     2.3 
      3        2       2     3.4 
      3        3       1     2.1 
      3        3       2     2.9 

The SAS program below shows that it is quite easy to reshape this kind of data file as well.

* 3. This file needs two variables to uniquely identify ;
*    the wide records famid and birth ;  

data long3; 
  input famid birth age ht ; 

cards; 
1 1 1 2.8 
1 1 2 3.4 
1 2 1 2.9 
1 2 2 3.8 
1 3 1 2.2 
1 3 2 2.9 
2 1 1 2.0 
2 1 2 3.2 
2 2 1 1.8 
2 2 2 2.8 
2 3 1 1.9 
2 3 2 2.4 
3 1 1 2.2 
3 1 2 3.3 
3 2 1 2.3 
3 2 2 3.4 
3 3 1 2.1 
3 3 2 2.9 
; 
run; 

* note that famid birth are the variables that ; 
* uniquely identify the wide records ; 

%towide(long3,wide3,famid birth,age,1,2,ht); 

We will forego showing the output, but you can run the program yourself to verify that the proc print and proc means indicate that the data was reshaped properly.

4. A more realistic example

For simplicity, the prior examples were artificially small and simple. Let's look at an example that is a bit more realistic. This example has 50 wide records and six time points per wide record for a total of 300 long records that we would like to make wide. (This example is still quite small, but it at least allows us to see how useful the proc means can be for verifying a file which cannot be verified by just inspecting the proc print.)

* 4. this is a more realistic data file having ;
* 300 long records (50 wide records and 6 time points); 
data long4; 
  input id year inc ; 
cards; 
 1 90 66483 
 1 91 69146 
 1 92 74643 
 1 93 79783 
 1 94 81710 
 1 95 86143 
 2 90 17510 
 2 91 17947 
 2 92 19484 
 2 93 20979 
 2 94 21268 
 2 95 22998 
 3 90 57947 
 3 91 62964 
 3 92 68717 
 3 93 70957 
 3 94 75198 
 3 95 75722 
 4 90 64831 
 4 91 71060 
 4 92 71918 
 4 93 72514 
 4 94 73100 
 4 95 74379 
 5 90 18904 
 5 91 19949 
 5 92 21335 
 5 93 22237 
 5 94 23829 
 5 95 23913 
 6 90 32057 
 6 91 34770 
 6 92 35834 
 6 93 37387 
 6 94 40899 
 6 95 42372 
 7 90 60551 
 7 91 64869 
 7 92 67983 
 7 93 70498 
 7 94 71253 
 7 95 75177 
 8 90 16553 
 8 91 18189 
 8 92 18349 
 8 93 19815 
 8 94 21739 
 8 95 22980 
 9 90 32611 
 9 91 33465 
 9 92 35961 
 9 93 36416 
 9 94 37183 
 9 95 40627 
10 90 61379 
10 91 66002 
10 92 67936 
10 93 70513 
10 94 74405 
10 95 76009 
11 90 24065 
11 91 24229 
11 92 25709 
11 93 26121 
11 94 26617 
11 95 28142 
12 90 32975 
12 91 36185 
12 92 37601 
12 93 41336 
12 94 43399 
12 95 43670 
13 90 69548 
13 91 71341 
13 92 72455 
13 93 76552 
13 94 80538 
13 95 85330 
14 90 50274 
14 91 53349 
14 92 55900 
14 93 59375 
14 94 61216 
14 95 63911 
15 90 72011 
15 91 73334 
15 92 76248 
15 93 77724 
15 94 78638 
15 95 80582 
16 90 18911 
16 91 20046 
16 92 21343 
16 93 21630 
16 94 22330 
16 95 23081 
17 90 68841 
17 91 75410 
17 92 80806 
17 93 81327 
17 94 81571 
17 95 86499 
18 90 28099 
18 91 30716 
18 92 32986 
18 93 36097 
18 94 39124 
18 95 39866 
19 90 17302 
19 91 18778 
19 92 18872 
19 93 19884 
19 94 20665 
19 95 21855 
20 90 16291 
20 91 16674 
20 92 16770 
20 93 17182 
20 94 17979 
20 95 18917 
21 90 43244 
21 91 46545 
21 92 47633 
21 93 50744 
21 94 54734 
21 95 59075 
22 90 56393 
22 91 59120 
22 92 60801 
22 93 61404 
22 94 63111 
22 95 69278 
23 90 47347 
23 91 49571 
23 92 50101 
23 93 51345 
23 94 56463 
23 95 56927 
24 90 16076 
24 91 17217 
24 92 17296 
24 93 17900 
24 94 18171 
24 95 18366 
25 90 65906 
25 91 69679 
25 92 76131 
25 93 77676 
25 94 81980 
25 95 85426 
26 90 58586 
26 91 61188 
26 92 66542 
26 93 69267 
26 94 71063 
26 95 74549 
27 90 61674 
27 91 66584 
27 92 69185 
27 93 75193 
27 94 78647 
27 95 81898 
28 90 31673 
28 91 31883 
28 92 32774 
28 93 34485 
28 94 36929 
28 95 39751 
29 90 63412 
29 91 67593 
29 92 69911 
29 93 73092 
29 94 80105 
29 95 81840 
30 90 27684 
30 91 28439 
30 92 30861 
30 93 31406 
30 94 32960 
30 95 35530 
31 90 71873 
31 91 76449 
31 92 80848 
31 93 88691 
31 94 94149 
31 95 97431 
32 90 62177 
32 91 63812 
32 92 64235 
32 93 65703 
32 94 69985 
32 95 71136 
33 90 37684 
33 91 38258 
33 92 39208 
33 93 39489 
33 94 39745 
33 95 41236 
34 90 64013 
34 91 66398 
34 92 71877 
34 93 75610 
34 94 76395 
34 95 79644 
35 90 16011 
35 91 16847 
35 92 17746 
35 93 19123 
35 94 19183 
35 95 19996 
36 90 49215 
36 91 52195 
36 92 52343 
36 93 56365 
36 94 58752 
36 95 59354 
37 90 15774 
37 91 16643 
37 92 17605 
37 93 18781 
37 94 18996 
37 95 19685 
38 90 29106 
38 91 31693 
38 92 31852 
38 93 34505 
38 94 35806 
38 95 36179 
39 90 25147 
39 91 26923 
39 92 28785 
39 93 30987 
39 94 34036 
39 95 34106 
40 90 71978 
40 91 79144 
40 92 80453 
40 93 86580 
40 94 95164 
40 95 96155 
41 90 46166 
41 91 47579 
41 92 49455 
41 93 53849 
41 94 56630 
41 95 57473 
42 90 55810 
42 91 59443 
42 92 65291 
42 93 66065 
42 94 69009 
42 95 74365 
43 90 49642 
43 91 50603 
43 92 53917 
43 93 54858 
43 94 58470 
43 95 59767 
44 90 21348 
44 91 22361 
44 92 23412 
44 93 24038 
44 94 24774 
44 95 25828 
45 90 44361 
45 91 48720 
45 92 51356 
45 93 54927 
45 94 56670 
45 95 58800 
46 90 56509 
46 91 60517 
46 92 61532 
46 93 65077 
46 94 69594 
46 95 73089 
47 90 39097 
47 91 40293 
47 92 43237 
47 93 44809 
47 94 48782 
47 95 53091 
48 90 18685 
48 91 19405 
48 92 20165 
48 93 20316 
48 94 22197 
48 95 23557 
49 90 73103 
49 91 76243 
49 92 76778 
49 93 82734 
49 94 86279 
49 95 86784 
50 90 48129 
50 91 49267 
50 92 53799 
50 93 58768 
50 94 63011 
50 95 66461 
; 
run; 

%towide(long4, wide4, id, year, 90, 95, inc); 

The output of this program is shown below.

The proc print (parts 1A and 1B below) suggests that the reshape was successful, but it is based only on the first 10 wide records. It is quite possible that there could be errors in other parts of the file. The proc means (parts 2A and 2B) become more important to examine in this case.

1A. Verify reshaping of long to wide using PROC PRINT of 10 wide records.
    Compare this print of the long data below (long4) with the
    PROC PRINT of the wide data (wide4) in step 1B below.

PROC PRINT of LONG data file long4
    
OBS    ID    YEAR     INC
  1     1     90     66483
  2     1     91     69146
  3     1     92     74643
  4     1     93     79783
  5     1     94     81710
  6     1     95     86143
  7     2     90     17510
  8     2     91     17947
  9     2     92     19484
 10     2     93     20979
 11     2     94     21268
 12     2     95     22998
 13     3     90     57947
 14     3     91     62964
 15     3     92     68717
 16     3     93     70957
 17     3     94     75198
 18     3     95     75722
 19     4     90     64831
 20     4     91     71060
 21     4     92     71918
 22     4     93     72514
 23     4     94     73100
 24     4     95     74379
 25     5     90     18904
 26     5     91     19949
 27     5     92     21335
 28     5     93     22237
 29     5     94     23829
 30     5     95     23913
 31     6     90     32057
 32     6     91     34770
 33     6     92     35834
 34     6     93     37387
 35     6     94     40899
 36     6     95     42372
 37     7     90     60551
 38     7     91     64869
 39     7     92     67983
 40     7     93     70498
 41     7     94     71253
 42     7     95     75177
 43     8     90     16553
 44     8     91     18189
 45     8     92     18349
 46     8     93     19815
 47     8     94     21739
 48     8     95     22980
 49     9     90     32611
 50     9     91     33465
 51     9     92     35961
 52     9     93     36416
 53     9     94     37183
 54     9     95     40627
 55    10     90     61379
 56    10     91     66002
 57    10     92     67936
 58    10     93     70513
 59    10     94     74405
 60    10     95     76009
 
 1B. Verify reshaping of long to wide using PROC PRINT of 10 wide records
    Compare this print of the wide data (wide4) below with the
    PROC PRINT of the long data (long4) in Step 1A above.
    
 PROC PRINT of WIDE data file wide4
 
OBS    ID    INC90    INC91    INC92    INC93    INC94    INC95

  1     1    66483    69146    74643    79783    81710    86143
  2     2    17510    17947    19484    20979    21268    22998
  3     3    57947    62964    68717    70957    75198    75722
  4     4    64831    71060    71918    72514    73100    74379
  5     5    18904    19949    21335    22237    23829    23913
  6     6    32057    34770    35834    37387    40899    42372
  7     7    60551    64869    67983    70498    71253    75177
  8     8    16553    18189    18349    19815    21739    22980
  9     9    32611    33465    35961    36416    37183    40627
 10    10    61379    66002    67936    70513    74405    76009

The proc means shows no discrepancies between the long and wide file (parts 2A and 2B below). This gives us more confidence that the reshape was successful.

2A. Verify reshaping of long to wide using PROC MEANS
    Compare this PROC MEANS for the long data (long4) with the
    PROC MEANS of the wide data (wide4) in step 2B below
    PROC MEANS of LONG data file long4

Analysis Variable : INC 

YEAR  N Obs  N       Mean   Std Dev   Minimum   Maximum 
---------------------------------------------------------
 90     50  50   43899.32  19523.39   15774.00  73103.00
 91     50  50   46380.70  20749.43   16643.00  79144.00
 92     50  50   48519.58  21720.12   16770.00  80848.00
 93     50  50   50842.28  22780.12   17182.00  88691.00
 94     50  50   53289.02  23824.01   17979.00  95164.00
 95     50  50   55379.00  24592.83   18366.00  97431.00
--------------------------------------------------------

2B. Verify reshaping of long to wide using PROC MEANS
    Compare this PROC MEANS for the wide data (wide4) below
    with the PROC MEANS of the long data (long4) in step 2A above
    PROC MEANS of WIDE data file wide4

Variable   N        Mean     Std Dev     Minimum     Maximum
------------------------------------------------------------
INC90     50    43899.32    19523.39    15774.00    73103.00
INC91     50    46380.70    20749.43    16643.00    79144.00
INC92     50    48519.58    21720.12    16770.00    80848.00
INC93     50    50842.28    22780.12    17182.00    88691.00
INC94     50    53289.02    23824.01    17979.00    95164.00
INC95     50    55379.00    24592.83    18366.00    97431.00
------------------------------------------------------------

Advanced issues in reshaping data long to wide

5. Changing the number of observations printed

If you have a large number of wide variables, the proc print of the long data can be very long even for just 10 variables. The %towide macro allows you to increase or decrease the number of wide observations that are printed (which alters the corresponding number of long observations as well). If you wanted to just print five wide observations (and then the corresponding number of long observations), you can use the numprint option as shown below.

< data step creating long4 from above would go here.>
< it is omitted to save space > 

%towide(long4, wide4a, id, year, 90, 95, inc, numprint=5); 

6. Suppressing proc print and proc means

If you want to suppress printing of the proc print and the proc means entirely, you can do so using the quiet=yes option as shown below. We don't generally recommend this, we give you the option trusting that you would only use this option when it is safe to do so.

< data step creating long4 from above would go here.>
< it is omitted to save space > 

%towide(long4, wide4b, id, year, 90, 95, inc, quiet=yes); 

7. Suppressing sorting of long data

In order to make the long data into wide data, it is necessary to sort the data on the variables that uniquely identify the wide records (from the previous example, the variable id). If your data file is already sorted on that variable, it is a waste of disk space and time to sort the file again, so you can tell the %towide macro that your data are already sorted using the sorted=yes option. This suppresses the sorting of the data. Please note that using the sorted=yes option when the data are not sorted will lead to unexpected (and likely wrong) results. An example is shown below.

< data step creating long4 from above would go here.>
< it is omitted to save space > 

PROC SORT DATA=long4 OUT=long4s; 
  BY id; 
RUN; 
%towide(long4s, wide4c, id, year, 90, 95, inc, sorted=yes);   

8. Reshaping character variables

The %towide macro assumes that all of your wide variables to be converted to long are numeric variables. Suppose you have some character variables as well, as in the example below.

   FAMID    YEAR      FAMINC   SPEND    DEBT 
      1       96      40000    38000    yes 
      1       97      40500    39000    yes 
      1       98      41000    40000    no 
      2       96      45000    42000    yes 
      2       97      45400    43000    no 
      2       98      45800    44000    no 
      3       96      75000    70000    no 
      3       97      76000    71000    no 
      3       98      77000    72000    no 

The variable faminc is numeric and the variable spend is numeric, but the variable debt is character. The example below shows how to handle reshaping with character variables using the types= option.

As you see below, the types=N N C option is used to indicate that the first variable (faminc) is numeric, the second variable (spend) is numeric and the third variable (debt) is character.

* 5. This example shows how to handle character variables ; 
data long5; 
  length debt $ 3; 
  input famid year faminc spend debt $ ; 
cards; 
1 96 40000 38000 yes 
1 97 40500 39000 yes 
1 98 41000 40000 no 
2 96 45000 42000 yes 
2 97 45400 43000 no 
2 98 45800 44000 no 
3 96 75000 70000 no 
3 97 76000 71000 no 
3 98 77000 72000 no 
; 
run; 

%towide(long5, wide5, famid, year, 96, 98, faminc spend debt, types=N N C); 

We show just the output of the proc print (parts 1A and 1B below) showing that the character variables were properly reshaped as well as the numeric variables.

1A. Verify reshaping of long to wide using PROC PRINT of 10 wide records. 
    Compare this print of the long data below (long5) with the 
    PROC PRINT of the wide data (wide5) in step 1B below. 
    
    PROC PRINT of LONG data file long5 

OBS    FAMID    YEAR    FAMINC    SPEND    DEBT 

 1       1       96      40000    38000    yes 
 2       1       97      40500    39000    yes 
 3       1       98      41000    40000    no 
 4       2       96      45000    42000    yes
 5       2       97      45400    43000    no 
 6       2       98      45800    44000    no 
 7       3       96      75000    70000    no 
 8       3       97      76000    71000    no 
 9       3       98      77000    72000    no 

1B. Verify reshaping of long to wide using PROC PRINT of 10 wide records 
    Compare this print of the wide data (wide5) below with the 
    PROC PRINT of the long data (long5) in Step 1A above 

PROC PRINT of WIDE data file wide5 

            F        F        F 
            A        A        A        S        S        S 
            M        M        M        P        P        P       D      D     D 
     F      I        I        I        E        E        E       E      E     E 
     A      N        N        N        N        N        N       B      B     B 
O    M      C        C        C        D        D        D       T      T     T 
B    I      9        9        9        9        9        9       9      9     9 
S    D      6        7        8        6        7        8       6      7     8 

1    1    40000    40500    41000    38000    39000    40000    yes    yes    no 
2    2    45000    45400    45800    42000    43000    44000    yes    no     no 
3    3    75000    76000    77000    70000    71000    72000    no     no     no

9. Reshaping character variables wider than eight characters

Consider the example data file below. It is just like the previous example, except the character variable is city instead of debt, and some of the values of city are longer than 8 characters. The %towide macro assumes that the character variables are a length of eight or shorter unless you tell it otherwise. We will intentionally omit this option to show how your character variables can get truncated.

FAMID    YEAR    FAMINC    SPEND    CITY 

    1       96      40000    38000    Sacramento 
    1       97      40500    39000    Sacramento 
    1       98      41000    40000    Placerville
    2       96      45000    42000    Denver 
    2       97      45400    43000    Denver 
    2       98      45800    44000    Seattle
    3       96      75000    70000    Malibu 
    3       97      76000    71000    Ventura 
    3       98      77000    72000    Ventura

As you see below, the types=N N C option is used to indicate that the first variable (faminc) is numeric, the second variable (spend) is numeric and the third variable (debt) is character.

data wide6; 
  length city96 city97 city98 $ 12 ; 

  input famid faminc96 faminc97 faminc98 
        spend96 spend97 spend98 city96 $ city97 $ city98 $ ; 
cards; 
1 40000 40500 41000 38000 39000 40000 Sacramento Sacramento Placerville 
2 45000 45400 45800 42000 43000 44000 Denver     Denver     Seattle 
3 75000 76000 77000 70000 71000 72000 Malibu     Ventura    Ventura 
; 
run; 

%towide(wide6,long6,famid,year,96,98,faminc spend city,types=N N C); 

We show just the output of the proc print for the wide data showing how the character variables for city were truncated to a length of eight.

        F      F      F 
        A      A      A      S      S      S 
        M      M      M      P      P      P       C         C         C 
   F    I      I      I      E      E      E       I         I         I 
   A    N      N      N      N      N      N       T         T         T 
O  M    C      C      C      D      D      D       Y         Y         Y 
B  I    9      9      9      9      9      9       9         9         9 
S  D    6      7      8      6      7      8       6         7         8 

1  1  40000  40500  41000  38000  39000  40000  Sacramen  Sacramen  Placervi 
2  2  45000  45400  45800  42000  43000  44000  Denver    Denver    Seattle 
3  3  75000  76000  77000  70000  71000  72000  Malibu    Ventura   Ventura

As you see below, the lengths=8 8 12 option is used to indicate that the first two variables (faminc and spend) should be a length of eight and the third variable (city) should be a length of 12. When using the lengths= option it is safest to supply a value of eight for all numeric variables.

* 7. This example shows how to use the lengths option to say that ; 
*    the character variables can be up to 12 characters in length ; 

data long7; 
  length city $ 12 ; 
  input famid year faminc spend city $ ; 
cards; 
1 96 40000 38000 Sacramento 
1 97 40500 39000 Sacramento 
1 98 41000 40000 Placerville 
2 96 45000 42000 Denver 
2 97 45400 43000 Denver 
2 98 45800 44000 Seattle 
3 96 75000 70000 Malibu 
3 97 76000 71000 Ventura 
3 98 77000 72000 Ventura 
; 
run;

%towide(long7,wide7,famid,year,96,98,faminc spend city,types=N N C,lengths=8 8 12); 

We show just the output of the proc print for the long data showing how the character variables for city were properly reshaped by using the lengths= option.

      F     F     F 
      A     A     A     S     S     S 
      M     M     M     P     P     P   C          C               C 
  F   I     I     I     E     E     E   I          I               I 
  A   N     N     N     N     N     N   T          T               T 
O M   C     C     C     D     D     D   Y          Y               Y 
B I   9     9     9     9     9     9   9          9               9 
S D   6     7     8     6     7     8   6          7               8 

1 1 40000 40500 41000 38000 39000 40000 Sacramento Sacramento Placerville 
2 2 45000 45400 45800 42000 43000 44000 Denver     Denver     Seattle 
3 3 75000 76000 77000 70000 71000 72000 Malibu     Ventura    Ventura

10. Summary

The %towide macro can be used to reshape data from long format to wide format for many basic situations. An example of the syntax is given below.

data long1; 
  input famid year faminc ; 
cards; 
1 96 40000 
1 97 40500 
1 98 41000 
2 96 45000 
2 97 45400 
2 98 45800 
3 96 75000 
3 97 76000 
3 98 77000 
; 
run; 

%towide(long1,wide1,famid,year,96,98,faminc); 

Where
long1 - is the input long data set
wide1 - is the output wide data set
famid - is the variable that uniquely identifies the wide records.
year - is the variable that will be used for the suffixes of the wide variables
96 - is the lowest value of year
98 - is the highest value of year
faminc - is the name of the variable to be reshaped from wide to long.

The towide macro also supports the following options (illustrated by example)
numprint=5 To print only 5 wide records (default is 10)
quiet=yes To suppress printing of proc print and proc means
sorted=yes The long data file is already sorted by the variable that uniquely
identifies the wide records
types=N N C States that the first 2 variables are numeric and the 3rd is character.
lengths= 8 8 12 States that the length of the first 2 variables is 8, and the 3rd is 12.

Here are examples of the options:

This indicates the first two variables are numeric and the third is character.

%towide(long7,wide7,famid,year,96,98,faminc spend city,types=N N C); 

This indicates the first two variables are numeric and the third is character and the length of the character variable is 12 (and the lengths of the first two variables (the numeric variables) should be the default width of eight.

%towide(long7,wide7,famid,year,96,98,faminc spend city,types=N N C,lengths=8 8 12); 

This requests only five of the wide records (and the corresponding number of long records) be printed to verify the reshaping.

%towide(long7,wide7,famid,year,96,98,faminc spend,numprint=5); 

This requests suppression of the proc print and proc means for verifying the reshaping. This option is not recommended, but provided for the convenience of those who may be performing their own verification routines.

%towide(long7,wide7,famid,year,96,98,faminc spend,quiet=yes); 

This indicates that the input long data file is already sorted by famid, and suppresses the sorting by famid again.

%towide(long7,wide7,famid,year,96,98,faminc spend,sorted=yes); 

11. Problems to look out for

These examples cover situations where there are no complications. However, look out for these complications.


How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.