UCLA Academic Technology Services

SAS FAQ
Is there a quick way to create dummy variables?

Converting a categorical variable to dummy variables can be tedious when done with a series of IF-THEN statements. Consider the following example data file.

DATA auto ;
  LENGTH make $ 20 ;
  INPUT make $ 1-17 price mpg rep78 ;
CARDS;
AMC Concord        4099 22 3 
AMC Pacer          4749 17 3 
Audi 5000          9690 17 5 
Audi Fox           6295 23 3 
BMW 320i           9735 25 4 
Buick Century      4816 20 3 
Buick Electra      7827 15 4 
Buick LeSabre      5788 18 3 
Cad. Eldorado     14500 14 2 
Olds Starfire      4195 24 1 
Olds Toronado     10371 16 3 
Plym. Volare       4060 18 2 
Pont. Catalina     5798 18 4 
Pont. Firebird     4934 18 1 
Pont. Grand Prix   5222 19 3 
Pont. Le Mans      4723 19 3 
;
RUN;

The variable rep78 is coded with values from 1 to 5 representing various repair histories. We can create dummy variables for rep78 by writing a separate pair of IF-THEN/ELSE statements for each value, as follows:

DATA auto2 ;
  SET auto ;
 
  IF rep78 = 1 THEN rep78_1 = 1; 
    ELSE rep78_1 = 0;
  IF rep78 = 2 THEN rep78_2 = 1; 
    ELSE rep78_2 = 0;
  IF rep78 = 3 THEN rep78_3 = 1; 
    ELSE rep78_3 = 0;
  IF rep78 = 4 THEN rep78_4 = 1; 
    ELSE rep78_4 = 0;
  IF rep78 = 5 THEN rep78_5 = 1; 
    ELSE rep78_5 = 0;
RUN;
 
PROC FREQ DATA=auto2;
  TABLES rep78*rep78_1*rep78_2*rep78_3*rep78_4*rep78_5 / list ;
RUN;

As the PROC FREQ output below shows, the dummy variables were created properly, but doing so required a lot of IF-THEN/ELSE statements.

[Output below edited for readability] 
REP78 REP78_1 REP78_2 REP78_3 REP78_4 REP78_5  Freq  Percent
------------------------------------------------------------
    1       1       0       0       0       0    2    12.5  
    2       0       1       0       0       0    2    12.5  
    3       0       0       1       0       0    8    50.0  
    4       0       0       0       1       0    3    18.8  
    5       0       0       0       0       1    1     6.3   
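
Each IF-THEN/ELSE pair above can also be collapsed into a single assignment. The following is a minimal sketch, relying on the fact that SAS evaluates a comparison to 1 when it is true and 0 when it is false. The data set name auto2b is a hypothetical name chosen for this illustration.

```sas
DATA auto2b ;   /* auto2b is a hypothetical data set name */
  SET auto ;

  /* A comparison in parentheses evaluates to 1 if true and 0 if false */
  rep78_1 = (rep78 = 1) ;
  rep78_2 = (rep78 = 2) ;
  rep78_3 = (rep78 = 3) ;
  rep78_4 = (rep78 = 4) ;
  rep78_5 = (rep78 = 5) ;
RUN;
```

This should produce the same coding as the IF-THEN/ELSE version, but it still needs one statement per value of rep78.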

Had rep78 ranged from 1 to 10 or 1 to 20, this approach would have required a lot of typing, which is prone to error. Here is a shortcut you can use when you need to create dummy variables.

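DATA auto3 ;
  SET auto ;

  /* Define the five dummy variables as the elements of an array */
  ARRAY dummys {*} 3. rep78_1 - rep78_5 ;

  /* Start with every dummy variable set to 0 */
  DO i = 1 TO 5 ;
    dummys(i) = 0 ;
  END ;

  /* Switch on the dummy variable that matches the value of rep78 */
  dummys(rep78) = 1 ;

RUN;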
 
PROC FREQ DATA=auto3;
  TABLES rep78*rep78_1*rep78_2*rep78_3*rep78_4*rep78_5 / list ;
RUN;

As you see below, the dummy variables were created successfully.

[Output below edited for readability]
REP78  REP78_1  REP78_2  REP78_3  REP78_4  REP78_5  Freq  Percent
-----------------------------------------------------------------
    1        1        0        0        0        0    2    12.5 
    2        0        1        0        0        0    2    12.5 
    3        0        0        1        0        0    8    50.0 
    4        0        0        0        1        0    3    18.8 
    5        0        0        0        0        1    1     6.3  
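
Beyond reading the PROC FREQ listing, the coding can be checked row by row. The sketch below assumes the auto3 data set created above and writes a note to the SAS log for any observation in which the dummy variables do not sum to exactly 1. The variable n_set is a hypothetical helper introduced only for this check.

```sas
DATA _NULL_ ;
  SET auto3 ;

  /* n_set (hypothetical helper): how many dummy variables equal 1 in this row */
  n_set = SUM(OF rep78_1-rep78_5) ;

  /* Exactly one dummy should be switched on for each observation */
  IF n_set NE 1 THEN PUT "Unexpected coding: " make= rep78= n_set= ;
RUN;
```

If the dummy variables were coded correctly, this step should write no "Unexpected coding" lines to the log.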

Let's look at each statement in some detail.

 ARRAY dummys {*} 3. rep78_1 - rep78_5;

This statement defines an array called dummys whose elements are five new dummy variables, rep78_1 to rep78_5, and gives each one the minimum storage length required, i.e., 3 bytes.  To use different names for your dummy variables, replace rep78_1 - rep78_5 with the names you want.  The asterisk in the braces tells SAS to determine the number of array elements by counting the variables listed at the end of the statement.

DO i=1 TO 5;			      
  dummys(i) = 0;
END;

This loop initializes each dummy variable to 0. You would change 5 to the number of values your variable can take.
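
One way to avoid editing the loop bound by hand is the DIM function, which returns the number of elements in an array. The sketch below is a variation on the data step above, not the original author's code, and auto3b is a hypothetical data set name.

```sas
DATA auto3b ;   /* auto3b is a hypothetical data set name */
  SET auto ;

  ARRAY dummys {*} 3. rep78_1 - rep78_5 ;

  /* DIM(dummys) returns the number of array elements (5 here),
     so the loop bound follows the ARRAY statement automatically */
  DO i = 1 TO DIM(dummys) ;
    dummys(i) = 0 ;
  END ;

  dummys(rep78) = 1 ;
RUN;
```

With this form, changing the variable list in the ARRAY statement (for example to rep78_1 - rep78_10) is the only edit needed when the number of categories changes.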

dummys(rep78) = 1;		

This sets the appropriate dummy variable to 1. For example, if rep78 = 3, then dummys(rep78) = 1 assigns a value of 1 to the third element of the array, i.e., assigns 1 to rep78_3.  You would change rep78 to the name of the variable for which you want to create dummy variables.
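
Because rep78 is used directly as the array subscript, this shortcut assumes that rep78 is always a whole number between 1 and the number of array elements. A missing value would make SAS stop with an "Array subscript out of range" error. The sketch below is one possible way to guard against that, under the assumption that you want the dummy variables to be missing when rep78 is missing. It also drops the loop counter i, which would otherwise be kept in the output data set. The data set name auto4 is hypothetical.

```sas
DATA auto4 ;   /* auto4 is a hypothetical data set name */
  SET auto ;

  ARRAY dummys {*} 3. rep78_1 - rep78_5 ;

  /* DIM(dummys) is the number of elements in the array */
  DO i = 1 TO DIM(dummys) ;
    /* Leave the dummies missing when rep78 is missing; otherwise the
       comparison yields 1 for the matching element and 0 for the rest */
    IF MISSING(rep78) THEN dummys(i) = . ;
    ELSE dummys(i) = (rep78 = i) ;
  END ;

  DROP i ;   /* keep the loop counter out of the output data set */
RUN;
```

Since rep78 is never used as a subscript here, a value outside the range 1 to 5 would simply leave all five dummy variables at 0 rather than raise an error.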


These pages are Copyrighted (c) by UCLA Academic Technology Services

