UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

SAS FAQ
How do I read multiple raw data files with the same structure in one data step?

Let's say that we have multiple raw data files in a folder with the same data structure and we need to read them into SAS to form a single SAS data set. This can actually be done in SAS in a single data step. Here is an example demonstrating the steps to accomplish that for Windows operating system environment. There are mainly two steps. Step one is to create a file consisting of all the file names. Step two is the SAS data step to create the SAS data file based on the text file of file names created in the first step.

To set up our example, we have created some mock data files in a folder called raw_data_files and the folder is located in c:\work directory. Here is all the files in the directory:

  1. Creating a text file consisting of all the file names in the folder using DOS commands via Command window. You can open a Command window by choosing "Run" from the Start menu. Enter "cmd" in the field for "Open" and then click on OK. Type "cd c:\work" to change to the c:\work directory. Below is a sequence of commands that are used to create a text file called filenames.txt which contains all the three file names and their path.
  2. C:\work>cd raw_data_files
    C:\work\raw_data_files>dir
     Volume in drive C is Local Disk
     Volume Serial Number is A017-4A89
     Directory of C:\work\raw_data_files
    11/19/2006  10:11a      <DIR>          .
    11/19/2006  10:11a      <DIR>          ..
    11/19/2006  09:57a                  45 file01.txt
    11/19/2006  09:58a                  46 file3.txt
    11/19/2006  09:59a                  63 file7.txt
                   3 File(s)            154 bytes
                   2 Dir(s)  21,162,877,440 bytes free
    C:\work\raw_data_files>more file01.txt
    John 12 354 7
    Carl 43 657 9
    Mary 343 7 9
    C:\work\raw_data_files>more file3.txt
    adam 12 354 7
    brad 43 657 9
    tyler 343 7 9
    C:\work\raw_data_files>more file7.txt
    mary 343 56 2
    robert 243 67 8
    brad 43 657 9
    tyler 343 7 9
    C:\work\raw_data_files>dir /s /b > ../filenames.txt
    C:\work\raw_data_files>cd ..
    C:\work>more filenames.txt
    C:\work\raw_data_files\file01.txt
    C:\work\raw_data_files\file3.txt
    C:\work\raw_data_files\file7.txt

    Notice that we created the file filenames.txt not in the current directory but in the directory one level above. This allows us to only include the file names in the current directory to be saved.

     

  3. Now we are ready to proceed to SAS. In one data step, we read in all the files. The trick is to have TWO infile statements. The first one is for reading a file name and the second one is to read in the data from each individual file with the filevar options and the end option. Corresponding to each of the infile statement, we also have two input statements.  The first input statement is for reading the file name, so it only has one entry, namely, the file name to be used in the second infile statement. The second input statement corresponds to the data structure of the data files.
options nocenter nodate;
data one;
  infile "c:\work\filenames.txt";
  length fil2read  $100;
  input fil2read $;
  infile dummy filevar=fil2read end=done ;
  do while(not done);
    file = _n_;
    input name $  x1 x2 x3;  
    output;
  end;  
run;
proc print data=one;
run;
Obs    file     name      x1     x2    x3
  1      1     John       12    354     7
  2      1     Carl       43    657     9
  3      1     Mary      343      7     9
  4      2     adam       12    354     7
  5      2     brad       43    657     9
  6      2     tyler     343      7     9
  7      3     mary      343     56     2
  8      3     robert    243     67     8
  9      3     brad       43    657     9
 10      3     tyler     343      7     9

We have also created a variable called file to group the observations by each of the raw data files. We can also  be more specific by defining the variable file to be fil2read.

data one;
  infile "c:\work\filenames.txt";
  length fil2read  $100;
  input fil2read $;
  infile dummy filevar=fil2read end=done ;
  do while(not done);
    file = fil2read;
    input name $  x1 x2 x3;  
    output;
  end;  
run;
proc print data=one;
run;
Obs                  file                    name      x1     x2    x3
  1    C:\work\raw_data_files\file01.txt    John       12    354     7
  2    C:\work\raw_data_files\file01.txt    Carl       43    657     9
  3    C:\work\raw_data_files\file01.txt    Mary      343      7     9
  4    C:\work\raw_data_files\file3.txt     adam       12    354     7
  5    C:\work\raw_data_files\file3.txt     brad       43    657     9
  6    C:\work\raw_data_files\file3.txt     tyler     343      7     9
  7    C:\work\raw_data_files\file7.txt     mary      343     56     2
  8    C:\work\raw_data_files\file7.txt     robert    243     67     8
  9    C:\work\raw_data_files\file7.txt     brad       43    657     9
 10    C:\work\raw_data_files\file7.txt     tyler     343      7     9
[http://www.ats.ucla.edu/stat/sas/footer.htm]