SAS FAQ 
How do I read in a delimited ASCII file in SAS?

SAS proc import is usually a good starting point for reading a delimited ASCII data file, such as a .csv (comma-separated values) file or a tab-delimited file. Sometimes we can also use a data step to read in an ASCII data file. On this page, we will show examples on how to read delimited ASCII files using proc import and data step.


Comma-separated files

It is a common practice to read in comma-separated files. SAS proc import is usually sufficient for this purpose. There are two slightly different ways of reading a comma delimited file using proc import. In SAS, a comma delimited file can be considered as a special type of external file with special file extension .csv, which stands for comma-separated-values. We show here a sample program making use of this feature using data found in cars_novname.csv

Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186
Acura,3.5 RL 4dr,Sedan,Asia,Front,"$43,755 ","$39,014 ",3.5,6,225,18,24,3880,115,197

Then the following proc import statement will read it in and create a temporary data set called mydata. Since the .csv file does not include variable names, we indicate getnames = no.  After reading in the data, we can look at a proc contents to see the formats used in our dataset.

proc import datafile="cars_novname.csv" out=mydata dbms=csv replace;
    getnames=no;
run;

proc contents data=mydata;
run;

 #    Variable    Type    Len    Format     Informat
 1    VAR1        Char      5    $5.        $5.
 2    VAR2        Char     39    $39.       $39.
 3    VAR3        Char      7    $7.        $7.
 4    VAR4        Char      7    $7.        $7.
 5    VAR5        Char      6    $6.        $6.
 6    VAR6        Char     10    $10.       $10.
 7    VAR7        Char     10    $10.       $10.
 8    VAR8        Num       8    BEST12.    BEST32.
 9    VAR9        Num       8    BEST12.    BEST32.
10    VAR10       Num       8    BEST12.    BEST32.
11    VAR11       Num       8    BEST12.    BEST32.
12    VAR12       Num       8    BEST12.    BEST32.
13    VAR13       Num       8    BEST12.    BEST32.
14    VAR14       Num       8    BEST12.    BEST32.
15    VAR15       Num       8    BEST12.    BEST32.

As you can see in the output above, SAS creates default variable names as VAR1-VARn when variables names are not present in the raw data file.

You might have a file where you have the names at the top of the file like cars.csv. With such a file you would like SAS to use the variable names from the file (e.g., make mpg etc.).  

Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length
Acura,MDX,SUV,Asia,All,"$36,945 ","$33,337 ",3.5,6,265,17,23,4451,106,189
Acura,RSX Type S 2dr,Sedan,Asia,Front,"$23,820 ","$21,761 ",2,4,200,24,31,2778,101,172
Acura,TSX 4dr,Sedan,Asia,Front,"$26,990 ","$24,647 ",2.4,4,200,22,29,3230,105,183
Acura,TL 4dr,Sedan,Asia,Front,"$33,195 ","$30,299 ",3.2,6,270,20,28,3575,108,186

We can use the getnames=yes; statement to tell SAS we want it to read the variable names from the first line of the data file, as illustrated below.

proc import datafile="cars.csv" out=mydata dbms=csv replace;
   getnames=yes;
run;

Another way of reading a comma delimited file is to consider a comma as an ordinary delimiter. Here is a program that shows how to use the dbms=dlm and delimiter="," option to read a file just like we did above. Also notice that the external file doesn't have to have .csv extension.

proc import datafile="cars.csv" out=mydata dbms=dlm replace;
   delimiter=",";
   getnames=yes;
run;

Tab-delimited files

The options available for reading in a .csv file in proc import also exist for tab-delimited files: you can opt to read in or not read in names from your file; you can treat tab-delimited files as a special type of external file with extension .txt of your can treat your file as an instance of a delimited file and describe the delimiter.

We first show sample a program for reading in a tab delimited file that does not include variable names. SAS creates default variable names as VAR1-VARn in when we do not provide variable names.

proc import datafile="cars.txt" out=mydata dbms=tab replace;
   getnames=no;
run;

Next, if your tab-delimited file contains variable names, you change the getnames option.

proc import datafile="cars.txt" out=mydata dbms=tab replace;
   getnames=yes;
run;

Or if you wish to treat your tab-delimited file with variable names as a generic "delimited" file, you change the dbms option and indicate what the delimiter is.

proc import datafile="cars.txt" out=mydata dbms=dlm replace;
   delimiter='09'x;
   getnames=yes;
run;

You may want to create a permanent SAS data file using proc import. Suppose that we want to create a permanent SAS data file called mydata in the directory "c:\dissertation". We can do the following. 

libname dis "c:\dissertation";
proc import datafile="cars.txt" out=dis.mydata dbms=dlm replace;
   delimiter='09'x;
   getnames=yes;
run;

Space-delimited files

Proc import also work for space-delimited files. When dbms = dlm is indicated, SAS assumes the file is space-delimited unless otherwise specified (as we did above for commas and tabs). 

We first show sample a program for reading in a space-delimited file that does not include variable names. SAS creates default variable names as VAR1-VARn in when we do not provide variable names.

proc import datafile="cars_sp.txt" out=mydata dbms=dlm  replace;
  getnames=no;
run;

Next, if your space-delimited file contains variable names, you change the getnames option.

proc import datafile="cars_sp.txt" out=mydata dbms=dlm  replace;
  getnames=yes;
run;

If you wish to create a permanent SAS data file using proc import in the directory "c:\dissertation", you can use the code below. 

libname dis "c:\dissertation";
proc import datafile="cars_sp.txt" out=dis.mydata dbms=dlm replace;
   getnames=yes;
run;

Other kinds of delimiters

You can use delimiter= on the infile statement to tell SAS what delimiter you are using to separate variables in your raw data file. For example, below we have a raw data file that uses exclamation points ! to separate the variables in the file.

22!2930!4099
17!3350!4749
22!2640!3799
20!3250!4816
15!4080!7827

The example below shows how to read this file by using delimiter='!' on the infile statement.

DATA cars;
  INFILE 'readdel1.txt' DELIMITER='!' ;
  INPUT mpg weight price;
RUN;
 
PROC PRINT DATA=cars;
RUN;

As you can see in the output below, the data was read properly.

OBS    MPG    WEIGHT    PRICE

 1      22     2930      4099
 2      17     3350      4749
 3      22     2640      3799
 4      20     3250      4816
 5      15     4080      7827

It is possible to use multiple delimiters. The example file below uses either exclamation points or plus signs as delimiters.

22!2930!4099
17+3350+4749
22!2640!3799
20+3250+4816
15+4080!7827

By using delimiter='!+' on the infile statement, SAS will recognize both of these as valid delimiters.

DATA cars;
  INFILE 'readdel2.txt' DELIMITER='!+' ;
  INPUT mpg weight price;
RUN;
 
PROC PRINT DATA=cars;
RUN;

As you can see in the output below, the data was read properly.

OBS    MPG    WEIGHT    PRICE

 1      22     2930      4099
 2      17     3350      4749
 3      22     2640      3799
 4      20     3250      4816
 5      15     4080      7827

Problem with Proc Import

Proc import does not know the formats for your variables, but it is able to guess the format based on what the beginning of your dataset looks like. Most of the time, this guess is fine. But if the length of a variable differs from beginning to end of your file, you might end up with some truncated values.

Let's return to the first example shown on this page.  We read in a .csv file and then looked at the contents of the SAS dataset we generated.  The format assumed for the first variable was a character string of length 5. This is consistent with the values seen at the beginning of this variable listing the makes of cars, sorted alphabetically ("Acura", "Audi", "BMW"). However, if we look at a frequency table of this variable, we can see that there are longer values that are being truncated. 

                                  Cumulative    Cumulative
VAR1     Frequency     Percent     Frequency      Percent
Acura           7        1.64             7         1.64
Audi           19        4.44            26         6.07
BMW            20        4.67            46        10.75
Buick           9        2.10            55        12.85
Cadil           8        1.87            63        14.72
Chevr          27        6.31            90        21.03
Chrys          15        3.50           105        24.53
Dodge          13        3.04           118        27.57
Ford           23        5.37           141        32.94
GMC             8        1.87           149        34.81
Honda          17        3.97           166        38.79
Humme           1        0.23           167        39.02
Hyund          12        2.80           179        41.82
Infin           8        1.87           187        43.69
Isuzu           2        0.47           189        44.16
Jagua          12        2.80           201        46.96
Jeep            3        0.70           204        47.66
Kia            11        2.57           215        50.23
Land            3        0.70           218        50.93
Lexus          11        2.57           229        53.50
Linco           9        2.10           238        55.61
MINI            2        0.47           240        56.07
Mazda          11        2.57           251        58.64
Merce          26        6.07           277        64.72
Mercu           9        2.10           286        66.82
Mitsu          13        3.04           299        69.86
Nissa          17        3.97           316        73.83
Oldsm           3        0.70           319        74.53
Ponti          11        2.57           330        77.10
Porsc           7        1.64           337        78.74
Saab            7        1.64           344        80.37
Satur           8        1.87           352        82.24
Scion           2        0.47           354        82.71
Subar          11        2.57           365        85.28
Suzuk           8        1.87           373        87.15
Toyot          28        6.54           401        93.69
Volks          15        3.50           416        97.20
Volvo          12        2.80           428       100.00

This may not be an enormous problem here--we can still understand the variable, even in its truncated form--but if there were different values that began with the same 5 characters, we would no longer se them as different values (e.g., "South Carolina" and "South Dakota" would both be "South").  How can you fix this?

When you run proc import, code for a data step is generated behind the scenes and that is what is actually run to read in your data. This data step can be found in the SAS log after running proc import and can be copied, amended, and rerun. We've pasted the data step from the log and marked in bold the lines we would amend to fix this problem variable. The "$5." would be replaced with something safer, like "$20.".

      data WORK.MYDATA                                  ;
      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
      infile 'C:\cars_novname.csv' delimiter = ',' MISSOVER DSD lrecl=32767 ;
         informat VAR1 $5. ;
         informat VAR2 $39. ;
         informat VAR3 $7. ;
         informat VAR4 $7. ;
         informat VAR5 $6. ;
         informat VAR6 $10. ;
         informat VAR7 $10. ;
         informat VAR8 best32. ;
         informat VAR9 best32. ;
         informat VAR10 best32. ;
         informat VAR11 best32. ;
         informat VAR12 best32. ;
         informat VAR13 best32. ;
         informat VAR14 best32. ;
         informat VAR15 best32. ;
         format VAR1 $5. ;
         format VAR2 $39. ;
         format VAR3 $7. ;
         format VAR4 $7. ;
         format VAR5 $6. ;
         format VAR6 $10. ;
         format VAR7 $10. ;
         format VAR8 best12. ;
         format VAR9 best12. ;
         format VAR10 best12. ;
         format VAR11 best12. ;
         format VAR12 best12. ;
         format VAR13 best12. ;
         format VAR14 best12. ;
         format VAR15 best12. ;
      input
                  VAR1 $
                  VAR2 $
                  VAR3 $
                  VAR4 $
                  VAR5 $
                  VAR6 $
                  VAR7 $
                  VAR8
                  VAR9
                  VAR10
                  VAR11
                  VAR12
                  VAR13
                  VAR14
                  VAR15
      ;
      if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
      run;

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.