UCLA Academic Technology Services HomeServicesClassesContactJobs
Help the Stat Consulting Group by giving a gift             
Loading

SAS FAQ 
How do I read a delimited file with missing data in SAS?

It is very convenient to read comma delimited, tab delimited, or other kinds of delimited raw data files. However, you need to be very careful when reading delimited data with missing values. Consider the example raw data file below. Note that the value of mpg is missing for the AMC Pacer and the missing value is signified with two consecutive commas (,,).

AMC Concord,22,2930,4099
AMC Pacer,,3350,4749
AMC Spirit,22,2640,3799
Buick Century,20,3250,4816
Buick Electra,15,4080,7827

We read the file using the program below using delimiter=',' to indicate that commas are used as delimiters.

 DATA cars1;
  length make $ 20 ;
  INFILE 'readdsd.txt' DELIMITER=',' ;
  INPUT make mpg weight price;
RUN;
 
PROC PRINT DATA=cars1;
RUN;

But, as we see below, the data was read incorrectly for the AMC Pacer.

OBS        MAKE          MPG    WEIGHT    PRICE
 1     AMC Concord        22     2930      4099
 2     AMC Pacer        3350     4749         .
 3     Buick Century      20     3250      4816
 4     Buick Electra      15     4080      7827

SAS does not properly recognize empty values for delimited data unless you use the dsd option. You need to use the dsd option on the infile statement if two consecutive delimiters are used to indicate missing values (e.g., two consecutive commas, two consecutive tabs). Below, we read the exact same file again, except that we use the dsd option.

 DATA cars2;
  length make $ 20 ;
  INFILE 'readdsd.txt' DELIMITER=',' DSD ;
  INPUT make mpg weight price;
RUN;
 
PROC PRINT DATA=cars2;
RUN;

The output is shown below.

OBS        MAKE         MPG    WEIGHT    PRICE
 1     AMC Concord       22     2930      4099
 2     AMC Pacer          .     3350      4749
 3     AMC Spirit        22     2640      3799
 4     Buick Century     20     3250      4816
 5     Buick Electra     15     4080      7827

As you see in the output, the data for the AMC Pacer was read correctly because we used the dsd option.


How to cite this page

Report an error on this page or leave a comment

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.