UCLA Academic Technology Services

SAS FAQ
How do I read raw data via FTP in SAS?

SAS can read raw data directly from an FTP server. Normally, you would use FTP to download the data to your local computer and then use SAS to read the local copy. SAS lets you skip the download step and read the file from the remote computer via FTP. Of course, this assumes that you can reach that computer over the internet at the time you run your SAS program.

The program below illustrates how to do this. In the filename statement, the keyword ftp after the fileref (here, in) tells SAS to access the data via FTP. After that, you supply the name of the file (in this case 'gpa.txt'). lrecl= specifies the record length of your data; be sure to choose a value that is at least as wide as your widest record. cd= specifies the directory where the file is stored. host= specifies the name of the site to which you want to FTP. user= provides your userid (or anonymous if connecting via anonymous FTP). pass= supplies your password (or your email address if connecting via anonymous FTP).

FILENAME in FTP 'gpa.txt' LRECL=80 
                CD='/local2/samples/sas/ats/' 
                HOST='cluster.oac.ucla.edu'
                USER='joebruin'
                PASS='yourpassword' ;
DATA gpa ;
   INFILE in ;
   INPUT gpa hsm hss hse satm satv gender ;
RUN;
 
PROC PRINT DATA=gpa(obs=10) ;
RUN;

As you can see below, the program read the data in gpa.txt successfully.

OBS     GPA    HSM    HSS    HSE    SATM    SATV    GENDER

  1    5.32     10     10     10     670     600       1
  2    5.14      9      9     10     630     700       2
  3    3.84      9      6      6     610     390       1
  4    5.34     10      9      9     570     530       2
  5    4.26      6      8      5     700     640       1
  6    4.35      8      6      8     640     530       1
  7    5.33      9      7      9     630     560       2
  8    4.85     10      8      8     610     460       2
  9    4.76     10     10     10     570     570       2
 10    5.72      7      8      7     550     500       1

The log shows that we read 40 records and that the resulting data set has 40 observations and 7 variables, which matches what we expect for this file. Because you could lose your FTP connection and get only part of the data, it is especially important to check the log for the number of observations and variables read and to compare them with the numbers you believe the file contains.

NOTE: 40 records were read from the infile IN.
      The minimum record length was 25.
      The maximum record length was 25.
NOTE: The data set WORK.GPA has 40 observations and 7 variables.
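The comparison described above can also be done in code rather than by eye. The following is a minimal sketch, assuming the data set WORK.GPA created by the program above and an expected count of 40 observations (the figure from the log shown here). The macro variable name expected_obs is a hypothetical name chosen for illustration; set it to the number of records you believe your own file contains.

```sas
/* Expected number of observations: an assumption you supply yourself. */
%LET expected_obs = 40 ;

DATA _NULL_ ;
   /* NOBS= is filled in when the step is compiled, so the SET
      statement never has to execute (IF 0 is always false). */
   IF 0 THEN SET gpa NOBS=n ;
   IF n NE &expected_obs THEN
      PUT "WARNING: expected &expected_obs observations, found " n ;
   ELSE
      PUT "NOTE: observation count matches the expected value: " n ;
   STOP ;   /* end the step after this single check */
RUN;
```

A message beginning with WARNING: stands out in the log, which makes a partial transfer harder to overlook.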

In your program, be sure to change lrecl=80 to the width of your raw data file. If you are unsure how wide the file is, use a value that is certainly wider than the widest line of your file. You would most likely use this technique when reading a very large file. You can test your program on just a handful of observations by using the obs= option on the infile statement; e.g., infile in obs=20; would read just the first 20 observations from your file.
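Put together, a test run with obs= might look like the sketch below. It reuses the filename statement from the example earlier on this page, with the same placeholder userid and password. The data set name gpa_test is a hypothetical name chosen so that the test run does not overwrite the full gpa data set.

```sas
/* Same FILENAME statement as in the earlier example; replace the
   placeholder userid and password with your own. */
FILENAME in FTP 'gpa.txt' LRECL=80
                CD='/local2/samples/sas/ats/'
                HOST='cluster.oac.ucla.edu'
                USER='joebruin'
                PASS='yourpassword' ;

/* OBS=20 stops reading after the 20th record of the remote file. */
DATA gpa_test ;
   INFILE in OBS=20 ;
   INPUT gpa hsm hss hse satm satv gender ;
RUN;

PROC PRINT DATA=gpa_test ;
RUN;
```

Once the test run looks right, remove obs=20 from the infile statement to read the whole file.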


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.