This article was originally published in Perspective, Volume 19, Number 4, 1995, pp. 21-28.


Using SAS in Unix: Host Specific Features on the SP2/Cluster

by Peter M. Saama, Ph.D.

Introduction

The implementation of SAS differs markedly between the AIX and MVS environments. Consequently, SAS users migrating from the ES/9000 to the SP2/cluster need to be aware of the host-specific features of the production release of the SAS system under AIX.

Every SAS program is built from two kinds of components: DATA steps and PROC (procedure) steps. A program may contain DATA steps, PROC steps, or both, in any order and in any number. SAS statements usually begin with a keyword and always end with a semicolon (;).
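
As a minimal sketch of this structure, the following program has one DATA step followed by one PROC step. The data set name 'scores', the variable names, and the data values are hypothetical and chosen only for illustration:

```sas
/* DATA step: build a temporary data set from in-stream data lines */
DATA scores;
   INPUT student $ score;
   CARDS;
Ann 90
Bob 85
;
RUN;

/* PROC step: list the data set created above */
PROC PRINT DATA=scores;
RUN;
```

Each statement begins with a keyword (DATA, INPUT, CARDS, PROC, RUN) and ends with a semicolon.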

These features of SAS are the same across host systems. The most notable host-specific difference is how SAS handles data in external files. This article discusses the SAS LIBNAME and FILENAME statements needed to access external files on the SP2/cluster complex. Host-dependent features of SAS are presented, including the use of environment variables.

SAS Data Libraries

In contrast to MVS, where a SAS library is a partitioned data set, a SAS library under AIX is a directory. Members of the library are stored as individual files in that directory and have the ending

     ssdnn

where

nn is a two-digit suffix that SAS adds to the file extension in order to identify compatible SAS data members. On the SP2/cluster, this suffix will always be '01'.

Figure 1 shows a sketch of a hypothetical directory structure on AIX. Directories enable you to organize your files in a hierarchical structure. Each of the directories under the home directory (which can be referred to using a tilde, '~') is a valid SAS data library. For example, the SAS data library could be a directory called '~/sas/class' with the members 'winners', 'losers', and 'totals'. Their full pathnames would be:

       ~/sas/class/winners.ssd01
       ~/sas/class/losers.ssd01
       ~/sas/class/totals.ssd01

The SAS LIBNAME Statement

The SAS LIBNAME statement is used to identify SAS data libraries to be accessed in a SAS session or in a SAS job. The general syntax of a LIBNAME statement on AIX is:

LIBNAME libref <engine> 'directory';

where:

libref is a reference name for a SAS data library;

<engine> is a component of the SAS system that reads from or writes to a file. Each engine allows the SAS system to access files with a particular format. Valid engine keywords supported on AIX are shown in the first column of Table 1;

directory is the directory (SAS library) containing the SAS library members you want to access, or the directory that will contain the members of the SAS data library you are creating.

The V609 engine is the default on AIX and provides write access to the current form of a SAS data library (release 6.09) as well as read access to SAS data files created by earlier releases. If you omit an engine name on the LIBNAME statement, the SAS system looks at the extensions of the files in the given directory and determines the appropriate engine.

Figure 2 shows samples of SAS LIBNAME statements for the default engine. For syntax related to the other engines see SAS Companion for UNIX Environments: Language.

In example A, the LIBNAME statement associates the libref 'in1' with the current working directory (.). The current working directory contains the SAS library members you will be accessing or creating.

In example B, the LIBNAME statement associates the libref 'in2' with your home directory (~).

In example C, the LIBNAME statement associates the libref 'in3' with a directory named '~/sas/class'. The directory must exist on the file system. SAS will not create it for you.
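
The following is a sketch of LIBNAME statements consistent with the descriptions of examples A through C. It assumes the default engine (no engine keyword is given) and is illustrative only; it may differ in detail from Figure 2:

```sas
/* Example A: libref for the current working directory */
LIBNAME in1 '.';

/* Example B: libref for your home directory */
LIBNAME in2 '~';

/* Example C: libref for an existing directory; SAS will not create it */
LIBNAME in3 '~/sas/class';
```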

In example D, the LIBNAME statement associates the libref 'in4' with the environment variable 'MYSASLIB'. In the default C shell, the environment variable 'MYSASLIB' is created by typing:

     setenv MYSASLIB ~/sas/class

The equivalent syntax for the Korn shell is:

     export MYSASLIB=~/sas/class

The environment variable can also be used as a reference name for a SAS library. This is useful if many of your SAS programs access library members which are in the same directory. Since no engine can be specified when you associate a libref with an environment variable, the SAS system assigns one when the library is accessed.

The following statement uses the environment variable 'MYSASLIB' as a libref to access the SAS library '~/sas/class'. The SAS library contains a member called 'winners' with the pathname '~/sas/class/winners.ssd01':

     PROC PRINT DATA=MYSASLIB.winners;

As a general rule, the names of environment variables used to reference SAS libraries cannot include lowercase letters, and the variable value must be a directory. Environment variables with names longer than eight (8) characters are easy to create, but they can only be used on the LIBNAME statement and not directly as a libref. We recommend that you assign variable names that do not exceed eight (8) characters in length.

The SAS FILENAME Statement

You must explicitly issue the FILENAME statement for external files, such as 'flat files' containing data. The general form of the FILENAME statement on AIX is:

FILENAME fileref <device-type> 'pathname' <options>;

where:

fileref is a reference name for an AIX file or device;

<device-type> is a device such as a disk, terminal, printer, or pipe. Valid keywords for device types which are supported on AIX are shown in Table 2;

pathname is a fully qualified name for an AIX file or a device;

<options> control how the external file is processed and include keywords for the record length, block size, and record format. For syntax related to <options>, see SAS Companion for UNIX Environments: Language.

DISK is the default device type. Sample FILENAME statements for the default device type are shown in Figure 3. For syntax related to the other device types, see SAS Companion for UNIX Environments: Language.

In example A, the FILENAME statement associates the fileref 'indata1' with the file 'gpa.rawdata' stored in the current working directory (.).

In example B, the FILENAME statement associates the fileref 'indata2' with the file 'gpa.rawdata' stored in your home directory (~).

In example C, the FILENAME statement associates the fileref 'indata3' with the file 'gpa.rawdata' stored in an existing directory named '~/sas/class'.
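
The following is a sketch of FILENAME statements consistent with the descriptions of examples A through C. It assumes the default DISK device type (no device-type keyword is given) and no options; it is illustrative only and may differ in detail from Figure 3:

```sas
/* Example A: file in the current working directory */
FILENAME indata1 './gpa.rawdata';

/* Example B: file in your home directory */
FILENAME indata2 '~/gpa.rawdata';

/* Example C: file in the existing directory ~/sas/class */
FILENAME indata3 '~/sas/class/gpa.rawdata';
```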

In example D, the FILENAME statement associates the fileref 'indata4' with the environment variable 'MYRAWDAT'. In the default C shell, the environment variable 'MYRAWDAT' is created by typing:

     setenv MYRAWDAT ~/sas/class/gpa.rawdata

The equivalent syntax for the Korn shell is:

     export MYRAWDAT=~/sas/class/gpa.rawdata

The environment variable can also be used as a reference name for an external file in the DATA step of a SAS program. The following statements use the environment variable 'MYRAWDAT' as a fileref to access a file called 'gpa.rawdata' in the subdirectory '~/sas/class' in two ways:

a) To read data from the external file:
             INFILE MYRAWDAT;
b) To write data to the external file:
             FILE MYRAWDAT;

As a general rule, the names of environment variables used to reference external files cannot include lowercase letters, and the variable value must be a pathname. Environment variables with names longer than eight (8) characters can only be used on the FILENAME statement and not directly as a fileref. We recommend that you assign variable names that do not exceed eight (8) characters in length.
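
As a sketch of how the environment variable might be used as a fileref inside a complete DATA step, consider the following. It assumes 'MYRAWDAT' was set in the shell, as shown above, before SAS was started. The data set name 'gpa' and the variable list are hypothetical; replace them with names that match your data:

```sas
/* MYRAWDAT is the environment variable set in the shell;    */
/* no FILENAME statement is issued for it in this program.   */
DATA gpa;
   INFILE MYRAWDAT;
   INPUT student $ gpa;   /* hypothetical variable list */
RUN;
```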

Creating a SAS Data File from Raw Data

SAS data files (system files) are referenced with a one- or two-level name. The two-level name is of the form

     libref.member-name

where libref refers to the SAS data library (directory) in which the data file resides and member-name refers to the particular member within that library. The one-level name is of the form

     member-name (without a libref)

In this case, SAS stores the file in the temporary WORK library, which is defined automatically by the SAS system at the beginning of each SAS session or job.
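
The difference between the two forms can be sketched as follows. The example assumes the library '~/sas/class' and its member 'winners' described earlier; the member name 'best' is hypothetical:

```sas
LIBNAME in3 '~/sas/class';

/* Two-level name: creates the permanent member ~/sas/class/best.ssd01 */
DATA in3.best;
   SET in3.winners;
RUN;

/* One-level name: creates 'best' in the temporary WORK library */
DATA best;
   SET in3.winners;
RUN;
```

The first data file remains in the directory after the session ends; the second exists only in the temporary WORK library.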

Once librefs and filerefs are defined, you can use them to access data libraries and external files. Be sure to issue the LIBNAME and FILENAME statements before the SAS statements that reference the file(s).

The LIBNAME and FILENAME statements used in Figure 4 show three alternative methods for creating a SAS data file from a space-delimited file.

In example A, part 1 uses the SAS LIBNAME statement to assign the libref ('outgpa') to an AIX SAS data library, in this case your home directory. Remember that a SAS library in AIX is a directory which is used to store data members.

Part 2 of example A uses the SAS FILENAME statement to assign a fileref ('ingpa') to the space-delimited file 'gpa.rawdata'. The full pathname is given (directory and filename).

Part 3 of example A creates a permanent SAS data file called '~/gpa.ssd01' from the external file '/local2/samples/sas/aix/gpa.rawdata'. The library reference name (libref) 'outgpa' is used as the first level of the two-level SAS file name 'outgpa.gpa'.
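
The three parts of example A can be sketched as follows. This is an illustration built from the description above, not a copy of Figure 4, and the variable list on the INPUT statement is hypothetical:

```sas
/* Part 1: libref for the output library (your home directory) */
LIBNAME outgpa '~';

/* Part 2: fileref for the space-delimited raw data file */
FILENAME ingpa '/local2/samples/sas/aix/gpa.rawdata';

/* Part 3: read the raw data and create the permanent file ~/gpa.ssd01 */
DATA outgpa.gpa;
   INFILE ingpa;
   INPUT student $ gpa;   /* hypothetical variable list */
RUN;
```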

Another approach (shown in example B) allows you to use a fileref to point to the directory using a FILENAME statement. Then in the INFILE statement you can specify the fileref followed by the individual filename in parentheses. The relevant syntax for the FILENAME statement is:

     FILENAME ingpa '/local2/samples/sas/aix';

and the matching syntax for the INFILE statement is (see example B in Figure 4):

     INFILE ingpa('gpa.rawdata');

This is especially useful when you have to refer to several files in one directory.
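
As a sketch of that situation, the following reads two files from the same directory through one fileref. The second file name, 'sat.rawdata', the temporary data set names, and the variable lists are hypothetical:

```sas
/* The fileref points to a directory, not to a single file */
FILENAME ingpa '/local2/samples/sas/aix';

DATA gpa;
   INFILE ingpa('gpa.rawdata');
   INPUT student $ gpa;        /* hypothetical variable list */
RUN;

DATA sat;
   INFILE ingpa('sat.rawdata');   /* hypothetical second file */
   INPUT student $ sat;        /* hypothetical variable list */
RUN;
```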

Alternatively, you can refer to the raw data file directly by specifying the pathname for the file on the INFILE statement, as shown in example C:

     INFILE '/local2/samples/sas/aix/gpa.rawdata';

No FILENAME statement is required with this method.

Accessing a SAS Data Library on Disk

Once a libref is defined, you can use it to access a permanent SAS data library. The SAS statements in Figure 5 show the use of a libref in the PRINT procedure. Use of a libref to make updates to a SAS data library is also demonstrated.

Part A uses the SAS LIBNAME statement to assign the libref ('ingpa') to an AIX SAS library.

Part B uses the libref ('ingpa') as the value of a PROC statement option to access a member ('gpa') of a SAS data library ('/local2/samples/sas/aix'). In the PRINT procedure, the data library is opened for read access only.

Part C creates a permanent SAS dataset '~/gpa.ssd01' from the external raw data file '/local2/samples/sas/aix/gpa.rawdata'. The SAS dataset is created by using the library reference name (libref) 'outgpa' as the first level of a two-level SAS file name 'outgpa.gpa'.
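
Parts A and B can be sketched as follows, assuming the library is the directory '/local2/samples/sas/aix' used in the earlier examples and that it contains a member named 'gpa'. This is an illustration only and may differ in detail from Figure 5:

```sas
/* Part A: assign the libref to the existing SAS library */
LIBNAME ingpa '/local2/samples/sas/aix';

/* Part B: print the member 'gpa'; the library is opened read-only */
PROC PRINT DATA=ingpa.gpa;
RUN;
```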


Figure 1

Hypothetical directory structure on AIX


Figure 2

Samples of SAS LIBNAME statements on AIX


Figure 3

Samples of SAS FILENAME statements on AIX


Figure 4

Three alternative methods for creating a SAS data file from raw data

Notes:
i.   Change '/local2/samples/sas/aix' to the directory containing your raw data. Change 'gpa.rawdata' to the name of the file containing your data.
ii.  Specify a variable list that corresponds to your data.

Figure 5

Accessing permanent SAS data libraries


Table 1

Engine and file types for libraries accessible to SAS on AIX


Table 2

Device types and functions in the SAS FILENAME statement


Peter Saama, Ph.D., is an OAC Statistical Consultant who has experience with resampling-based methods and assists users in the application of statistical methods to research problems across various disciplines.

*OAC/CS

2 Nov 95; Rev. 15 Dec 95