### SAS Learning Module: Using SAS functions for making and recoding variables

#### 1. Introduction

A SAS function returns a value from a computation or system manipulation that requires zero or more arguments. Most functions use arguments supplied by the user; however, a few obtain their arguments from the operating system. Here is the syntax of a function:

function-name(argument1, argument2)
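
As a small illustration of this syntax, the sketch below calls functions with zero, one, and several arguments and writes the results to the log. The `DATA _NULL_` step and the variable names `d`, `r`, and `m` are chosen here for illustration only and are not part of the module's program.

```sas
DATA _NULL_;
  d = TODAY();        /* no arguments: the date comes from the system clock */
  r = SQRT(16);       /* one argument */
  m = MAX(3, 7, 5);   /* several arguments */
  PUT d= DATE9. r= m=;
RUN;
```

The value of `d` depends on the day the step is run; `r` is 4 and `m` is 7.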

We will illustrate some functions using the following dataset that includes name, x, test1, test2, and test3.

DATA getdata;
INPUT name $14. x test1 test2 test3;
DATALINES;
John Smith       4.2 86.5 84.55 81
Samuel Adams     9.0 70.3 82.37 .
Ben Johnson     -6.2 82.1 84.81 87
Chris Adraktas   9.5 94.2 92.64 93
John Brown        .  79.7 79.07 72
;
RUN;    

The data step below creates the data set funct1, adding new variables computed with the INT, ROUND, and MEAN numeric functions. What happens to tave due to the missing value of test3?

DATA funct1;
SET getdata;
t1int = INT(test1);  t2int = INT(test2);      /* integer part of a number */
t1rnd = ROUND(test1);t2rnd = ROUND(test2,.1); /* round to nearest whole number, nearest tenth */
tave = MEAN(test1, test2, test3);             /* mean across variables */
RUN;

PROC PRINT DATA=funct1;
VAR test1 test2 test3 t1int t2int t1rnd t2rnd tave;
RUN;

OBS    TEST1    TEST2    TEST3    T1INT    T2INT    T1RND    T2RND      TAVE
1      86.5    84.55      81       86       84       87      84.6    84.0167
2      70.3    82.37       .       70       82       70      82.4    76.3350
3      82.1    84.81      87       82       84       82      84.8    84.6367
4      94.2    92.64      93       94       92       94      92.6    93.2800
5      79.7    79.07      72       79       79       80      79.1    76.9233
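
The output above shows that MEAN averages only the nonmissing values. A minimal sketch of the contrast with plain arithmetic follows; the data set name `funct1b` and the variables `tave2`, `ntest`, and `nmissing` are hypothetical names introduced for this example.

```sas
DATA funct1b;
  SET getdata;
  tave     = MEAN(test1, test2, test3);    /* skips missing values */
  tave2    = (test1 + test2 + test3) / 3;  /* missing if any test is missing */
  ntest    = N(test1, test2, test3);       /* number of nonmissing values */
  nmissing = NMISS(test1, test2, test3);   /* number of missing values */
RUN;

PROC PRINT DATA=funct1b;
  VAR name test1 test2 test3 tave tave2 ntest nmissing;
RUN;
```

For Samuel Adams, whose test3 is missing, tave2 should be missing while tave is the average of the two available scores.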

Now let's try some more math functions. What happens when there is a missing or negative value of x?

DATA funct2;
SET getdata;
xsqrt = SQRT(x);     /* square root */
xlog = LOG(x);      /* log base e */
xexp = EXP(x);      /* e raised to the power */
RUN;

PROC PRINT DATA=funct2;
VAR x xsqrt xlog xexp;
RUN;

OBS      X      XSQRT       XLOG         XEXP
1      4.2    2.04939    1.43508       66.69
2      9.0    3.00000    2.19722     8103.08
3     -6.2     .          .             0.00
4      9.5    3.08221    2.25129    13359.73
5       .      .          .              .

This time we'll try some string functions. In particular, look closely at the substr function that is used in fname and lname.

DATA funct3;
SET getdata;
c1  = UPCASE(name);     /* convert to upper case */
c2  = SUBSTR(name,3,8); /* substring */
len = LENGTH(name);     /* length of string */
ind = INDEX(name,' ');  /* position in string */
fname = SUBSTR(name,1,INDEX(name,' '));
lname = SUBSTR(name,INDEX(name,' '));
RUN;

PROC PRINT DATA=funct3;
VAR name c1 c2 len ind fname lname;
RUN;

OBS        NAME              C1            C2      LEN   IND   FNAME    LNAME
1    John Smith       JOHN SMITH       hn Smith    10    5    John     Smith
3    Ben Johnson      BEN JOHNSON      n Johnso    11    4    Ben      Johnson
5    John Brown       JOHN BROWN       hn Brown    10    5    John     Brown
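
Note that SUBSTR(name,INDEX(name,' ')) starts at the blank itself, so lname begins with a space. One possible alternative, sketched here under the assumption that every name is exactly two words separated by a blank, uses the SCAN function or starts the substring one position later. The data set name `funct3b` and the variable `lname2` are hypothetical.

```sas
DATA funct3b;
  SET getdata;
  LENGTH fname lname $14;                      /* match the length of name */
  fname  = SCAN(name, 1, ' ');                 /* first blank-delimited word */
  lname  = SCAN(name, 2, ' ');                 /* second blank-delimited word */
  lname2 = SUBSTR(name, INDEX(name, ' ') + 1); /* start one past the blank */
RUN;

PROC PRINT DATA=funct3b;
  VAR name fname lname lname2;
RUN;
```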

#### 2. Random numbers in SAS

Random numbers are more useful than you might imagine.  They are used extensively in Monte Carlo studies, as well as in many other situations.  We will look at two of SAS's random number functions.

• UNIFORM(SEED) - generates values from a random uniform distribution between 0 and 1
• NORMAL(SEED) - generates values from a random normal distribution with mean 0 and standard deviation 1

The statements IF x>.5 THEN coin = 'heads' and ELSE coin = 'tails' create a random variable called coin that has the values 'heads' and 'tails'.  The data sets random1 and random2 use a seed value of -1.  With a negative seed, SAS takes the starting value from the system clock, so different random numbers are generated each time.

DATA random1;
x = UNIFORM(-1);
y = 50 + 3*NORMAL(-1);
IF x>.5 THEN coin = 'heads';
ELSE coin = 'tails';
RUN;

DATA random2;
x = UNIFORM(-1);
y = 50 + 3*NORMAL(-1);
IF x>.5 THEN coin = 'heads';
ELSE coin = 'tails';
RUN;

PROC PRINT DATA=random1;
VAR x y coin;
RUN;
PROC PRINT DATA=random2;
VAR x y coin;
RUN;

OBS       X          Y       COIN

OBS       X          Y       COIN
1     0.16922    49.1155    tails

Sometimes we will want to generate the same random numbers each time so that we can debug our programs. To do this we just enter the same positive number as the seed value.  The data sets random3 and random4 illustrate how to generate the same results each time.

DATA random3;
x = UNIFORM(123456);
y = 50 + 3*NORMAL(123456);
IF x>.5 THEN coin = 'heads';
ELSE coin = 'tails';
RUN;

DATA random4;
x = UNIFORM(123456);
y = 50 + 3*NORMAL(123456);
IF x>.5 THEN coin = 'heads';
ELSE coin = 'tails';
RUN;

PROC PRINT DATA=random3;
VAR x y coin;
RUN;
PROC PRINT DATA=random4;
VAR x y coin;
RUN;

OBS       X          Y       COIN

OBS       X          Y       COIN
1     0.73902    48.7832    heads
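
The expression 50 + 3*NORMAL(seed) shifts and scales a standard normal value, so y should have a mean near 50 and a standard deviation near 3. A minimal sketch for checking this is shown below; the sample size of 1000 and the data set name `randcheck` are assumptions made for this example.

```sas
DATA randcheck;
  DO i = 1 TO 1000;
    y = 50 + 3*NORMAL(123456);  /* later calls continue the same random stream */
    OUTPUT;
  END;
RUN;

PROC MEANS DATA=randcheck N MEAN STD;
  VAR y;
RUN;
```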

Now let's generate 100 random coin tosses and compute a frequency table of the results.

DATA random5;
DO i = 1 TO 100;
x = UNIFORM(123456);
IF x>.5 THEN coin = 'heads';
ELSE coin = 'tails';
OUTPUT;
END;
RUN;

PROC FREQ DATA=random5;
TABLES coin;
RUN;

                              Cumulative  Cumulative
COIN    Frequency   Percent   Frequency    Percent
---------------------------------------------------
heads         48      48.0          48       48.0
tails         52      52.0         100      100.0
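
The same simulation can also be written with the RAND function and CALL STREAMINIT, which are available in SAS 9 and later. This is a sketch of an alternative, not the module's own program; the data set name `random6` is hypothetical, and because RAND uses a different generator than UNIFORM, the counts will generally differ from those above.

```sas
DATA random6;
  CALL STREAMINIT(123456);          /* fix the seed for reproducible results */
  DO i = 1 TO 100;
    x = RAND('UNIFORM');            /* uniform random value between 0 and 1 */
    IF x > .5 THEN coin = 'heads';
    ELSE coin = 'tails';
    OUTPUT;
  END;
RUN;

PROC FREQ DATA=random6;
  TABLES coin;
RUN;
```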

#### 3. Problems to look out for

Watch out for math errors, such as division by zero, the square root of a negative number, and the log of zero or a negative number. As the funct2 output in section 1 shows, SAS sets the result to missing in these cases.
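
One way to avoid these errors is to test the argument before calling the function. The sketch below assumes the getdata data set from section 1; the data set name `funct2b` and the variable `ratio` are hypothetical and only illustrate the division case.

```sas
DATA funct2b;
  SET getdata;
  /* a missing value compares as less than any number, so it fails both tests */
  IF x >= 0 THEN xsqrt = SQRT(x);   /* stays missing when x is negative or missing */
  IF x >  0 THEN xlog  = LOG(x);    /* stays missing when x is zero, negative, or missing */
  /* divide only when the divisor is nonmissing and nonzero */
  IF NOT MISSING(test3) AND test3 NE 0 THEN ratio = test1 / test3;
RUN;

PROC PRINT DATA=funct2b;
  VAR x xsqrt xlog test1 test3 ratio;
RUN;
```

The results are still missing for the problem rows, but the computation is skipped on purpose rather than reported as an error.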

For information on functions in SAS, consult the SAS Language manual.

#### 5. Web notes

You can view the SAS program associated with this module by clicking funct.sas. While viewing the file, you can save it by choosing File and then Save As from the pull-down menu of your web browser. In the Save As dialog box, change the file name to funct.sas, choose the directory where you want to save the file, and then click Save.
