
Using SPSS functions for making and recoding variables

SPSS has a wide variety of functions you can use for creating and recoding variables. We will explore three kinds of functions: mathematical functions, string functions, and random number functions. These functions have the same general syntax:

function_name(argument1, argument2, etc.)

We will illustrate some functions using the following data file that
includes **name**, **x**, **test1**, **test2**,
and **test3**.

    DATA LIST FREE / name (A14) x test1 test2 test3.
    BEGIN DATA.
    "John Smith"        4.2   86.5   84.55   81
    "Samuel Adams"      9.0  -99     82.37  -99
    "Ben Johnson"      -6.2   82.1   84.81   87
    "Chris Adraktas"    9.5   94.2  -99      93
    "John Brown"     -999     79.7   79.07   72
    END DATA.
    LIST.

The output of the **LIST** command is shown
below.

    NAME                  X    TEST1    TEST2    TEST3
    John Smith         4.20    86.50    84.55    81.00
    Samuel Adams       9.00   -99.00    82.37   -99.00
    Ben Johnson       -6.20    82.10    84.81    87.00
    Chris Adraktas     9.50    94.20   -99.00    93.00
    John Brown      -999.00    79.70    79.07    72.00

The variable **x** uses -999 to indicate
missing values, and **test1**, **test2** and **test3**
use -99 to indicate missing values. Below we tell SPSS about these missing values
and list out the data again.

    MISSING VALUES x (-999) /test1 test2 test3 (-99).
    LIST.

The output is shown below. Note that the data really does not look any different after we have defined the missing values. But, as we will see below, SPSS does know to treat these values as missing rather than treating them as though they were -99 and -999.

    NAME                  X    TEST1    TEST2    TEST3
    John Smith         4.20    86.50    84.55    81.00
    Samuel Adams       9.00   -99.00    82.37   -99.00
    Ben Johnson       -6.20    82.10    84.81    87.00
    Chris Adraktas     9.50    94.20   -99.00    93.00
    John Brown      -999.00    79.70    79.07    72.00

Now let's try some basic math functions. The **trunc**
function (short for truncate) converts a number to a whole number (integer)
by removing all the decimal places; for example, 6.99 and 6.49 would both become 6. By
contrast, the **rnd** function (short for round) rounds numbers to the
nearest whole number using conventional rounding rules; for example, 6.99 would become 7,
but 6.49 would become 6.

    COMPUTE t1tr = TRUNC(test1).
    COMPUTE t2tr = TRUNC(test2).
    COMPUTE t1rnd = RND(test1).
    COMPUTE t2rnd = RND(test2).
    LIST name test1 t1tr t1rnd test2 t2tr t2rnd.

The results below are as we would expect.

    NAME              TEST1     T1TR    T1RND    TEST2     T2TR    T2RND
    John Smith        86.50    86.00    87.00    84.55    84.00    85.00
    Samuel Adams     -99.00        .        .    82.37    82.00    82.00
    Ben Johnson       82.10    82.00    82.00    84.81    84.00    85.00
    Chris Adraktas    94.20    94.00    94.00   -99.00        .        .
    John Brown        79.70    79.00    80.00    79.07    79.00    79.00

SPSS has other mathematical functions. Below we
illustrate functions for getting the square root (**sqrt**), natural log (**ln**),
log to the base 10 (**lg10**) and exponential (**exp**).
Note that the **sqrt**, **ln** and **lg10**
functions do not work with negative numbers (for example you cannot take the square root
of a negative number). SPSS will generate missing values in such cases, as we will
see below.

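    COMPUTE xsqrt = SQRT(x).
    COMPUTE xln = LN(x).
    COMPUTE xlg10 = LG10(x).
    COMPUTE xexp = EXP(x).
    EXECUTE.
    LIST x xsqrt xln xlg10 xexp.

The results are shown below. As expected, SPSS generated missing values for **xsqrt**, **xln** and **xlg10** where **x** is negative, and all four new variables are missing where **x** itself is missing (-999).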

            X     XSQRT       XLN     XLG10      XEXP
         4.20      2.05      1.44       .62     66.69
         9.00      3.00      2.20       .95   8103.08
        -6.20         .         .         .       .00
         9.50      3.08      2.25       .98  13359.73
      -999.00         .         .         .         .

The results also included warnings like the one shown below. The one below is telling us that you cannot take the square root of a negative number and that SPSS is going to set the result to the system missing value.

Warning # 603 The argument to the square root function is less than zero. The result has been set to the system-missing value.

SPSS also has statistical functions that operate on one
or more variables. For example, we might want to compute the average of the three
test scores. SPSS has the **MEAN** function that can do that for you,
as shown below.

    COMPUTE avg = MEAN(test1, test2, test3).
    LIST name test1 test2 test3 avg.

The values of **avg** appear in the listing further below. Note that SPSS computed
the mean of the non-missing values. For Samuel Adams, that meant that his average
was the same as his score on **test2**, since that was the only non-missing
value. We could tell SPSS to give anyone a missing value if they have fewer than 2
valid test scores by using the **mean.2** function. Likewise, we could
tell SPSS that we want the mean to be missing if any of the scores were missing by using
the **mean.3** function. These are illustrated below.

    COMPUTE avg2 = MEAN.2(test1, test2, test3).
    COMPUTE avg3 = MEAN.3(test1, test2, test3).
    LIST name test1 test2 test3 avg avg2 avg3.

As you see below, **avg2** is missing for
Samuel Adams, and **avg3** is also missing for Samuel Adams and Chris
Adraktas because they both had some missing test scores.

    NAME              TEST1    TEST2    TEST3      AVG     AVG2     AVG3
    John Smith        86.50    84.55    81.00    84.02    84.02    84.02
    Samuel Adams     -99.00    82.37   -99.00    82.37        .        .
    Ben Johnson       82.10    84.81    87.00    84.64    84.64    84.64
    Chris Adraktas    94.20   -99.00    93.00    93.60    93.60        .
    John Brown        79.70    79.07    72.00    76.92    76.92    76.92

In addition to the **mean** function, SPSS
also has **sum**, **sd**, **variance**, **min** and **max** functions.
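
As a minimal sketch of some of these functions, assuming the same data file and missing-value definitions used above, the commands below compute a few row-wise summaries of the three test scores. The variable names **total**, **spread**, **lowest** and **highest** are made up for this illustration. Like **mean**, these functions work from the non-missing values, so it is worth checking the listed results for cases such as Samuel Adams, who has only one valid score.

```spss
* Row-wise summaries of the three test scores (new variable names are illustrative).
COMPUTE total = SUM(test1, test2, test3).
COMPUTE spread = SD(test1, test2, test3).
COMPUTE lowest = MIN(test1, test2, test3).
COMPUTE highest = MAX(test1, test2, test3).
LIST name total spread lowest highest.
```

The same suffix shown for **mean** should also work here; for example, SUM.3(test1, test2, test3) would return a missing value unless all three scores are valid.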

Now let's illustrate some of the SPSS string functions.
Below we create **up**, which will be the name converted to upper
case, **lo**, which will be the name converted to lower case, and **sub**, which will
be the third through eighth characters of the person's name. Note that we first had to use
the **string** command to tell SPSS that **up** and **lo**
are string variables with a length of up to 14 characters, and that **sub**
is a string variable with a length of up to 6 characters. Had we omitted the **string** command, these would have
been treated as numeric variables, and when SPSS tried to assign a character value to the
numeric variables, it would have generated an error. We also create **len**,
which is the length of the **name** variable, and **len2**, which is the length of
the person's name.

    STRING up lo (A14) /sub (A6).
    COMPUTE up = UPCASE(name).
    COMPUTE lo = LOWER(name).
    * SUBSTR takes a start position and a length, so 3,6 gives characters 3 through 8.
    COMPUTE sub = SUBSTR(name,3,6).
    COMPUTE len = LENGTH(name).
    COMPUTE len2 = LENGTH(RTRIM(name)).
    LIST name up lo sub len len2.

The results are shown below. The results for **up**, **lo** and **sub** are all as we would expect. The result for
**len** may be a bit confusing. The variable **len** does not refer to the
length of the person's name; it refers to the length of the variable **name**.
When we read the data we entered **name (A14)**, giving the variable a length of 14, and that is why
**len** is always 14. By contrast, **len2** uses the **rtrim**
function to strip off any trailing blanks and then takes the length of the result. In
the end, **len2** returns the length of the person's name; for example, John
Smith has a length of 10.

    NAME            UP              LO              SUB         LEN     LEN2
    John Smith      JOHN SMITH      john smith      hn Smi    14.00    10.00
    Samuel Adams    SAMUEL ADAMS    samuel adams    muel A    14.00    12.00
    Ben Johnson     BEN JOHNSON     ben johnson     n John    14.00    11.00
    Chris Adraktas  CHRIS ADRAKTAS  chris adraktas  ris Ad    14.00    14.00
    John Brown      JOHN BROWN      john brown      hn Bro    14.00    10.00

Let's use SPSS string functions to get the first name
and last name out of the **name** variable. We start by using the **index**
function to determine the position of the first blank space in the name. We then use
the **substr** function to extract the part of the name before the blank to
be the first name, and the part after the blank to be the last name.

    STRING fname lname (A10).
    COMPUTE blank = INDEX(name,' ').
    COMPUTE fname = SUBSTR(name,1,blank-1).
    COMPUTE lname = SUBSTR(name,blank+1).
    LIST name blank fname lname.

The results below show that this was successful.
For example, for John Smith, the **substr** function extracted the first name
by taking the substring from the 1st to 4th character of **name**, and the
last name by taking the 6th character and onward.

    NAME              BLANK  FNAME       LNAME
    John Smith         5.00  John        Smith
    Samuel Adams       7.00  Samuel      Adams
    Ben Johnson        4.00  Ben         Johnson
    Chris Adraktas     6.00  Chris       Adraktas
    John Brown         5.00  John        Brown

Random numbers are more useful than you might imagine. They are used extensively in Monte Carlo studies, and they are also frequently used in many other situations. We will look at two of SPSS's random number functions:

- **uniform(n)**: generates a random number from a uniform distribution that is 0 or greater and less than n.
- **rv.binomial(n,p)**: generates a value from the binomial distribution with n trials and a probability of success equal to p.

Below we generate a random number that is greater than or equal to 0, but less than 1.

    COMPUTE rannum = UNIFORM(1).
    LIST name rannum.

We see the results below.

    NAME             RANNUM
    John Smith          .14
    Samuel Adams        .43
    Ben Johnson         .61
    Chris Adraktas      .29
    John Brown          .16

Below we generate a random number that is greater than or equal to 0, but less than 10.

    COMPUTE ran10 = UNIFORM(10).
    LIST NAME ran10.

And the results are shown below.

    NAME              RAN10
    John Smith         7.00
    Samuel Adams       3.46
    Ben Johnson        4.46
    Chris Adraktas      .52
    John Brown         1.03

The example below generates a whole number (integer)
from 1 to 100. The **trunc** function is used to convert the result
into a whole number from 0 to 99, and then 1 is added to make it from 1 to 100.

    COMPUTE ran100 = TRUNC(UNIFORM(100)) + 1.
    LIST name ran100.

As we see below, these values are all whole numbers.

    NAME             RAN100
    John Smith        15.00
    Samuel Adams       5.00
    Ben Johnson       63.00
    Chris Adraktas    16.00
    John Brown        72.00
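
The same pattern should work for any whole-number range. As a sketch, assuming we want to simulate the roll of a six-sided die, the command below draws a value from 0 up to (but not including) 6, truncates it to a whole number from 0 to 5, and adds 1. The variable name **die** is just for illustration.

```spss
* Simulate one roll of a six-sided die for each case (values 1 to 6).
COMPUTE die = TRUNC(UNIFORM(6)) + 1.
LIST name die.
```

In general, TRUNC(UNIFORM(k)) + a gives whole numbers from a to a + k - 1.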

Below we use the **rv.binomial** function
to simulate a coin flip. It is like a coin flip since the number of trials is 1 and
the probability of success is .5 (like flipping a coin once and the probability of it
coming up heads is .5). Let's treat a 1 as coming up heads, and a 0 as coming up tails. As we see below, Ben Johnson and John Brown each got a head, and the others got tails.

    COMPUTE flip = RV.BINOMIAL(1 , .5 ).
    LIST name flip.

    NAME               FLIP
    John Smith          .00
    Samuel Adams        .00
    Ben Johnson        1.00
    Chris Adraktas      .00
    John Brown         1.00

Below, we change the number of flips to 10, and count the number of heads each person gets. John Brown got the most heads (7) and Ben Johnson got the fewest (4).

    COMPUTE flip10 = RV.BINOMIAL(10 , .5 ).
    LIST name flip10.

    NAME             FLIP10
    John Smith         6.00
    Samuel Adams       6.00
    Ben Johnson        4.00
    Chris Adraktas     5.00
    John Brown         7.00

The next example changes the number of flips to 100. It also
sets the **seed** for the random number generator. The **seed**
determines the sequence of random numbers that will be generated. John Smith got the fewest
heads (49 out of 100) and Samuel Adams got the most (58 out of 100).

    SET SEED = 149238.
    COMPUTE flip100 = RV.BINOMIAL(100 , .5 ).
    LIST name flip100 .

    NAME            FLIP100
    John Smith        49.00
    Samuel Adams      58.00
    Ben Johnson       52.00
    Chris Adraktas    53.00
    John Brown        52.00

If we repeat the example from above using the exact same
**seed**, we will get the same results. This is very useful for being
able to replicate results of a simulation study or Monte Carlo style study. Indeed,
using the same **seed** did generate the same results (see below).

    SET SEED = 149238.
    COMPUTE flip100 = RV.BINOMIAL(100 , .5 ).
    LIST name flip100 .

    NAME            FLIP100
    John Smith        49.00
    Samuel Adams      58.00
    Ben Johnson       52.00
    Chris Adraktas    53.00
    John Brown        52.00

In the examples above, we used the **rv.binomial**
function to simulate coin flips but it gave us the end result of all of the flips.
Perhaps you would like to do a simulation study where you generate each of the flips as a
separate observation. SPSS can do this, as we illustrate below.

    SET seed=943785.
    INPUT PROGRAM.
    +   LOOP id = 1 to 25.
    +     COMPUTE cointoss = RV.BINOMIAL( 1 , .5 ).
    +     END CASE.
    +   END LOOP.
    +   END FILE.
    END INPUT PROGRAM.
    LIST CASES.

The program above creates 25 observations, each having a
variable called **id**, which is the trial number, and **cointoss**,
which will be either 1 or 0. Even if this program does not make much sense to you,
you could use it as a template to make your own simulation. You can change the
number of trials by changing 25 to the number of trials you want. You can change the
probability of success by changing .5 to the value you would like. Or
you could choose an entirely different random number generating function; for example, instead of
**rv.binomial** you might choose **uniform**. The results of the program above are
shown below.

           ID  COINTOSS
         1.00       .00
         2.00      1.00
         3.00      1.00
         4.00       .00
         5.00       .00
         6.00      1.00
         7.00       .00
         8.00       .00
         9.00       .00
        10.00       .00
        11.00       .00
        12.00      1.00
        13.00       .00
        14.00       .00
        15.00       .00
        16.00      1.00
        17.00       .00
        18.00      1.00
        19.00      1.00
        20.00      1.00
        21.00      1.00
        22.00      1.00
        23.00       .00
        24.00      1.00
        25.00       .00
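
As one possible variation on the template above, the sketch below changes the number of trials to 50 and the probability of success to .3, and then tabulates the outcomes instead of listing them. The seed value and the variable name **success** are arbitrary choices made for this illustration. No output is shown because it depends on the seed and on the random number generator in use.

```spss
* Fifty simulated trials, each with a 30 percent chance of success.
SET SEED = 20240.
INPUT PROGRAM.
+   LOOP id = 1 TO 50.
+     COMPUTE success = RV.BINOMIAL(1, .3).
+     END CASE.
+   END LOOP.
+   END FILE.
END INPUT PROGRAM.
FREQUENCIES VARIABLES = success.
```

With more trials, the proportion of 1s in the frequency table should tend to settle closer to the probability you chose.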

Watch out for math errors, such as division by zero, square root of a negative number and log of a negative number.
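
One way to avoid these warnings is to test the argument before calling the function. The sketch below, which assumes the data file used throughout this page, computes the natural log only when **x** is positive and a ratio only when the denominator is not zero. Cases that fail the test, or that have a missing value in the test, are simply left as system-missing. The variable names **xln2** and **ratio** are illustrative.

```spss
* Take the log only when x is positive.
IF (x > 0) xln2 = LN(x).
* Divide only when the denominator is not zero.
IF (test3 NE 0) ratio = test1 / test3.
LIST x xln2 test1 test3 ratio.
```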

For more information on functions in SPSS, consult the SPSS Command Syntax Reference Guide.
