
Creating and recoding variables

This module shows how to create and recode variables. In
SPSS you can create new variables with **compute** and you can modify the values of an existing variable with
**recode**.

Let's use the auto data for our examples. In this section we will see how to
create new variables with **compute**.

get file 'c:\auto.sav'.

The variable **length** contains the length of the car in inches. Below we see summary statistics for
**length**.

descriptives variables = length.

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Length (in.) | 74 | 142 | 233 | 187.93 | 22.266 |
| Valid N (listwise) | 74 | | | | |

Let's use the **compute** command to make a new variable that has the length in feet instead of inches, called
**lenft**.

compute lenft = length / 12.
execute.
descriptive variables=length lenft.

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Length (in.) | 74 | 142 | 233 | 187.93 | 22.266 |
| LENFT | 74 | 11.83 | 19.42 | 15.6610 | 1.85553 |
| Valid N (listwise) | 74 | | | | |

Suppose we wanted to make a variable called **length2** which has
**length** squared.

compute length2 = length**2.
execute.
descriptive variables = length2.

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| LENGTH2 | 74 | 20164.00 | 54289.00 | 35807.6892 | 8364.04524 |
| Valid N (listwise) | 74 | | | | |

Or we might want to make **loglen** which is the natural log of
**length**.
Note that you can shorten the command **descriptive** to just **desc**,
and you can shorten **variables** to **var**.

compute loglen = ln(length).
execute.
desc var = loglen.

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| LOGLEN | 74 | 4.96 | 5.45 | 5.2290 | .12014 |
| Valid N (listwise) | 74 | | | | |
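The natural log is only one of the numeric functions that can appear on the right-hand side of **compute**. As a small sketch, the lines below use three other built-in functions (**sqrt**, **lg10**, and **exp**). The variable names sqrtlen, log10len, and explog are made up for this illustration and are not part of the original example.

```spss
* Other numeric functions work the same way inside compute.
compute sqrtlen = sqrt(length).
compute log10len = lg10(length).
* exp() undoes ln(), so explog should reproduce length.
compute explog = exp(loglen).
execute.
desc var = sqrtlen log10len explog.
```

Comparing the descriptives for explog with those for **length** is a quick way to confirm that **loglen** was computed as intended.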

Next, let's make z-scores of **length**, which requires its mean and standard deviation.
In SPSS there are two ways to get z-scores, and we will show you both.
The first way is to use the **save** subcommand of the **descriptive**
command, which saves the z-scores into the data file as a new variable. The other
way is to compute them manually from the mean and standard deviation, and the code
necessary to do that is shown below. When making z-scores manually, you do not need to
use the **save** subcommand with the **descriptive** command.

desc variables = length /save.

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| Length (in.) | 74 | 142 | 233 | 187.93 | 22.266 |
| Valid N (listwise) | 74 | | | | |

The mean is 187.93 and the
standard deviation is 22.27 (22.266 rounded), so the z-score, here named **zlen**, can be computed as shown below.

compute zlen = (length - 187.93) / 22.27.
execute.
desc variables = zlen.

Descriptive Statistics

| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| ZLEN | 74 | -2.06 | 2.02 | .0001 | .99984 |
| Valid N (listwise) | 74 | | | | |
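The mean of .0001 and standard deviation of .99984 above are slightly off from 0 and 1 because the mean and standard deviation were typed in as rounded numbers. One possible way to avoid typing them, sketched below, is to let **aggregate** add the overall mean and standard deviation to every case and then use those variables in **compute**. This assumes a version of SPSS whose **aggregate** command supports **mode = addvariables**; the names mlen, sdlen, and zlen2 are made up for this illustration.

```spss
* Add the overall mean and standard deviation of length to every case.
aggregate
  /outfile = * mode = addvariables
  /mlen = mean(length)
  /sdlen = sd(length).
* Compute the z-score from the stored values instead of typed-in numbers.
compute zlen2 = (length - mlen) / sdlen.
execute.
desc variables = zlen2.
```

Because no break variable is given, all cases are treated as one group, so mlen and sdlen hold the same value on every case.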

With **compute**:

you can use + - for addition and subtraction

you can use * / for multiplication and division

you can use ** for exponents (e.g., length**2)

you can use ( ) for controlling order of operations.
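To see why the parentheses matter, here is a small sketch that contrasts two expressions differing only in their parentheses. The variable names demo_a and demo_b are made up for this illustration; 142 is the minimum of **length** reported above.

```spss
* Without parentheses, the division is done before the subtraction.
compute demo_a = length - 142 / 12.
* With parentheses, the subtraction is done first.
compute demo_b = (length - 142) / 12.
execute.
desc variables = demo_a demo_b.
```

For the shortest car (length of 142), demo_b is 0, while demo_a is 142 minus 142/12, or about 130.17.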

Suppose that we wanted to break **mpg** down into
three categories. Let's look at a table of **mpg** to see where we might draw the lines for such categories.

frequencies variables = mpg.

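Statistics

| Mileage (mpg) | N |
|---|---|
| Valid | 74 |
| Missing | 0 |

| Mileage (mpg) | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 12 | 2 | 2.7 | 2.7 | 2.7 |
| 14 | 6 | 8.1 | 8.1 | 10.8 |
| 15 | 2 | 2.7 | 2.7 | 13.5 |
| 16 | 4 | 5.4 | 5.4 | 18.9 |
| 17 | 4 | 5.4 | 5.4 | 24.3 |
| 18 | 9 | 12.2 | 12.2 | 36.5 |
| 19 | 8 | 10.8 | 10.8 | 47.3 |
| 20 | 3 | 4.1 | 4.1 | 51.4 |
| 21 | 5 | 6.8 | 6.8 | 58.1 |
| 22 | 5 | 6.8 | 6.8 | 64.9 |
| 23 | 3 | 4.1 | 4.1 | 68.9 |
| 24 | 4 | 5.4 | 5.4 | 74.3 |
| 25 | 5 | 6.8 | 6.8 | 81.1 |
| 26 | 3 | 4.1 | 4.1 | 85.1 |
| 28 | 3 | 4.1 | 4.1 | 89.2 |
| 29 | 1 | 1.4 | 1.4 | 90.5 |
| 30 | 2 | 2.7 | 2.7 | 93.2 |
| 31 | 1 | 1.4 | 1.4 | 94.6 |
| 34 | 1 | 1.4 | 1.4 | 95.9 |
| 35 | 2 | 2.7 | 2.7 | 98.6 |
| 41 | 1 | 1.4 | 1.4 | 100.0 |
| Total | 74 | 100.0 | 100.0 | |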

Let's convert **mpg** into
three categories to help make this more readable: 1 for 18 mpg or less, 2 for 19 through 23,
and 3 for 24 or more. Here we do this using **compute** and **if**.

compute mpg3 = 1.
if (mpg >= 19) & (mpg <= 23) mpg3 = 2.
if (mpg >= 24) & (mpg <= 100) mpg3 = 3.
execute.
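Note that **compute mpg3 = 1** assigns 1 to every case first, so a car with a missing **mpg** would end up in category 1. That does not matter here because **mpg** has no missing values, but a more cautious variant, sketched below, states every category as its own **if**. The name mpg3b is made up for this illustration.

```spss
* mpg3b stays system-missing unless one of the three conditions is true.
if (mpg <= 18) mpg3b = 1.
if (mpg >= 19) & (mpg <= 23) mpg3b = 2.
if (mpg >= 24) mpg3b = 3.
execute.
```

A new numeric variable created by **if** starts out system-missing, so any case that matches none of the conditions is left missing rather than silently placed in the first category.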

First we check the recoding with a crosstab of **mpg** by **mpg3**. Then we use **mpg3**
in a crosstab of **mpg3** by **foreign** to contrast the
mileage of the foreign and domestic cars.

crosstabs /tables = mpg by mpg3.

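Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Mileage (mpg) * MPG3 | 74 | 100.0% | 0 | .0% | 74 | 100.0% |

Mileage (mpg) * MPG3 Crosstabulation

Count

| Mileage (mpg) | MPG3 1.00 | MPG3 2.00 | MPG3 3.00 | Total |
|---|---|---|---|---|
| 12 | 2 | | | 2 |
| 14 | 6 | | | 6 |
| 15 | 2 | | | 2 |
| 16 | 4 | | | 4 |
| 17 | 4 | | | 4 |
| 18 | 9 | | | 9 |
| 19 | | 8 | | 8 |
| 20 | | 3 | | 3 |
| 21 | | 5 | | 5 |
| 22 | | 5 | | 5 |
| 23 | | 3 | | 3 |
| 24 | | | 4 | 4 |
| 25 | | | 5 | 5 |
| 26 | | | 3 | 3 |
| 28 | | | 3 | 3 |
| 29 | | | 1 | 1 |
| 30 | | | 2 | 2 |
| 31 | | | 1 | 1 |
| 34 | | | 1 | 1 |
| 35 | | | 2 | 2 |
| 41 | | | 1 | 1 |
| Total | 27 | 24 | 23 | 74 |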

crosstabs /tables = mpg3 by foreign /cells = count column.

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| MPG3 * Car type | 74 | 100.0% | 0 | .0% | 74 | 100.0% |

MPG3 * Car type Crosstabulation

| MPG3 | | Domestic | Foreign | Total |
|---|---|---|---|---|
| 1.00 | Count | 22 | 5 | 27 |
| | % within Car type | 42.3% | 22.7% | 36.5% |
| 2.00 | Count | 19 | 5 | 24 |
| | % within Car type | 36.5% | 22.7% | 32.4% |
| 3.00 | Count | 11 | 12 | 23 |
| | % within Car type | 21.2% | 54.5% | 31.1% |
| Total | Count | 52 | 22 | 74 |
| | % within Car type | 100.0% | 100.0% | 100.0% |

The crosstab above shows that 21% of the domestic cars fall into the **high**
category (**mpg3** = 3), while 55% of the foreign cars fit into this category.
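The output shows the categories only as 1.00, 2.00, and 3.00. One way to make the table easier to read is to attach value labels to **mpg3** before rerunning the crosstab, as sketched below. Only the label for the high category comes from the text above; the labels for categories 1 and 2 are assumptions chosen for this illustration.

```spss
* The label text for categories 1 and 2 is an assumption.
value labels mpg3 1 'low' 2 'medium' 3 'high'.
crosstabs /tables = mpg3 by foreign /cells = count column.
```

The counts and percentages do not change; only the row headings are replaced by the labels.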

There is an easier way to recode **mpg** to
three categories using **recode**. Using this method, we do not need to make a copy of **mpg** or use the **compute**
command. We simply use the **recode** command with the **into**
keyword, followed by the name of the new variable into which we want to recode **mpg**.
In this case, we will recode **mpg** into **mpg3a** using
three categories:
lo thru 18 into 1, 19 thru 23 into 2, and 24 thru hi into 3. Note that **lo** and **hi**
are SPSS keywords that stand for the lowest and the highest values of the variable,
which is useful when we do not know those values.

recode mpg (lo thru 18=1) (19 thru 23=2) (24 thru hi=3) into mpg3a.
execute.

Let's double check to see that this worked correctly. We see that it worked perfectly.

crosstabs /tables = mpg by mpg3a.

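Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Mileage (mpg) * MPG3A | 74 | 100.0% | 0 | .0% | 74 | 100.0% |

Mileage (mpg) * MPG3A Crosstabulation

Count

| Mileage (mpg) | MPG3A 1.00 | MPG3A 2.00 | MPG3A 3.00 | Total |
|---|---|---|---|---|
| 12 | 2 | | | 2 |
| 14 | 6 | | | 6 |
| 15 | 2 | | | 2 |
| 16 | 4 | | | 4 |
| 17 | 4 | | | 4 |
| 18 | 9 | | | 9 |
| 19 | | 8 | | 8 |
| 20 | | 3 | | 3 |
| 21 | | 5 | | 5 |
| 22 | | 5 | | 5 |
| 23 | | 3 | | 3 |
| 24 | | | 4 | 4 |
| 25 | | | 5 | 5 |
| 26 | | | 3 | 3 |
| 28 | | | 3 | 3 |
| 29 | | | 1 | 1 |
| 30 | | | 2 | 2 |
| 31 | | | 1 | 1 |
| 34 | | | 1 | 1 |
| 35 | | | 2 | 2 |
| 41 | | | 1 | 1 |
| Total | 27 | 24 | 23 | 74 |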

Let's create a variable called **mpgfd** that assesses the mileage of the
cars with respect to their origin. This variable,
**mpgfd**, will have two values:

0 if below the median mpg for its group (foreign/domestic)

1 if at/above the median mpg for its group (foreign/domestic).

sort cases by foreign.
examine variables = mpg by foreign
  /plot none
  /compare group
  /percentiles (5,10,25,50,75,95) haverage.

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Mileage (mpg) | 74 | 100.0% | 0 | .0% | 74 | 100.0% |

Descriptives

| Mileage (mpg) | Statistic | Std. Error |
|---|---|---|
| Mean | 21.30 | .673 |
| 95% Confidence Interval for Mean, Lower Bound | 19.96 | |
| 95% Confidence Interval for Mean, Upper Bound | 22.64 | |
| 5% Trimmed Mean | 20.92 | |
| Median | 20.00 | |
| Variance | 33.472 | |
| Std. Deviation | 5.786 | |
| Minimum | 12 | |
| Maximum | 41 | |
| Range | 29 | |
| Interquartile Range | 7.25 | |
| Skewness | .968 | .279 |
| Kurtosis | 1.130 | .552 |

Percentiles

| Mileage (mpg) | 5 | 10 | 25 | 50 | 75 | 95 |
|---|---|---|---|---|---|---|
| Weighted Average (Definition 1) | 14.00 | 14.00 | 17.75 | 20.00 | 25.00 | 34.25 |
| Tukey's Hinges | | | 18.00 | 20.00 | 25.00 | |

Case Processing Summary

| Mileage (mpg), by Car type | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Domestic | 52 | 100.0% | 0 | .0% | 52 | 100.0% |
| Foreign | 22 | 100.0% | 0 | .0% | 22 | 100.0% |

Descriptives

| Mileage (mpg) | Domestic Statistic | Domestic Std. Error | Foreign Statistic | Foreign Std. Error |
|---|---|---|---|---|
| Mean | 19.83 | .658 | 24.77 | 1.410 |
| 95% Confidence Interval for Mean, Lower Bound | 18.51 | | 21.84 | |
| 95% Confidence Interval for Mean, Upper Bound | 21.15 | | 27.70 | |
| 5% Trimmed Mean | 19.60 | | 24.48 | |
| Median | 19.00 | | 24.50 | |
| Variance | 22.499 | | 43.708 | |
| Std. Deviation | 4.743 | | 6.611 | |
| Minimum | 12 | | 14 | |
| Maximum | 34 | | 41 | |
| Range | 22 | | 27 | |
| Interquartile Range | 5.75 | | 8.25 | |
| Skewness | .794 | .330 | .706 | .491 |
| Kurtosis | .612 | .650 | .468 | .953 |

Percentiles

| Mileage (mpg) | Car type | 5 | 10 | 25 | 50 | 75 | 95 |
|---|---|---|---|---|---|---|---|
| Weighted Average (Definition 1) | Domestic | 13.30 | 14.00 | 16.25 | 19.00 | 22.00 | 29.35 |
| | Foreign | 14.45 | 17.00 | 20.25 | 24.50 | 28.50 | 40.10 |
| Tukey's Hinges | Domestic | | | 16.50 | 19.00 | 22.00 | |
| | Foreign | | | 21.00 | 24.50 | 28.00 | |

We see that the median is 19.00 for the domestic (foreign=0) cars and 24.50 for the foreign (foreign=1) cars. The
**compute** and **recode** commands below recode **mpg** into
**mpgfd** based on the median for the domestic cars and the median for the foreign cars.
In this example, we show how to create a new variable with all missing values,
which can then be recoded. In SPSS, to create a new variable with all
missing values, you use the **compute** command and set the new variable
equal to **$sysmis**. The SPSS system variable **$sysmis** holds the
system-missing value. We also use the **do if** command,
which is useful when you want to recode a variable based on different values of
another variable. Remember that each **do if** block must be closed with an **end if**
command.

compute mpgfd = $sysmis.
do if foreign = 0.
recode mpg (lo thru 18=0) (19 thru hi=1) into mpgfd.
end if.
do if foreign = 1.
recode mpg (lo thru 24=0) (25 thru hi=1) into mpgfd.
end if.
execute.

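We can check the new variable using the command below. The recoded
variable **mpgfd** looks correct. Mileages of 21 and 24 appear in both columns because
those values are below the foreign median but at or above the domestic median.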

crosstabs /tables = mpg by mpgfd.

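Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Mileage (mpg) * MPGFD | 74 | 100.0% | 0 | .0% | 74 | 100.0% |

Mileage (mpg) * MPGFD Crosstabulation

Count

| Mileage (mpg) | MPGFD .00 | MPGFD 1.00 | Total |
|---|---|---|---|
| 12 | 2 | | 2 |
| 14 | 6 | | 6 |
| 15 | 2 | | 2 |
| 16 | 4 | | 4 |
| 17 | 4 | | 4 |
| 18 | 9 | | 9 |
| 19 | | 8 | 8 |
| 20 | | 3 | 3 |
| 21 | 2 | 3 | 5 |
| 22 | | 5 | 5 |
| 23 | 3 | | 3 |
| 24 | 1 | 3 | 4 |
| 25 | | 5 | 5 |
| 26 | | 3 | 3 |
| 28 | | 3 | 3 |
| 29 | | 1 | 1 |
| 30 | | 2 | 2 |
| 31 | | 1 | 1 |
| 34 | | 1 | 1 |
| 35 | | 2 | 2 |
| 41 | | 1 | 1 |
| Total | 33 | 41 | 74 |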
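The two **do if** blocks can also be written as a single block with **else if**, and the check is easier to read when **foreign** is added as a layer so that each group gets its own table. The lines below are a sketch of that alternative, not the method used above; the name mpgfd2 is made up for this illustration.

```spss
* One do if block with an else if branch for the foreign cars.
do if (foreign = 0).
recode mpg (lo thru 18=0) (19 thru hi=1) into mpgfd2.
else if (foreign = 1).
recode mpg (lo thru 24=0) (25 thru hi=1) into mpgfd2.
end if.
execute.
* Adding foreign as a layer gives a separate table for each car type.
crosstabs /tables = mpg by mpgfd2 by foreign.
```

Within each layer, every mileage should fall in exactly one column: the split is between 18 and 19 for the domestic cars and between 24 and 25 for the foreign cars.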

Create a new variable **len_ft** which is
**length** divided by 12.

compute len_ft = length / 12.

Recode **mpg** into **mpg3**, having
three categories, 1 2 3, using
**compute** and **if**.

compute mpg3 = 1.
if (mpg >= 19) & (mpg <= 23) mpg3 = 2.
if (mpg >= 24) & (mpg <= 100) mpg3 = 3.
execute.

Recode **mpg** into **mpg3a**,
having three categories using **recode**.

recode mpg (lo thru 18=1) (19 thru 23=2) (24 thru hi=3) into mpg3a.
execute.

Recode **mpg** into **mpgfd**, having
two categories, but using different cutoffs for foreign and domestic cars.

compute mpgfd = $sysmis.
do if foreign = 0.
recode mpg (lo thru 18=0) (19 thru hi=1) into mpgfd.
end if.
do if foreign = 1.
recode mpg (lo thru 24=0) (25 thru hi=1) into mpgfd.
end if.
execute.
