### Statistical Computing Seminars Beyond Point and Click: SPSS Syntax

#### Introduction

The aim of this seminar is to help you learn about the use of SPSS syntax as an alternative to the point-and-click interface.  In many instances, you may find that using syntax is simpler and more convenient than using point-and-click.  The use of syntax is also helpful in documenting your analysis.  It is difficult to take adequate notes on modifications made to the data and the procedures used to do the analyses when using point-and-click.  However, documenting what you are doing in a syntax file is simple and this makes reviewing and/or reconstructing the analysis much easier.

All SPSS procedures and functions are executed using syntax code, whether you use the point-and-click interface or write your own syntax.  Almost everything that you can do in SPSS via point-and-click can be accomplished by writing syntax.  (There are a few exceptions, most notably when using the igraph command.)  Also, there are a handful of commands that are available via syntax that are not available via the point-and-click interface, such as temporary and manova.  There are several ways in which you can get SPSS to show you the syntax that it is using to run your analyses, and they are explained below.

Perhaps the simplest way to ease yourself into writing SPSS syntax is to click on the Paste button instead of the OK button after you have set up your analysis.  This will paste the code that SPSS uses to run your analysis into a syntax window.  A syntax file is nothing more than a text file; hence, you can type code and comments into it, and you can cut-and-paste in it as you would in any text editor.  To run the code that you have pasted, you simply highlight it and click on the right-pointing arrow at the top.  Your results will be displayed in the output window just the same as if you had used the point-and-click interface.

Another simple thing that you can do to help you learn SPSS syntax is to make a change in the general SPSS options that will show the code being used immediately before the output in the output window.  This code can be copied and pasted into the SPSS syntax editor to be saved and/or modified.  To make this change, from the Data Editor window, click on Edit, then on Options, and then on the View tab.  In the lower left-hand corner, check the option that says "Display commands in log", and you will see all of the commands issued from then on in your output window immediately above the corresponding output.

Finally, you can change the general SPSS options to save the SPSS journal to a convenient location.  The journal is a log of all of the SPSS commands that have been issued, but with no output.  To make this change, from the Data Editor window, click on Edit, then Options, and under the General tab, you will see where SPSS is saving the journal file.  You can change that location, and you can indicate whether the journal should be overwritten every time you start SPSS, or if your next session should be appended to the bottom of the existing file.  You can view the journal file using a text editor such as WordPad.  Be aware that the file might be quite long, so NotePad may not be able to open the file.

Now that we have seen some easy ways to learn SPSS syntax, let's look at some situations in which using SPSS syntax may be easier than using the point-and-click interface.  First, let's open a syntax window.  To do this, from the Data Editor window, click on "File", then "New", and then "Syntax."  If you want to open an existing syntax file, you would click on "File", then "Open", and then on "Syntax".

One of the most important things to remember when writing SPSS syntax is that all commands must end in a period (.).  This includes comments, which you can use pretty much anywhere in your syntax file.  To start a comment, use either an asterisk (*) or the command comment.  If you forget to end your comment with a period, SPSS will consider everything between comment or * and the next period to be part of the comment, and you may be left wondering why some of your commands did not run.
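
As a small illustration of the period rule, the sketch below contrasts a properly terminated comment with one that is missing its period.  The variable names flag1 and flag2 are made up for this example, and the behavior described assumes the code is run from a syntax window.

```spss
* This comment ends with a period, so the next command runs normally.
compute flag1 = 1.

comment This comment is missing its period
compute flag2 = 1.
execute.
```

Here flag1 should be created but flag2 should not, because the line compute flag2 = 1. is read as the tail of the unterminated comment.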

#### 1.  Creating numeric variables

Perhaps the first thing that you need to know when using SPSS syntax is how to open a data file.  The SPSS command for this is get file followed by the path where the file is located.  The path and file name must be enclosed in quotes, and you need to include the file extension, which is .sav for SPSS data files.

get file 'd:\data.sav'.

Now that we have our data file read into SPSS, let's create some new variables.  Two commands that you can use to create numeric variables are compute and if.  Be aware that there is no "then" in SPSS.  SPSS will not create the new variables unless we issue either the execute command or a procedural command (whether or not the procedure involves the newly created variable).  In the syntax below, the execute is technically unnecessary because we issue the procedural command list immediately afterward.  However, including the execute does not cause any problems, and it is handy to have in case you later change the program and remove the command that executes the compute commands.

compute newvar1 = num1.
compute newvar = 0.
if num1 = 20 newvar = 1.
if num1 ge 50 or num2 le 15 newvar = 2.
if num1=96 and num2 = 96 newvar = 3.
if num1 ge 90 newvar2 = 1.
execute.
list num1 num2 newvar newvar1 newvar2.
    NUM1     NUM2   NEWVAR  NEWVAR1  NEWVAR2

20.00    20.00     1.00    20.00      .
20.00    30.00     1.00    20.00      .
52.00    36.00     2.00    52.00      .
63.00    86.00     2.00    63.00      .
45.00    72.00      .00    45.00      .
93.00    12.00     2.00    93.00     1.00
28.00    15.00     2.00    28.00      .
75.00    46.00     2.00    75.00      .
96.00    96.00     3.00    96.00     1.00
34.00    36.00      .00    34.00      .
73.00    32.00     2.00    73.00      .
20.00    30.00     1.00    20.00      .
55.00    13.00     2.00    55.00      .
91.00    29.00     2.00    91.00     1.00
78.00    30.00     2.00    78.00      .

Number of cases read:  15    Number of cases listed:  15

You can use all sorts of math and functions when creating your variables.  As shown in the following code, the execute command can be shortened to exe.

compute newvar3 = num1*num2.
compute newvar4 = num1/6.56.
compute newvar5 = sum(num1,num2).
exe.
list newvar3 newvar4 newvar5.
 NEWVAR3  NEWVAR4  NEWVAR5

400.00     3.05    40.00
600.00     3.05    50.00
1872.00     7.93    88.00
5418.00     9.60   149.00
3240.00     6.86   117.00
1116.00    14.18   105.00
420.00     4.27    43.00
3450.00    11.43   121.00
9216.00    14.63   192.00
1224.00     5.18    70.00
2336.00    11.13   105.00
600.00     3.05    50.00
715.00     8.38    68.00
2639.00    13.87   120.00
2340.00    11.89   108.00

Number of cases read:  15    Number of cases listed:  15
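
A few more of the built-in numeric functions are sketched below.  The new variable names (newvar6 through newvar9) are chosen only for illustration and are not part of the seminar data.

```spss
* Square root, rounding, absolute difference, and the mean of two variables.
compute newvar6 = sqrt(num1).
compute newvar7 = rnd(num1/6.56).
compute newvar8 = abs(num1 - num2).
compute newvar9 = mean(num1, num2).
exe.
list num1 num2 newvar6 to newvar9.
```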

#### 2.  Creating standardized variables

descriptives num1
/save.
desc num1 (num1z)
/save.

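The /save subcommand of descriptives saves a standardized (z-score) version of each variable listed.  By default, the new variable is named by prefixing the original name with z (here, znum1); a name given in parentheses after the variable (here, num1z) is used instead.  Note that a label was automatically created for each new variable.  We also see that descriptives is another command that can be shortened (to desc).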
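
One quick way to check the saved variables is to run descriptives on them: a standardized variable should have a mean of about 0 and a standard deviation of about 1.  This sketch assumes the two commands above produced variables named znum1 and num1z.

```spss
* znum1 and num1z are the variables saved by the two descriptives commands above.
desc znum1 num1z
 /statistics = mean stddev.
```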

#### 3.  Creating string variables

There are two types of string variables in SPSS:  short strings and long strings.  Short string variables have a maximum length of eight characters.  Long string variables have a maximum length of 255 characters.  Long strings can be displayed by some procedures and the print command, and they can be used as "break" variables.  However, long string variables cannot be used in tabulation procedures and they cannot have user-defined missing values (see below).  This means that long string variables cannot have missing values, as user-defined missing is the only kind of missing values a string variable can have.  To create either type of string variable, you usually need to use the string command.  You can then populate the new string variable using the compute command.  This is unlike numeric variables, which can be both created and populated using the compute command.

string string1 (A4).
string string2 to string4 (A10).
compute string1 = "a".
if newvar2 = 1 string2 = "b".
if newvar1 ge 50 and newvar ne 1 string3 = "No".
exe.
list newvar newvar1 string1 to string4.
  NEWVAR  NEWVAR1 STRING1 STRING2    STRING3    STRING4

1.00    20.00 a
1.00    20.00 a
2.00    52.00 a                  No
2.00    63.00 a                  No
.00    45.00 a
2.00    93.00 a       b          No
2.00    28.00 a
2.00    75.00 a                  No
3.00    96.00 a       b          No
.00    34.00 a
2.00    73.00 a                  No
1.00    20.00 a
2.00    55.00 a                  No
2.00    91.00 a       b          No
2.00    78.00 a                  No

Number of cases read:  15    Number of cases listed:  15

SPSS is case-sensitive when recoding string variables.  Hence, if you use upper-case letters in your recode command and have lower-case letters in your variable, nothing will happen.  This includes NOT getting an error message in the output window telling you that no recoding was done.

recode str1 ("a", "b","c" = "D").
string str2a str2b str3a (A6).
recode str2 ("c" = "D") ("a" = ' ') into str2a.
recode str2 ("c" = "D") ("a" = ' ') (else=copy) into str2b.
recode str3 ("c" = "D") ("a" = ' ') (else = 'x') into str3a.
exe.
list str1 str2 str2a str2b str3 str3a.

The log in the output window echoes these commands, along with a warning:

recode str1 ("a", "b","c" = "D").
string str2a str2b str3a (A6).
recode str2 ("c" = "D") ("a" = ' ') into str2a.
recode str2 ("c" = "D") ("a" = ' ') (else=copy) into str2b.

>Warning # 4684 in column 54.  Text: STR2B
>On the RECODE command, the list of variables following the keyword INTO
>includes a string variable which is not of sufficient width to accept the
>longest string value generated by the value specifications.  Long values
>will be truncated to the length of the variables.

recode str3 ("c" = "D") ("a" = ' ') (else = 'x') into str3a.
list str1 str2 str2a str2b str3 str3a.
STR1     STR2     STR2A  STR2B  STR3     STR3A

D        d               d      d        x
D        c        D      D      x        x
D        a                      x        x
D        b               b      d        x
f        d               d      d        x
d        d               d      b        x
D        f               f      b        x
D        b               b      c        D
D        a                      d        x
D        x               x               x
D        x               x      a
D                               a
D                               d        x
f        a                      c        D
b               b      b        x

Number of cases read:  15    Number of cases listed:  15
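
Because recode is case-sensitive for strings, one possible workaround is to build a lower-cased copy of the variable first and recode that copy.  The sketch below assumes that str1 fits in eight characters; the name str1lc is hypothetical.

```spss
* str1lc is a hypothetical lower-cased copy of str1, and A8 is an assumed width.
string str1lc (A8).
compute str1lc = lower(str1).
recode str1lc ("a", "b", "c" = "d").
exe.
list str1 str1lc.
```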

#### 4.  Recoding variables

There are several ways that you can recode variables in SPSS. For example, you can use the recode command, the if command or the autorecode command.  Remember that when using the if command, there is no "then" in SPSS syntax.  You can create complex rules regarding how variables get recoded.  You have lots of functions from which to choose, and you can do all sorts of mathematical manipulations.

if num1 = 55 y = 30.
if num1 le 50 and gender = "f" y = 35.
list num1 gender y.
    NUM1 GENDER          Y

20.00 f           35.00
20.00 f           35.00
52.00 f             .
63.00 m             .
45.00 m             .
93.00 f             .
28.00 m             .
75.00 f             .
96.00 m             .
34.00 f           35.00
73.00 f             .
20.00 f           35.00
55.00             30.00
91.00 m             .
78.00 m             .

Number of cases read:  15    Number of cases listed:  15

There are several SPSS keywords that you can use with the recode command, including lowest, lo, hi, highest, thru, sysmis, missing, else and copy.  We recommend strongly that you recode your variables into new variables, just in case the recoding does not go as you planned.  You can use the into option with the recode command to create the new variable into which you will recode the old variable.

recode num1 (lowest thru 60 = 1) (85 thru highest = sysmis) into y1.
list num1 y1.
    NUM1       Y1

20.00     1.00
20.00     1.00
52.00     1.00
63.00      .
45.00     1.00
93.00      .
28.00     1.00
75.00      .
96.00      .
34.00     1.00
73.00      .
20.00     1.00
55.00     1.00
91.00      .
78.00      .

Number of cases read:  15    Number of cases listed:  15
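
In the output above, the values between 60 and 85 (63, 75, 73 and 78) are system-missing in y1 because, when into is used, any value not covered by a recode specification is set to system-missing in the new variable.  If you would rather keep those values, a sketch using (else = copy) follows; y2 is a hypothetical variable name.

```spss
* y2 is a hypothetical new variable, and (else = copy) keeps the unrecoded values.
recode num1 (lowest thru 60 = 1) (85 thru highest = sysmis) (else = copy) into y2.
list num1 y1 y2.
```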

#### 5.  Changing string variables into numeric variables

The main reason to convert a string variable into a numeric variable (often called "destringing") is for use in statistical analyses, as very few analysis procedures will allow a string variable.  You can use the convert option of the recode command only if you have numbers and/or missing values in a string variable.

recode gender ("f" = 1) ("m" = 2) into sex.
recode str5 (convert) into str5a.
exe.
list gender sex str5 str5a.
GENDER        SEX STR5        STR5A

f            1.00 1            1.00
f            1.00 5            5.00
f            1.00 4            4.00
m            2.00 6            6.00
m            2.00 3            3.00
f            1.00 2            2.00
m            2.00 9            9.00
f            1.00 8            8.00
m            2.00               .
f            1.00 2            2.00
f            1.00 1            1.00
f            1.00 5            5.00
.   8            8.00
m            2.00 3            3.00
m            2.00 5            5.00

Number of cases read:  15    Number of cases listed:  15

The autorecode command converts string variables into numeric variables.  By default, the lowest value in the string variable is given a value of 1 in the new numeric variable, the next lowest value is given a value of 2, and so on.  A null string is considered to be the lowest value; hence, all cases with a value of a null string will receive a value of 1 in the new numeric variable.  SPSS also creates value labels for the new numeric variable, associating the numeric values with the string values.  Compare variables str5a and str5auto.  Although both of these new variables are the numeric version of the same string variable, str5, there are some important differences between them, such as how the missing value in str5 is handled.

autorecode gender /into sex1.
autorecode str5 /into str5auto.
autorecode str2 /into str2auto.
exe.
list gender sex1 sex str5 str5auto str5a str2 str2auto.
GENDER   SEX1      SEX STR5     STR5AUTO    STR5A STR2     STR2AUTO

f          2      1.00 1            2        1.00 d            5
f          2      1.00 5            6        5.00 c            4
f          2      1.00 4            5        4.00 a            2
m          3      2.00 6            7        6.00 b            3
m          3      2.00 3            4        3.00 d            5
f          2      1.00 2            3        2.00 d            5
m          3      2.00 9            9        9.00 f            6
f          2      1.00 8            8        8.00 b            3
m          3      2.00              1         .   a            2
f          2      1.00 2            3        2.00 x            7
f          2      1.00 1            2        1.00 x            7
f          2      1.00 5            6        5.00              1
1       .   8            8        8.00              1
m          3      2.00 3            4        3.00 a            2
m          3      2.00 5            6        5.00 b            3

Number of cases read:  15    Number of cases listed:  15
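
If the default ordering is not what you want, autorecode can also number the values from highest to lowest.  The sketch below uses the /descending and /print subcommands; sex2 is a hypothetical variable name.

```spss
* sex2 is a hypothetical name.
* The descending subcommand reverses the order and print lists the old and new values.
autorecode gender /into sex2
 /descending
 /print.
```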

#### 6.  Counting

The count command is useful if you have items from a questionnaire that are on a Likert scale (e.g., 1 to 5).  It counts the number of occurrences of a value across a list of variables.

count total = q1 to q3 (3).
exe.
list q1 to q3 total.
      Q1       Q2       Q3    TOTAL

3.00     3.00      .       2.00
2.00     2.00    -9.00      .00
3.00     1.00     2.00     1.00
4.00     1.00     2.00      .00
-8.00     1.00     3.00     1.00
-8.00     2.00     1.00      .00
3.00    -9.00     4.00     1.00
4.00     4.00     2.00      .00
1.00     1.00     1.00      .00
2.00    -9.00     3.00     1.00
3.00     3.00     2.00     2.00
3.00     1.00     1.00     1.00
-9.00     4.00     4.00      .00
.       2.00     4.00      .00
2.00     3.00     1.00     1.00

Number of cases read:  15    Number of cases listed:  15
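
The value list in parentheses can also hold a range instead of a single value.  The following sketch counts values from 3 to 4 and, separately, negative values; nhigh and nneg are hypothetical names.

```spss
* nhigh and nneg are hypothetical names for the new count variables.
count nhigh = q1 to q3 (3 thru 4).
count nneg = q1 to q3 (lowest thru -1).
exe.
list q1 to q3 nhigh nneg.
```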

#### 7. The keyword "to"

When creating variables, the SPSS keyword to will create variables with consecutive numbering.  When using to in syntax to refer to variables that already exist in the data set, SPSS assumes that variables are positionally consecutive (all variables between the first variable listed and the last variable listed in the command will be included).  There are some commands in SPSS that will use the keyword to in both a positionally and a numerically consecutive manner, depending on whether existing variables are being modified in some way or whether new variables are being created.  Some of these commands include autorecode, recode, aggregate and rename variables.

autorecode v1 to v2 /into w1 to w3.
rename variables (v1 to v2 = b1 to b3).
compute z = mean(q1 to q5).
exe.
list q1 to q5 z.
      Q1       Q2       Q3       Q4       Q5        Z

3.00     3.00      .        .       2.00     2.67
2.00     2.00    -9.00      .       1.00    -1.00
3.00     1.00     2.00      .       3.00     2.25
4.00     1.00     2.00      .      -9.00     -.50
-8.00     1.00     3.00      .       2.00     -.50
-8.00     2.00     1.00      .      -9.00    -3.50
3.00    -9.00     4.00      .       2.00      .00
4.00     4.00     2.00      .       3.00     3.25
1.00     1.00     1.00      .       1.00     1.00
2.00    -9.00     3.00      .       2.00     -.50
3.00     3.00     2.00      .       5.00     3.25
3.00     1.00     1.00      .       3.00     2.00
-9.00     4.00     4.00      .       2.00      .25
.       2.00     4.00      .       1.00     2.33
2.00     3.00     1.00      .       4.00     2.50

Number of cases read:  15    Number of cases listed:  15
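
The two meanings of to described above can be seen side by side in this short sketch, where s1, s2 and s3 are hypothetical new variables.

```spss
* Creating variables with to makes s1, s2 and s3 (hypothetical names).
string s1 to s3 (A4).
* Referring to existing variables with to means q1, q5 and everything stored between them.
desc q1 to q5.
```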

#### 8.  Dates

Dates are stored as numbers (actually, as floating-point numbers) in SPSS; you can add and subtract them.  Dates are stored as the number of seconds from midnight, October 14, 1582 (the day before the Gregorian calendar took effect).  Therefore, you usually need to do some math in order to calculate the number of days (or months or years) between two dates.  (Sometimes it is handy to know that there are 86,400 seconds in a day.)  If your date is displayed as stars or if only part of the year is showing in the SPSS Data Editor, you can make the column wider and the dates will display properly.

compute diff = edate - dob.
compute age = diff/(60*60*24*365.25).
compute age1 = xdate.year(diff) - 1582.
compute age2 = xdate.year(edate) - xdate.year(dob).
exe.
list edate dob diff age age1 age2.
     EDATE        DOB     DIFF      AGE     AGE1     AGE2

12/19/2002 01/10/1923 2.52E+09    79.94    80.00    79.00
12/14/2002 03/06/1919 2.64E+09    83.78    84.00    83.00
11/18/2001 09/08/1945 1.77E+09    56.19    56.00    56.00
09/15/2002          .      .        .        .        .
07/03/2003 10/18/1956 1.47E+09    46.70    47.00    47.00
04/05/2002 04/14/1965 1.17E+09    36.97    37.00    37.00
03/26/2003 03/09/1942 1.93E+09    61.05    61.00    61.00
01/19/2002 05/07/1936 2.07E+09    65.70    66.00    66.00
06/06/2002 08/16/1952 1.57E+09    49.80    50.00    50.00
02/06/2003 07/17/1954 1.53E+09    48.56    49.00    49.00
07/09/2002 04/16/1941 1.93E+09    61.23    62.00    61.00
10/18/2002 06/08/1936 2.09E+09    66.36    67.00    66.00
05/20/2002 12/12/1953 1.53E+09    48.44    49.00    49.00
09/08/2003 11/08/1939 2.01E+09    63.83    64.00    64.00
. 10/23/1961      .        .        .        .

Number of cases read:  15    Number of cases listed:  15
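
As a sketch of the arithmetic described above, dividing a difference in seconds by 86,400 gives days, and truncating the number of years gives age in completed years (approximately, since 365.25 is an average year length).  The names days and age3 are hypothetical.

```spss
* days and age3 are hypothetical names, and 86400 is the number of seconds in a day.
compute days = (edate - dob)/86400.
compute age3 = trunc(days/365.25).
exe.
list edate dob days age3.
```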

Now suppose that you have a data set that has the date in three different columns (i.e., three variables) and you want to combine them into one variable.  You can use the date.dmy or other similar date functions to do this.

compute date = date.dmy(day,month,year).
exe.
list day month year date.
     DAY    MONTH     YEAR     DATE

23       12     1962 1.20E+10
25       11     1969 1.22E+10
12       10     2001 1.32E+10
19        8     2003 1.33E+10
10        3     1987 1.28E+10
2        6     1945 1.14E+10
16        4     1996 1.30E+10
13        7     1978 1.25E+10
11        5     1982 1.26E+10
3        2     1973 1.23E+10
31        1     1992 1.29E+10
29        3     1986 1.27E+10
25       10     1973 1.23E+10
30       12     1945 1.15E+10
7        6     1997 1.31E+10

Number of cases read:  15    Number of cases listed:  15

You can change the appearance of the variable date by expanding the column so that the number is not shown in scientific notation, and you can go to the Variable View window and change the type of variable for date from numeric to date and select the display option that you like.
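
The display format can also be changed in syntax with the formats command.  The sketch below picks adate10 as one possible display format.

```spss
* adate10 displays the date as mm/dd/yyyy.
formats date (adate10).
list day month year date.
```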

Now let's extract the day from the variable date.  We already have this information in the variable day, but that will provide a check that we have done this correctly.

compute exday = xdate.mday(date).
exe.
list day exday date.
     DAY    EXDAY     DATE

23    23.00 1.20E+10
25    25.00 1.22E+10
12    12.00 1.32E+10
19    19.00 1.33E+10
10    10.00 1.28E+10
2     2.00 1.14E+10
16    16.00 1.30E+10
13    13.00 1.25E+10
11    11.00 1.26E+10
3     3.00 1.23E+10
31    31.00 1.29E+10
29    29.00 1.27E+10
25    25.00 1.23E+10
30    30.00 1.15E+10
7     7.00 1.31E+10

Number of cases read:  15    Number of cases listed:  15
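
The month and year can be pulled out and checked the same way.  The names exmonth and exyear are hypothetical.

```spss
* exmonth and exyear are hypothetical names for the extracted values.
compute exmonth = xdate.month(date).
compute exyear = xdate.year(date).
exe.
list month exmonth year exyear.
```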

#### 9.  Documenting data

There are many ways to document your data using SPSS.  There are also several commands that you can use to view the documentation that you have created, including sysfile info and display.  When using the sysfile info command, you must specify the file path.  Also, the maximum length of a variable label is 255 characters and the maximum length of a value label is 60 characters.

sysfile info 'd:\data.sav'.
document I collected this data on January 16, 2003 and
blah blah blah.
display document.
* document drop.
file label SPSS Syntax Seminar data file.
save outfile 'd:\data1.sav'.
sysfile info 'd:\data1.sav'.
variable labels str1 'answer to question 7'.
display labels.
value labels q1 1 'strongly disagree' 2 'disagree' 3 'agree' 4 'strongly agree'.
value labels q2 to q3 q5 1 'strongly disagree' 2 'disagree' 3 'agree'
4 'strongly agree'.
freq var = q1 to q5.
save outfile 'd:\data2.sav'.
display dictionary.

#### 10.  Missing data

There are two different types of missing data in SPSS:  system-missing and user-defined missing.  System-missing is displayed as a dot (.) in the column of a numerical variable.  String variables cannot have system-missing values; even a null string is considered a value.  You can define your own missing values (called user-defined missing) for either numeric or short string variables.  Missing values are considered the lowest possible value in SPSS.  Although displayed differently, both system-missing and user-defined missing values are just missing values to SPSS; they are treated the same way (except in filter variables, see below).  Both will be deleted from analyses that call for case-wise deletion.  The only "difference" is that they will be displayed in separate categories in crosstabs, frequencies, etc.

missing values q1 to q5 (-9).
exe.
missing values q1 (-8).
exe.
missing values q1 (-9 -8).
missing values str1 ('x').
exe.

It is important to realize that you can create the same variable in different ways, and that the missing values may be handled differently.

compute y = q1+q2.
compute y1 = sum(q1, q2).
exe.
list q1 q2 y y1.
      Q1       Q2        Y       Y1

3.00     3.00     6.00     6.00
2.00     2.00     4.00     4.00
3.00     1.00     4.00     4.00
4.00     1.00     5.00     5.00
-8.00     1.00      .       1.00
-8.00     2.00      .       2.00
3.00    -9.00      .       3.00
4.00     4.00     8.00     8.00
1.00     1.00     2.00     2.00
2.00    -9.00      .       2.00
3.00     3.00     6.00     6.00
3.00     1.00     4.00     4.00
-9.00     4.00      .       4.00
.       2.00      .       2.00
2.00     3.00     5.00     5.00

Number of cases read:  15    Number of cases listed:  15
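
If you want the convenience of sum but still want a missing result when too few values are valid, a numeric suffix on the function name sets the minimum number of valid arguments.  The sketch below also uses nmiss to count the missing arguments; y2 and nmis are hypothetical names.

```spss
* y2 and nmis are hypothetical names.
* sum.2 returns a value only when both arguments are valid.
compute y2 = sum.2(q1, q2).
* nmiss counts how many of the arguments are missing.
compute nmis = nmiss(q1, q2).
exe.
list q1 q2 y y1 y2 nmis.
```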

#### 11.  Creating and using filters (subsetting data)

You can create variables to use as filter variables and keep them in your data set.  In constructing a variable to use as a filter variable, we suggest that you create a 0/1 (dummy) variable, where the cases with the 0's will be filtered out.  It is important to note that SPSS does not treat system-missing and user-defined missing values the same way when applying the filter:  cases with system-missing values will be filtered out, but cases with user-defined missing values will not.  In other words, SPSS only looks for two specific values to be filtered out of your data:  0 and system-missing.  You can use either the filter off command or the use all command to end the filtering of your data.  The select if command will permanently delete data from your data file.  The command select if is the same as using the filter in the point-and-click interface with the "delete" radio button selected.

filter by fltr.
desc num1 num2.

filter off.
* use all.
desc num1 num2.
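
The filter variable fltr used above is assumed to exist in the data file already.  As a sketch of how such a 0/1 variable might be built, a logical expression in compute returns 1 when it is true and 0 when it is false; fltr2 is a hypothetical name.

```spss
* fltr2 is a hypothetical filter variable.
* It is 1 when num1 is at least 50 and 0 when num1 is below 50.
compute fltr2 = (num1 ge 50).
filter by fltr2.
desc num1 num2.
filter off.
```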

One command that can be used only via syntax is temporary.  In the code below, we will use the temporary command so that our observations are not permanently deleted from our data file when we use the select if command.  The temporary command stays in effect only until the next executable command is executed.  That is why the output for the first list command (which is the first executable command after temporary) has only seven observations (the seven that met the criteria listed on the select if command), while the second list command includes all of the observations from our data set.  Although for this seminar we only use the temporary command while subsetting, it has many other uses.

temporary.
select if (gender = "f" and q1 ge 2).
list num1.
list num1.
    NUM1

20.00
20.00
52.00
75.00
34.00
73.00
20.00

Number of cases read:  7    Number of cases listed:  7
    NUM1

20.00
20.00
52.00
63.00
45.00
93.00
28.00
75.00
96.00
34.00
73.00
20.00
55.00
91.00
78.00

Number of cases read:  15    Number of cases listed:  15
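
As one example of those other uses, temporary can precede a transformation so that the change applies only to the next procedure.  In this sketch, the rescaling of num1 should affect only the first desc command.

```spss
* The compute below applies only to the first desc command.
temporary.
compute num1 = num1/10.
desc num1.
desc num1.
```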

Another command that you can use to subset your data is split file.  You will first need to sort your data by the variable that will be used in the split file command.  The split file command will remain in effect until you use the split file off command to turn it off.

sort cases by gender.
split file by gender.
desc num1 num2.

In this data set there are actually three values of gender: missing (a null string), "f" and "m".  Notice also that you do not get the total for all cases.

split file off.
desc num1 num2.

#### 12.  Collapsing across observations

The aggregate command creates a new data set that is aggregated (or collapsed) by a variable or variables.  The command also creates one or more new variables that require that the original variables be aggregated.  There are about a dozen functions that can be used to create these new variables.  Because a new data file is being created and replaces the one in the Data Editor, we strongly suggest that you save your current data file before running this command.  The aggregate command ignores all split file commands.

get file 'd:\data.sav'.
aggregate outfile 'd:\new.sav'
/break gender
/aveq1 = mean(q1).
get file 'd:\new.sav'.
list.
GENDER      AVEQ1

-9.00
f            1.50
m             .40

Number of cases read:  3    Number of cases listed:  3

get file 'd:\data.sav'.
aggregate outfile 'd:\new1.sav'
/break gender
/aveq1 = mean(q1)
/sumq1 = sum(q1)
/miss3 = numiss(q3)
/pin5 = pin(q5, 2, 4).
get file 'd:\new1.sav'.
list.
GENDER      AVEQ1    SUMQ1   MISS3  PIN5

-9.00    -9.00       0 100.0
f            1.50    12.00       1  62.5
m             .40     2.00       0  50.0

Number of cases read:  3    Number of cases listed:  3
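
A few of the other aggregate functions are sketched below.  The output file name new2.sav and the new variable names are hypothetical.

```spss
* new2.sav and the new variable names are hypothetical.
get file 'd:\data.sav'.
aggregate outfile 'd:\new2.sav'
 /break gender
 /ncases = n
 /minq1 = min(q1)
 /maxq1 = max(q1)
 /sdq1 = sd(q1).
get file 'd:\new2.sav'.
list.
```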

#### 13.  Reshaping data

The varstocases command can be used to reshape data from the wide to the long format.  Note that reshaping data (either from long to wide or from wide to long) involves creating a new data set that will replace the data set currently open in the SPSS Data Editor.  Therefore, it is VERY important that you save a copy of your original data set before reshaping it.

get file 'd:\data.sav'.
list q1 to q3
/cases from 1 to 10.
      Q1       Q2       Q3

3.00     3.00      .
2.00     2.00    -9.00
3.00     1.00     2.00
4.00     1.00     2.00
-8.00     1.00     3.00
-8.00     2.00     1.00
3.00    -9.00     4.00
4.00     4.00     2.00
1.00     1.00     1.00
2.00    -9.00     3.00

Number of cases read:  10    Number of cases listed:  10

In the varstocases command below, the /index subcommand creates a variable that tells you what variable the data point came from (in this case, q1, q2 or q3).  The /id subcommand creates a variable that tells you what row in the original data set the data point came from.  The /drop subcommand is optional and is used only to get rid of unwanted variables in the new data set.

varstocases
/make q from q1 to q3
/index = number
/id = id
/drop num1 to year.
list.
      ID NUMBER        Q

1     1      3.00
1     2      3.00
2     1      2.00
2     2      2.00
2     3     -9.00
3     1      3.00
3     2      1.00
3     3      2.00
4     1      4.00
4     2      1.00
4     3      2.00
5     1     -8.00
5     2      1.00
5     3      3.00
6     1     -8.00
6     2      2.00
6     3      1.00
7     1      3.00
7     2     -9.00
7     3      4.00
8     1      4.00
8     2      4.00
8     3      2.00
9     1      1.00
9     2      1.00
9     3      1.00
10     1      2.00
10     2     -9.00
10     3      3.00
11     1      3.00
11     2      3.00
11     3      2.00
12     1      3.00
12     2      1.00
12     3      1.00
13     1     -9.00
13     2      4.00
13     3      4.00
14     2      2.00
14     3      4.00
15     1      2.00
15     2      3.00
15     3      1.00

Number of cases read:  43    Number of cases listed:  43

The casestovars command can be used to reshape data from the long to the wide format.  Note that there is very useful information in the output and that there are labels for the variables.

get file 'd:\long.sav'.
list.
   TRIAL     OUT1     OUT2 IVAR

1.00    26.00     1.00 a
1.00    32.00     4.00 b
1.00    31.00     5.00 c
2.00    32.00     2.00 a
2.00    36.00     9.00 b
2.00    33.00     4.00 c
3.00    35.00     3.00 a
3.00    38.00     2.00 b
3.00    35.00     5.00 c
4.00     6.00     5.00 a
4.00     2.00     3.00 b
4.00     5.00     4.00 c
5.00     5.00     6.00 a
5.00     5.00     1.00 b
5.00     3.00     7.00 c

Number of cases read:  15    Number of cases listed:  15

sort cases by trial.
casestovars
/id = trial
/index = ivar
/drop out2.
list.

   TRIAL        A        B        C

1.00    26.00    32.00    31.00
2.00    32.00    36.00    33.00
3.00    35.00    38.00    35.00
4.00     6.00     2.00     5.00
5.00     5.00     5.00     3.00

Number of cases read:  5    Number of cases listed:  5

#### 14.  "By" and "with" in ANOVA and logistic regression

In most of the analysis commands in SPSS, the keyword by indicates that a categorical variable or variables will follow, while the keyword with indicates that a continuous variable or variables will follow.

get file 'd:\data.sav'.
unianova num1 by gender.

unianova num1 by gender with q1.

logistic regression binary with num1 by gender
/categorical gender.
regress
/dependent num1
/method = enter num2 binary.

#### 15.  Pasting code

GET
FILE='D:\data.sav'.
* analyze - descriptives - explore.
EXAMINE
VARIABLES=num1 BY gender
/PLOT BOXPLOT STEMLEAF
/COMPARE GROUP
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.
examine num1 by gender.

Notice that we get exactly the same output using both of the examine commands above.  As you can see, when you paste the code, SPSS includes many of the default options, and these clutter the code.  It is a good idea to play around with code that you have pasted to see what subcommands can be eliminated without changing the output.  In the example above, all of the subcommands can be eliminated.

#### 16.  The SPSS syntax guide

You can access the SPSS syntax guide by clicking on "Help", "Syntax Guide" and "Base" from any of the SPSS windows (the Data Editor, Syntax or Output windows).

#### 17.  System variables

SPSS sometimes uses internal variables that you never see in the Data Editor.  You can call on some of these internal variables, which SPSS calls "system variables," to make certain tasks easier.  All system variables begin with a $.  For example, SPSS keeps information about case numbers (which are the numbers that you see along the left side of the Data Editor in the gray bar) in a system variable called $casenum.  You can use this variable if you want to create an id variable that is part of your data set.

compute id = $casenum.
exe.

Another handy system variable is $sysmis, which can be used when you want to specify that a newly created variable (or some of its values) should be set to system missing.

compute miss = $sysmis.
compute miss1 = 1.
if missing(q1) or missing(q3) miss1 = $sysmis.
exe.
list miss q1 q3 miss1.

    MISS       Q1       Q3    MISS1

.       3.00      .        .
.       2.00    -9.00     1.00
.       3.00     2.00     1.00
.       4.00     2.00     1.00
.      -8.00     3.00     1.00
.      -8.00     1.00     1.00
.       3.00     4.00     1.00
.       4.00     2.00     1.00
.       1.00     1.00     1.00
.       2.00     3.00     1.00
.       3.00     2.00     1.00
.       3.00     1.00     1.00
.      -9.00     4.00     1.00
.        .       4.00      .
.       2.00     1.00     1.00

Number of cases read:  15    Number of cases listed:  15

When working with dates, a potentially useful system variable is $jdate.  This variable gives the current date as the number of days from October 14, 1582.

compute today = $jdate.
exe.
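
A related system variable is $time, which holds the current date and time as a number of seconds, on the same scale as the date variables discussed earlier.  The sketch below stores it and applies a date-time display format; now is a hypothetical variable name.

```spss
* now is a hypothetical variable name.
* The datetime20 format shows both the date and the time.
compute now = $time.
formats now (datetime20).
exe.
list now /cases from 1 to 1.
```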
