SAS FAQ
How can I find things in a character variable in SAS?

You can find a specific character, such as a letter, a group of letters, or special characters, by using the index function. For example, suppose that you have a data file with names and other information and you want to identify only those records for people with the letter "a" in their name.  You could use the index function as shown below.  First, let's input an example data set and use proc print to see that it was entered correctly.

data temp;
input name $ 1-12 age;
cards;
Harvey Smith 30
John West    35
Jim Cann     41
James Harvey 32
Harvy Adams  33
;
run;

proc print data = temp;
run;

Obs    name            age

 1     Harvey Smith     30
 2     John West        35
 3     Jim Cann         41
 4     James Harvey     32
 5     Harvy Adams      33

Now, let's use the index function to find the cases with the letter "a" in the name.

data temp1;
set temp;
x = index(name, "a");
run;

proc print data = temp1;
run;

Obs    name            age    x

 1     Harvey Smith     30    2
 2     John West        35    0
 3     Jim Cann         41    6
 4     James Harvey     32    2
 5     Harvy Adams      33    2

The values of the variable x tell us the first location in the variable name where SAS encountered the letter "a".  In the second observation, John West does not have the letter "a" in his name, so a value of 0 was returned. 
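Because index returns 0 when the excerpt is absent, the original goal of keeping only the records with the letter "a" in the name can be handled with a subsetting if. The following is a minimal sketch, not the only way to do it; the data set names temp_a and temp_any_a are made up for illustration. Keep in mind that index is case-sensitive, so the second step applies lowcase first in case an uppercase "A" should also count.

```sas
/* Keep only the observations whose name contains a lowercase "a". */
data temp_a;
  set temp;
  if index(name, "a") > 0;   /* subsetting if: rows where index returns 0 are dropped */
run;

/* Case-insensitive variant: lowercase the name before searching. */
data temp_any_a;
  set temp;
  if index(lowcase(name), "a") > 0;
run;

proc print data = temp_a;
run;
```

With the example data above, both steps should drop only John West, since his is the only name with neither "a" nor "A".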

Searching for a single letter is rarely the goal.  Now let's search for a name, say Harvey.  Again, you can use the index function to search the variable name for "Harvey".  The second argument, called the excerpt, is treated as a single string: index looks for the whole excerpt, not for any one of its characters.  The excerpt can be a quoted literal, as in index(name, "Harvey"), or a character variable.  Here we put the value "Harvey" in a variable (which we called search) and then search for that variable, which is convenient when the search term changes.  Note that leading and trailing blanks in the excerpt count as part of the search string, so a variable that is longer than its value may need to be wrapped in the trim function.  In this example search has a length of 6, so no padding is involved.  SAS puts the position at which the first occurrence of "Harvey" was found in the variable x, or 0 if it was not found.

data temp2;
set temp;
search = "Harvey";
x = index(name, search);
run;

proc print data = temp2;
run;

Obs    name            age    search    x

 1     Harvey Smith     30    Harvey    1
 2     John West        35    Harvey    0
 3     Jim Cann         41    Harvey    0
 4     James Harvey     32    Harvey    7
 5     Harvy Adams      33    Harvey    0
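One detail worth checking when the excerpt is held in a variable is trailing blanks, because index treats them as part of the excerpt. The sketch below compares a quoted literal, a padded variable, and the same variable wrapped in trim. The data set name temp2b, the variable names x_literal, x_padded, and x_trimmed, and the length of 10 are assumptions chosen only to make the effect visible.

```sas
data temp2b;
  set temp;
  length search $ 10;                    /* assumption: variable longer than its value */
  search = "Harvey";                     /* stored as "Harvey" plus four trailing blanks */
  x_literal = index(name, "Harvey");     /* quoted literal used as the excerpt */
  x_padded  = index(name, search);       /* trailing blanks are part of the excerpt */
  x_trimmed = index(name, trim(search)); /* trim removes the trailing blanks first */
run;

proc print data = temp2b;
run;
```

Here x_literal and x_trimmed should match the values of x shown above, while x_padded should be 0 in every observation, because no name contains "Harvey" followed by four blanks.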

Now let's suppose that you want to search for any one of several characters in a string variable.  For example, perhaps you want to search for "-", "_" or "X".  To accomplish this, you could use the indexc function, which returns the position of the first character in the string that appears in any of the excerpts you supply.  The variable found1 is included to show why you cannot use the index function and supply it with all of the characters for which you are searching: index looks for the three-character sequence "-_X", which does not occur in any of these strings.

data temp3;
input string $ 1-11;
cards;
4-5 abc XxX
11_ jkl xxx
abc 3-5 jjj
xXx ()1 lll
xxx 344 aaa
;
run;

data temp4;
set temp3;
found = indexc(string, "-", "_", "X");
found1 = index(string, "-_X");
run;

proc print data = temp4;
run;

Obs      string       found    found1

 1     4-5 abc XxX      2         0
 2     11_ jkl xxx      3         0
 3     abc 3-5 jjj      6         0
 4     xXx ()1 lll      2         0
 5     xxx 344 aaa      0         0

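As you can see from the output above, the value in the variable found is the position of the first character in string that matches any of the characters listed in the indexc function, or 0 if none of them occurs.  The search is case-sensitive, so the lowercase "x" characters in the fifth observation do not match "X".  The variable found1 is 0 in every observation because the sequence "-_X" never appears as a whole.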
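Since indexc matches any character found in any of its excerpts, the characters can also be packed into a single excerpt. The sketch below shows that form alongside a case-insensitive variant that uppercases the string before searching. The data set name temp5 and the variable names found_one and found_any are hypothetical.

```sas
data temp5;
  set temp3;
  found_one = indexc(string, "-_X");          /* one excerpt holding all three characters */
  found_any = indexc(upcase(string), "-_X");  /* uppercase first so a lowercase "x" also counts */
run;

proc print data = temp5;
run;
```

The values of found_one should agree with found in the output above, while found_any can differ wherever a lowercase "x" comes before the first "-", "_" or "X".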


These pages are Copyrighted (c) by UCLA Academic Technology Services

