You can find a specific character, such as a letter, a group of letters, or special characters, by using the index function. For example, suppose that you have a data file with names and other information and you want to identify only those records for people with the letter "a" in their name. You could use the index function as shown below. First, let's input an example data set and use proc print to see that it was entered correctly.
data temp;
  input name $ 1-12 age;
cards;
Harvey Smith 30
John West    35
Jim Cann     41
James Harvey 32
Harvy Adams  33
;
run;

proc print data = temp;
run;

Obs    name            age
 1     Harvey Smith     30
 2     John West        35
 3     Jim Cann         41
 4     James Harvey     32
 5     Harvy Adams      33
Now, let's use the index function to find the cases with the letter "a" in the name.
data temp1;
  set temp;
  x = index(name, "a");
run;

proc print data = temp1;
run;

Obs    name            age    x
 1     Harvey Smith     30    2
 2     John West        35    0
 3     Jim Cann         41    6
 4     James Harvey     32    2
 5     Harvy Adams      33    2
The values of the variable x tell us the first location in the variable name where SAS encountered the letter "a". In the second observation, John West does not have the letter "a" in his name, so a value of 0 was returned.
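To keep only the matching records, which was the goal stated at the start, one option is a subsetting if on the value that index returns. This is a minimal sketch that assumes the temp data set created above; the data set name with_a is made up for illustration.

```sas
/* Keep only the records whose name contains a lowercase "a". */
data with_a;                  /* with_a is a hypothetical data set name */
  set temp;
  if index(name, "a") > 0;    /* subsetting if: rows where index returns 0 are dropped */
run;

proc print data = with_a;
run;
```

With the example data, this should drop only John West. Note that index is case-sensitive, so an uppercase "A" alone would not count as a match.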
Searching for a single letter is rarely useful on its own. Now let's search for a name, say Harvey. Again, you can use the index function to search the variable name for "Harvey". The second argument, called the excerpt, is treated as a single string: index looks for the whole excerpt, not for any one of its characters (that is what the indexc function, described below, does). The excerpt can be a quoted literal or a character variable. Here we put the value "Harvey" in a variable (which we called search) and then search for that variable. Keep in mind that trailing blanks in the excerpt are part of the search string, so a variable that is longer than the value it holds may need to be wrapped in the trim function. The value in x is the position at which the first occurrence of "Harvey" was found, or 0 if it was not found.
data temp2;
  set temp;
  search = "Harvey";
  x = index(name, search);
run;

proc print data = temp2;
run;

Obs    name            age    search    x
 1     Harvey Smith     30    Harvey    1
 2     John West        35    Harvey    0
 3     Jim Cann         41    Harvey    0
 4     James Harvey     32    Harvey    7
 5     Harvy Adams      33    Harvey    0
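One detail worth checking when the excerpt is stored in a variable is trailing blanks, since index treats them as part of the string being searched for. The sketch below compares a quoted literal, a padded variable, and the same variable wrapped in trim. It assumes the temp data set from above; the data set name temp2b and the variables x_literal, x_padded, and x_trimmed are hypothetical names used only for this illustration.

```sas
data temp2b;                               /* hypothetical data set name */
  set temp;
  length search $ 10;                      /* longer than "Harvey", so the value is padded with blanks */
  search = "Harvey";
  x_literal = index(name, "Harvey");       /* quoted literal as the excerpt */
  x_padded  = index(name, search);         /* excerpt includes the trailing blanks */
  x_trimmed = index(name, trim(search));   /* trailing blanks removed before searching */
run;

proc print data = temp2b;
run;
```

With the example data, x_literal and x_trimmed should agree with the x values shown above, while x_padded would be expected to be 0 in every row, because no name contains "Harvey" followed by four blanks.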
Now let's suppose that you want to search for any one of several characters in a string variable. For example, perhaps you want to search for "-", "_" or "X". To accomplish this, you could use the indexc function, which allows you to supply multiple excerpts and returns the position of the first character that matches any of them. The variable found1 is included to show why you cannot use the index function and supply it with all of the characters for which you are searching.
data temp3;
  input string $ 1-11;
cards;
4-5 abc XxX
11_ jkl xxx
abc 3-5 jjj
xXx ()1 lll
xxx 344 aaa
;
run;

data temp4;
  set temp3;
  found = indexc(string, "-", "_", "X");
  found1 = index(string, "-_X");
run;

proc print data = temp4;
run;

Obs    string         found    found1
 1     4-5 abc XxX      2        0
 2     11_ jkl xxx      3        0
 3     abc 3-5 jjj      6        0
 4     xXx ()1 lll      2        0
 5     xxx 344 aaa      0        0
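As you can see from the output above, the value in the variable found is the position at which any of the characters listed in the indexc function was first encountered, or 0 if none of them appears. The variable found1 is 0 in every row because index searches for the whole string "-_X", which does not occur in any of the values.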
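Because indexc matches any character found in any of its excerpts, the three characters can also be passed together as a single excerpt. The following sketch, which assumes the temp3 data set from above, compares the two forms; the data set name temp5 and the variables found_sep, found_one, and same are hypothetical names used only for this illustration.

```sas
data temp5;                                     /* hypothetical data set name */
  set temp3;
  found_sep = indexc(string, "-", "_", "X");    /* one excerpt per character */
  found_one = indexc(string, "-_X");            /* all characters in one excerpt */
  same = (found_sep = found_one);               /* 1 when the two results agree */
run;

proc print data = temp5;
run;
```

The variable same should be 1 in every row, since both calls look for the first "-", "_" or "X" in string.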
These pages are Copyrighted (c) by UCLA Academic Technology Services