SAS Code Fragments
Matching with a wildcard using Perl regular expressions

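Example 1: Suppose we want to extract the observations in which the variable text begins with "Inc" and ends with "b1", regardless of what is in the middle. We first create a test data set. The wildcard is simply ".+". In the pattern, "." matches any single character and "+" means "one or more of the preceding item", so ".+" matches one or more characters of any kind. Strictly speaking, the pattern '/Inc.+b1/' is not anchored. It matches any value that contains "Inc" followed somewhere later by "b1", which for this data amounts to the same thing.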

data test;
  length text $16;
  input a text $;
cards;
1 Inc.F1b1
1 Inc.F2b1
2 Ltd.F4b2
2 Ltd.D5c1
;
run;
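data test2;
  /* RETAIN keeps RE across iterations of the DATA step, so PRXPARSE
     compiles the pattern only once, on the first iteration.
     RE holds the pattern identifier returned by PRXPARSE. */
  retain re;
  if _n_ = 1 then do;
    re = prxparse('/Inc.+b1/');
  end;
  set test;
  /* PRXMATCH returns the position where the match starts, or 0 if none */
  if prxmatch(re, text) then flag = 1;
  else flag = 0;
run;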
proc print data = test2;
run;
Obs    re      text      a    flag
 1      1    Inc.F1b1    1      1
 2      1    Inc.F2b1    1      1
 3      1    Ltd.F4b2    2      0
 4      1    Ltd.D5c1    2      0
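
You can also pass the pattern string directly to PRXMATCH in a WHERE statement, without creating a new data set:

proc print data = test;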
where prxmatch('/Inc.+b1/', text);
run;
Obs      text      a
 1     Inc.F1b1    1
 2     Inc.F2b1    1
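PROC PRINT with a WHERE statement only displays the matching observations. To keep them as a new data set, one option is a subsetting IF in a DATA step. The sketch below re-creates the Example 1 data so that it runs on its own. The data set names test_ex1 and inc_b1 are hypothetical.

```sas
/* Re-create the Example 1 data (hypothetical data set name test_ex1) */
data test_ex1;
  length text $16;
  input a text $;
cards;
1 Inc.F1b1
1 Inc.F2b1
2 Ltd.F4b2
2 Ltd.D5c1
;
run;

/* Subsetting IF: only observations for which PRXMATCH returns a
   nonzero position are written to the hypothetical data set inc_b1 */
data inc_b1;
  set test_ex1;
  if prxmatch('/Inc.+b1/', text);
run;

proc print data = inc_b1;
run;
```

A WHERE statement inside the DATA step, such as where prxmatch('/Inc.+b1/', text);, would select the same two observations.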

A WHERE statement with PRXMATCH also works in other procedures, such as PROC MEANS:

proc means data = test;
  var a;
  where prxmatch('/Inc.+b1/', text);
run;
The MEANS Procedure
                      Analysis Variable : a
N            Mean         Std Dev         Minimum         Maximum
-----------------------------------------------------------------
2       1.0000000               0       1.0000000       1.0000000
-----------------------------------------------------------------
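The pattern '/Inc.+b1/' does not actually require a value to begin with "Inc" or end with "b1". It only requires "Inc" to appear somewhere, followed later by "b1". The sketch below assumes you want the whole value to start and end that way. It adds the anchors ^ (start of string) and $ (end of string). SAS pads character variables with trailing blanks up to their defined length, so the sketch allows optional whitespace (\s*) before $. Wrapping the variable in TRIM() would be another option. The data set names and the extra test values are made up for illustration.

```sas
/* Hypothetical values chosen to show where the two patterns differ */
data anchors;
  length text $16;
  input text $;
cards;
Inc.F1b1
XInc.F1b1
Inc.F1b1X
;
run;

data anchors2;
  set anchors;
  /* Unanchored: "Inc" followed anywhere later by "b1" */
  loose  = (prxmatch('/Inc.+b1/', text) > 0);
  /* Anchored: must start with "Inc" and end with "b1";
     \s* absorbs the blanks that pad TEXT to its length of 16 */
  strict = (prxmatch('/^Inc.+b1\s*$/', text) > 0);
run;

proc print data = anchors2;
run;
```

Both patterns flag Inc.F1b1. Only the unanchored one also flags XInc.F1b1 and Inc.F1b1X.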

Example 2: Dealing with a literal period ".". Let's look at a slightly different situation, where the data look like this.

data test;
  length text $16;
  input a text $3-12;
cards;
1 Inc. F1b1
1 Inc  F2b1
2 Ltd. F4b2
2 Ltd  D5c1
;
run;

This time we want to extract only the rows that start with "Inc." and end with "b1". Notice that we want "Inc" followed by an actual period, so only the first row should be extracted. If we use the pattern from Example 1, we will extract both row 1 and row 2. This is because "." matches any character, not just a period; in row 2 it matches the blank after "Inc". In Perl regular expressions, a backslash escapes the period, so "\." matches a literal period. Here is the syntax.

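data test2;
  /* Compile the pattern once and keep its identifier across iterations */
  retain re;
  if _n_ = 1 then do;
    /* "\." is a literal period; ".+" is still the wildcard */
    re = prxparse('/Inc\..+b1/');
  end;
  set test;
  if prxmatch(re, text) then flag = 1;
  else flag = 0;
run;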
proc print data = test2;
run;
Obs    re      text       a    flag
 1      1    Inc. F1b1    1      1
 2      1    Inc F2b1     1      0
 3      1    Ltd. F4b2    2      0
 4      1    Ltd D5c1     2      0
proc print data = test;
where prxmatch('/Inc\..+b1/', text);
run;
Obs      text       a
 1     Inc. F1b1    1

proc means data = test;
  var a;
  where prxmatch('/Inc\..+b1/', text);
run;
                      Analysis Variable : a
N            Mean         Std Dev         Minimum         Maximum
-----------------------------------------------------------------
1       1.0000000               .       1.0000000       1.0000000
-----------------------------------------------------------------
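The sketch below shows the two patterns side by side. It re-creates the Example 2 data and flags each row with both patterns. It also tries a bracket expression, [.], which is another common way to match a literal period, because inside a character class the period loses its special meaning. The data set and variable names are hypothetical.

```sas
/* Re-create the Example 2 data (hypothetical data set name test_ex2) */
data test_ex2;
  length text $16;
  input a text $3-12;
cards;
1 Inc. F1b1
1 Inc  F2b1
2 Ltd. F4b2
2 Ltd  D5c1
;
run;

data compare;
  set test_ex2;
  wild    = (prxmatch('/Inc.+b1/', text) > 0);     /* "." = any character     */
  escaped = (prxmatch('/Inc\..+b1/', text) > 0);   /* "\." = a literal period  */
  bracket = (prxmatch('/Inc[.].+b1/', text) > 0);  /* "[.]" = a literal period */
run;

proc print data = compare;
run;
```

Only the first row gets escaped = 1 and bracket = 1. The variable wild is 1 for both of the first two rows, which matches the behavior described above.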

For more information, see the SAS documentation on Perl regular expressions.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.