SAS FAQ
How do I do simple random sampling with or without replacement using proc surveyselect?

Sometimes you may be analyzing a very large data file and want to work with just a simple random sample of it. Other times you may want to draw a simple random sample with replacement from a small data file. In either case, proc surveyselect in SAS offers a fairly straightforward way to do it. We will use the following data set for demonstration.

DATA hsb25;
  INPUT id gender $ race ses schtype $ prog
        read write math science socst;
DATALINES;
 147 f 1 3 pub 1 47  62  53  53  61
 108 m 1 2 pub 2 34  33  41  36  36
  18 m 3 2 pub 3 50  33  49  44  36
 153 m 1 2 pub 3 39  31  40  39  51
  50 m 2 2 pub 2 50  59  42  53  61
  51 f 2 1 pub 2 42  36  42  31  39
 102 m 1 1 pub 1 52  41  51  53  56
  57 f 1 2 pub 1 71  65  72  66  56
 160 f 1 2 pub 1 55  65  55  50  61
 136 m 1 2 pub 1 65  59  70  63  51
  88 f 1 1 pub 1 68  60  64  69  66
 177 m 1 2 pri 1 55  59  62  58  51
  95 m 1 1 pub 1 73  60  71  61  71
 144 m 1 1 pub 2 60  65  58  61  66
 139 f 1 2 pub 1 68  59  61  55  71
 135 f 1 3 pub 1 63  60  65  54  66
 191 f 1 1 pri 1 47  52  43  48  61
 171 m 1 2 pub 1 60  54  60  55  66
  22 m 3 2 pub 3 42  39  39  56  46
  47 f 2 3 pub 1 47  46  49  33  41
  56 m 1 2 pub 3 55  45  46  58  51
 128 m 1 1 pub 1 39  33  38  47  41
  36 f 2 3 pub 2 44  49  44  35  51
  53 m 2 2 pub 3 34  37  46  39  31
  26 f 4 1 pub 1 60  59  62  61  51
;
RUN;

Random sampling without replacement

In a simple random sample without replacement, each observation in the data set has an equal chance of being selected, and once an observation has been selected it cannot be chosen again. The following code draws a simple random sample of size 10 from the data set hsb25. The method option on the proc surveyselect statement sets the sampling method to SRS (simple random sampling). The rep (replicate) option specifies the number of simple random samples you want to create. The sampsize option, required here, specifies the size of each sample; because sampling is done without replacement, it cannot be larger than the number of observations in the original data set. The seed option sets the starting value of the random number generator, so the same sample can be reproduced later by rerunning the code with the same seed. The id statement lists the variables to be included in the output data set; here we use _all_ to keep every variable.

proc surveyselect data = hsb25 method = SRS rep = 1
  sampsize = 10 seed = 12345 out = hsbs1;  /* SRS = sampling without replacement */
  id _all_;  /* keep all variables in the sample */
run;
proc print data = hsbs1 noobs;
run;

 id  gender  race  ses  schtype  prog  read  write  math  science  socst
108    m       1    2     pub      2    34     33    41      36      36
153    m       1    2     pub      3    39     31    40      39      51
 51    f       2    1     pub      2    42     36    42      31      39
 95    m       1    1     pub      1    73     60    71      61      71
139    f       1    2     pub      1    68     59    61      55      71
135    f       1    3     pub      1    63     60    65      54      66
191    f       1    1     pri      1    47     52    43      48      61
 22    m       3    2     pub      3    42     39    39      56      46
 47    f       2    3     pub      1    47     46    49      33      41
 53    m       2    2     pub      3    34     37    46      39      31
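
To make the logic of method = SRS concrete, here is a minimal Python sketch. It is only an illustration, not how SAS implements the procedure, and it assumes that only the id values of hsb25 matter. The helper name srs_without_replacement is hypothetical. Python's random.sample draws without replacement, so no id can appear twice, and the result is put back into data-set order, as in the SAS output above. Because Python's random number generator is not the one SAS uses, seed 12345 will not reproduce the ids listed above; it does show that a fixed seed gives the same sample on every run.

```python
import random

# ids of the 25 students in hsb25, in data-set order
HSB25_IDS = [147, 108, 18, 153, 50, 51, 102, 57, 160, 136, 88, 177, 95,
             144, 139, 135, 191, 171, 22, 47, 56, 128, 36, 53, 26]


def srs_without_replacement(ids, n, seed):
    """Hypothetical helper: simple random sample of size n without replacement.

    Mirrors the idea of method = SRS: every unit has the same chance of
    selection and can be chosen at most once, so n may not exceed len(ids).
    """
    if n > len(ids):
        raise ValueError("sample size cannot exceed the number of observations")
    rng = random.Random(seed)  # private generator: same seed -> same sample
    sample = rng.sample(ids, n)  # n distinct units, no repeats
    # return the selected units in their original data-set order
    order = {unit: pos for pos, unit in enumerate(ids)}
    return sorted(sample, key=lambda unit: order[unit])


sample = srs_without_replacement(HSB25_IDS, 10, seed=12345)
print(sample)
```

Asking for all 25 observations simply returns the whole data set in its original order, while asking for 26 raises an error, which is the same restriction the sampsize option has under SRS.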

Random sampling with replacement

In a random sample with replacement, each observation in the data set has an equal chance of being selected on every draw, so the same observation can be selected more than once. The following code creates a random sample with replacement of size 10. Setting method = urs (unrestricted random sampling) allows the replacement. We include only the variables id, read, write, math, science and socst in the sample data set. In the output, each selected observation appears once, and the Number Hits column records how many times it was drawn: observations with id = 139 and id = 128 were each selected twice, so the 8 rows account for all 10 selections. To get a separate observation for each selection instead, add the outhits option to the proc surveyselect statement.

proc surveyselect data=hsb25 method = urs sampsize = 10
   rep=1 seed=12345 out=hsbs2;  /* urs = sampling with replacement */
   id id read write math science socst;  /* keep only these variables */
run;
proc print data = hsbs2 noobs;
run;

                                                                 Number
Replicate     id    read    write    math    science    socst     Hits

    1         57     71       65      72        66        56        1
    1        136     65       59      70        63        51        1
    1        177     55       59      62        58        51        1
    1        139     68       59      61        55        71        2
    1        191     47       52      43        48        61        1
    1         56     55       45      46        58        51        1
    1        128     39       33      38        47        41        2
    1         26     60       59      62        61        51        1
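
The same idea for method = urs can be sketched in Python, again as an illustration under the assumption that only the ids matter, not as SAS's own algorithm. random.choices makes independent draws with replacement, and collections.Counter tallies repeats the way the Number Hits column does. The helper names urs_with_hits and expand_hits are hypothetical; expand_hits repeats each id once per hit, which is the one-row-per-selection layout the outhits option produces. The ids drawn will not match the SAS output, since the random number generators differ.

```python
import random
from collections import Counter

# ids of the 25 students in hsb25, in data-set order
HSB25_IDS = [147, 108, 18, 153, 50, 51, 102, 57, 160, 136, 88, 177, 95,
             144, 139, 135, 191, 171, 22, 47, 56, 128, 36, 53, 26]


def urs_with_hits(ids, n, seed):
    """Hypothetical helper: draw n units with replacement and count hits.

    Each draw picks any unit with equal probability, so a unit can be drawn
    several times. Returns one (id, hits) pair per distinct unit, in
    data-set order, like the summarized output above.
    """
    rng = random.Random(seed)
    draws = rng.choices(ids, k=n)  # n independent draws, with replacement
    hits = Counter(draws)
    return [(unit, hits[unit]) for unit in ids if unit in hits]


def expand_hits(summary):
    """Hypothetical helper: repeat each unit once per hit (one row per selection)."""
    return [unit for unit, count in summary for _ in range(count)]


summary = urs_with_hits(HSB25_IDS, 10, seed=12345)
for unit, count in summary:
    print(unit, count)
```

Unlike sampling without replacement, n may exceed the number of observations here, because every unit goes back into the pool after each draw.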

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.