UCLA Academic Technology Services

SAS FAQ
How do I do simple random sampling with or without replacement using proc surveyselect?

Sometimes you may be analyzing a very large data file and want to work with just a simple random sample of the data file. Other times you may want to draw a simple random sample with replacement from a small data file. Either way, SAS proc surveyselect is one way to do it and it is fairly straightforward. Let's use the following data set for the purpose of demonstration.

DATA hsb25;
  INPUT id gender $ race ses schtype $ prog
        read write math science socst;
DATALINES;
 147 f 1 3 pub 1 47  62  53  53  61
 108 m 1 2 pub 2 34  33  41  36  36
  18 m 3 2 pub 3 50  33  49  44  36
 153 m 1 2 pub 3 39  31  40  39  51
  50 m 2 2 pub 2 50  59  42  53  61
  51 f 2 1 pub 2 42  36  42  31  39
 102 m 1 1 pub 1 52  41  51  53  56
  57 f 1 2 pub 1 71  65  72  66  56
 160 f 1 2 pub 1 55  65  55  50  61
 136 m 1 2 pub 1 65  59  70  63  51
  88 f 1 1 pub 1 68  60  64  69  66
 177 m 1 2 pri 1 55  59  62  58  51
  95 m 1 1 pub 1 73  60  71  61  71
 144 m 1 1 pub 2 60  65  58  61  66
 139 f 1 2 pub 1 68  59  61  55  71
 135 f 1 3 pub 1 63  60  65  54  66
 191 f 1 1 pri 1 47  52  43  48  61
 171 m 1 2 pub 1 60  54  60  55  66
  22 m 3 2 pub 3 42  39  39  56  46
  47 f 2 3 pub 1 47  46  49  33  41
  56 m 1 2 pub 3 55  45  46  58  51
 128 m 1 1 pub 1 39  33  38  47  41
  36 f 2 3 pub 2 44  49  44  35  51
  53 m 2 2 pub 3 34  37  46  39  31
  26 f 4 1 pub 1 60  59  62  61  51
;
RUN;

Random sampling without replacement

In a simple random sample without replacement, each observation in the data set has an equal chance of being selected, and once selected it cannot be chosen again. The following code creates a simple random sample of size 10 from data set hsb25. The method option in the proc surveyselect statement sets the sampling method to SRS (simple random sampling). The rep (= replicate) option specifies the number of simple random samples you want to create. The sampsize option, which is required here, specifies the size of the random sample. This number cannot exceed the number of observations in the original data set, since the sampling is done without replacement. You can also specify a seed so that exactly the same sample can be reproduced later by using the same seed. The id statement specifies the variables to be included in the sample. Here we use _all_ to include all the variables in the sample.

proc surveyselect data = hsb25 method = SRS rep = 1 
                         sampsize = 10 seed = 12345 out = hsbs1;
  id _all_;
run;
proc print data = hsbs1 noobs;
run;
 id  gender  race  ses  schtype  prog  read  write  math  science  socst
108    m       1    2     pub      2    34     33    41      36      36
153    m       1    2     pub      3    39     31    40      39      51
 51    f       2    1     pub      2    42     36    42      31      39
 95    m       1    1     pub      1    73     60    71      61      71
139    f       1    2     pub      1    68     59    61      55      71
135    f       1    3     pub      1    63     60    65      54      66
191    f       1    1     pri      1    47     52    43      48      61
 22    m       3    2     pub      3    42     39    39      56      46
 47    f       2    3     pub      1    47     46    49      33      41
 53    m       2    2     pub      3    34     37    46      39      31
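
The rep option described above can also request several independent samples in a single call. The following is a minimal sketch, assuming the hsb25 data set created earlier; the output data set name hsbs3, the sample size of 5 and the seed are arbitrary choices made for illustration. When more than one replicate is requested, proc surveyselect adds a variable named Replicate to the output data set to identify the sample each observation belongs to.

```sas
/* Draw three independent simple random samples of size 5 each.
   hsbs3 is an illustrative name, not part of the original example. */
proc surveyselect data = hsb25 method = SRS rep = 3
                         sampsize = 5 seed = 12345 out = hsbs3;
  id _all_;
run;

/* Tabulate the Replicate variable to count the observations per sample */
proc freq data = hsbs3;
  tables Replicate;
run;
```

Each replicate should contain 5 observations, and within a replicate no id should appear more than once, because each sample is drawn without replacement.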

Random sampling with replacement

In a random sample with replacement, each observation in the data set has an equal chance of being selected and can be selected more than once. The following code creates a random sample of size 10 with replacement. The option method = urs (unrestricted random sampling) is used here to allow replacement. We include only the variables id, read, write, math, science and socst in the sample data set. The output shows that the observation with id = 144 has been selected three times, which is possible because the sampling now allows replacement.
proc surveyselect data=hsb25  method = urs sampsize = 10
   rep=1 seed=12345 out=hsbs2;
   id id read write math science socst;
run;
proc print data = hsbs2 noobs;
run;
                                                    Number
 id    read    write    math    science    socst     Hits
 22     42       39      39        56        46        1
 47     47       46      49        33        41        1
 51     42       36      42        31        39        1
 57     71       65      72        66        56        1
139     68       59      61        55        71        1
144     60       65      58        61        66        3
147     47       62      53        53        61        1
153     39       31      40        39        51        1
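
The output above lists fewer than 10 rows because the Number Hits column records how many times each observation was drawn. As a quick sanity check, the hit counts can be summed. This is only a sketch, and it assumes the count variable is stored under the name NumberHits, the name used in the data step on this page.

```sas
/* Sum the hit counts in hsbs2; the total should equal the
   requested sampsize of 10 */
proc means data = hsbs2 sum;
  var NumberHits;
run;
```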
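The data set hsbs2 has only 8 observations because the observation with id = 144, although selected three times, is stored only once, with a Number Hits value of 3. Here is sample code that creates a data set with 10 observations from hsbs2 by writing out each observation as many times as it was selected.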
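data hsbs2f;
  set hsbs2;
  /* write each observation once per hit recorded in numberhits */
  do i = 1 to numberhits;
    output;
  end;
  drop i;  /* the loop counter is not needed in the output */
run;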
proc print data = hsbs2f noobs;
run;
                                                    Number
 id    read    write    math    science    socst     Hits
 22     42       39      39        56        46        1
 47     47       46      49        33        41        1
 51     42       36      42        31        39        1
 57     71       65      72        66        56        1
139     68       59      61        55        71        1
144     60       65      58        61        66        3
144     60       65      58        61        66        3
144     60       65      58        61        66        3
147     47       62      53        53        61        1
153     39       31      40        39        51        1
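
As an alternative to expanding the sample in a data step, proc surveyselect has an outhits option that writes a separate observation for each selection. The following is a sketch under the same assumptions as the example above (data set hsb25, the same seed and variables). The output data set name hsbs2h is made up for illustration, and no output is shown because it has not been run for this page.

```sas
/* Same with-replacement sample as before, but outhits asks for
   one output observation per selection rather than one per unit.
   hsbs2h is an illustrative name. */
proc surveyselect data = hsb25 method = urs sampsize = 10
   rep = 1 seed = 12345 outhits out = hsbs2h;
   id id read write math science socst;
run;

proc print data = hsbs2h noobs;
run;
```

With outhits the data step loop over numberhits should not be needed, since the repeated selections are already separate observations.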

These pages are Copyrighted (c) by UCLA Academic Technology Services


The %srs macro was written by Jonah Schlackman.