UCLA Academic Technology Services

SAS FAQ 
How can I take a stratified random sample of my data?

Sometimes you may want to take a random sample of your data but respect the stratification that was used when the data set was created.  Other times you want to maintain certain proportions in the sampled data set; for example, you might draw a sample in which the proportions of males and females match current census figures.  To draw these types of samples from your data set, you can use proc surveyselect.  We will use the hsb2 data set for our examples. Note that the code on this page works with SAS 8.x. For an updated version using SAS 9, please visit our updated page.

Example 1:  Taking a 50% sample from each stratum using simple random sampling (srs)

Before we take our sample, let's look at the data set using proc means.  Because we will use a by statement, we need to sort the data first.  We will use the variable female as our stratification variable.  We will also use an options statement to suppress variable labels in the output.

proc sort data = "D:\hsb2";
by female;
run;

options nolabel;
proc means data = "D:\hsb2";
by female;
run;

female=0

The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           91     106.0109890      60.3122421       3.0000000     200.0000000
race         91       3.4285714       1.0867163       1.0000000       4.0000000
ses          91       2.1538462       0.6818777       1.0000000       3.0000000
schtyp       91       1.1538462       0.3628001       1.0000000       2.0000000
prog         91       2.0219780       0.6988566       1.0000000       3.0000000
read         91      52.8241758      10.5067105      31.0000000      76.0000000
write        91      50.1208791      10.3051607      31.0000000      67.0000000
math         91      52.9450549       9.6647845      35.0000000      75.0000000
science      91      53.2307692      10.7321707      26.0000000      74.0000000
socst        91      51.7912088      11.3338397      26.0000000      71.0000000
-------------------------------------------------------------------------------


female=1

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id          109      95.8990826      55.6275553       1.0000000     198.0000000
race        109       3.4311927       1.0033921       1.0000000       4.0000000
ses         109       1.9724771       0.7510328       1.0000000       3.0000000
schtyp      109       1.1651376       0.3730197       1.0000000       2.0000000
prog        109       2.0275229       0.6866278       1.0000000       3.0000000
read        109      51.7339450      10.0578348      28.0000000      76.0000000
write       109      54.9908257       8.1337152      35.0000000      67.0000000
math        109      52.3944954       9.1510153      33.0000000      72.0000000
science     109      50.6972477       9.0385026      29.0000000      69.0000000
socst       109      52.9174312      10.2344086      26.0000000      71.0000000
-------------------------------------------------------------------------------
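Why does the by statement need sorted data? A rough Python analogy may help: itertools.groupby, like BY-group processing, only groups consecutive rows that share a key. The records and the helper means_by below are hypothetical and only sketch the idea; they are not how proc means works internally.

```python
from itertools import groupby
from statistics import mean

# Hypothetical (female, read) pairs, deliberately NOT sorted by female.
records = [(1, 34), (0, 63), (1, 39), (0, 47), (1, 44)]

def means_by(rows):
    """Mean of the second field for each run of rows sharing the same first field.

    groupby only groups *consecutive* rows with equal keys. SAS stops with an
    error when the data are not sorted by the BY variable; groupby instead
    silently splits the groups.
    """
    return [(key, mean(value for _, value in group))
            for key, group in groupby(rows, key=lambda row: row[0])]

unsorted_result = means_by(records)        # five one-row "groups"
sorted_result = means_by(sorted(records))  # one group per value of female
print(unsorted_result)
print(sorted_result)  # [(0, 55), (1, 39)]
```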

In the command below we use several options.  The data = option specifies the data set from which we wish to draw the sample, and the out = option names the data set that will hold the sample.  The method = option indicates how the sample is drawn; srs requests simple random sampling without replacement.  SAS offers a wide range of other methods, including probability-proportional-to-size and systematic sampling.  The samprate = option specifies the sampling rate; here we have used .5, which means 50%.  The seed = option sets the random number seed so that our results can be replicated.  On the strata statement we specify the variable (or variables) that define the strata.  The input data set must be sorted by the strata variables, which is why we sorted on female above.

proc surveyselect data = "D:\hsb2" out = samp1 method = srs samprate = .5 seed = 9876;
strata female;
run;

proc sort data = samp1;
by female;
run;

proc means data = samp1;
by female;
run;
female=0

The MEANS Procedure

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 46      94.5869565      60.1141788       5.0000000     197.0000000
race               46       3.1956522       1.2582098       1.0000000       4.0000000
ses                46       2.1521739       0.6981688       1.0000000       3.0000000
schtyp             46       1.0652174       0.2496374       1.0000000       2.0000000
prog               46       2.2173913       0.7276459       1.0000000       3.0000000
read               46      50.6956522      10.5848310      31.0000000      73.0000000
write              46      47.4565217      10.2473986      31.0000000      65.0000000
math               46      53.0869565       9.5657400      38.0000000      75.0000000
science            46      51.3043478      11.7735477      26.0000000      74.0000000
socst              46      48.9565217      12.4185462      26.0000000      71.0000000
SelectionProb      46       0.5054945               0       0.5054945       0.5054945
SamplingWeight     46       1.9782609               0       1.9782609       1.9782609
-------------------------------------------------------------------------------------


female=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 55      82.2727273      54.4056964       1.0000000     194.0000000
race               55       3.2545455       1.1420933       1.0000000       4.0000000
ses                55       1.9636364       0.7444520       1.0000000       3.0000000
schtyp             55       1.1090909       0.3146266       1.0000000       2.0000000
prog               55       2.1636364       0.7139778       1.0000000       3.0000000
read               55      50.4545455      10.3705748      28.0000000      76.0000000
write              55      54.3636364       8.5729195      35.0000000      67.0000000
math               55      51.9818182       9.9712381      33.0000000      72.0000000
science            55      50.4727273      10.2791673      31.0000000      69.0000000
socst              55      52.3272727      10.2885311      31.0000000      71.0000000
SelectionProb      55       0.5045872               0       0.5045872       0.5045872
SamplingWeight     55       1.9818182               0       1.9818182       1.9818182
-------------------------------------------------------------------------------------
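The SelectionProb and SamplingWeight values above follow from the stratum sizes: 46 of 91 males gives 46/91 = 0.5054945, and the weight is its reciprocal, 91/46 = 1.9782609. The Python sketch below mimics stratified simple random sampling under one stated assumption: each stratum's sample size is the rate times the stratum size, rounded up, which agrees with this output. The function stratified_srs and the generated ids are hypothetical; Python's random number generator is not SAS's, so the cases it draws will not match samp1.

```python
import random
from fractions import Fraction
from math import ceil

def stratified_srs(units, stratum_of, rate, seed):
    """Simple random sampling without replacement within each stratum.

    Assumption: the stratum sample size is rate * N rounded up, which agrees
    with the SAS output (.5 * 91 = 45.5 -> 46, .5 * 109 = 54.5 -> 55).
    """
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        big_n = len(members)
        n = ceil(Fraction(str(rate)) * big_n)  # exact arithmetic: no float round-off in the ceiling
        prob = Fraction(n, big_n)
        for unit in rng.sample(members, n):
            sample.append({"unit": unit, "stratum": key,
                           "SelectionProb": float(prob),        # n / N
                           "SamplingWeight": float(1 / prob)})  # N / n
    return sample

# Usage: 91 hypothetical males (female = 0) and 109 hypothetical females (female = 1).
units = [(i, 0) for i in range(91)] + [(91 + i, 1) for i in range(109)]
samp = stratified_srs(units, stratum_of=lambda u: u[1], rate=.5, seed=9876)
counts = {k: sum(1 for s in samp if s["stratum"] == k) for k in (0, 1)}
print(counts)  # {0: 46, 1: 55}
```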

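If you want to know which cases were not selected, or if you want to use the selected and unselected cases as two samples (for example, for validation), you need to combine the sampled data set with the original data set.  Both data sets must first be sorted by the same variable, and that variable must uniquely identify each case in the data set.  The data step below uses a set statement with a by statement, which interleaves the two data sets: every case from the original file appears once with missing values for SelectionProb and SamplingWeight, and every selected case appears a second time with those values filled in.  These two variables were created by proc surveyselect, so they are not in the original data file.  In the listing below, cases that appear only once (for example, ids 3, 8 and 9) were not selected.  If you would rather have one row per case, use a merge statement instead of the set statement; unselected cases then have missing values for SelectionProb and SamplingWeight.  If you want to create three or more data sets from your original data set, you can use Enterprise Miner.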

proc sort data = "D:\hsb2";
by id;
run;

proc sort data = samp1;
by id;
run;

data merge1;
set "D:\hsb2" samp1;
by id;
run;

proc print data = merge1 (obs = 25);
run;
                                                                             Selection  Sampling
Obs  id  female  race  ses  schtyp  prog  read  write  math  science  socst     Prob     Weight

  1   1     1      1    1      1      3    34     44    40      39      41     .          .
  2   1     1      1    1      1      3    34     44    40      39      41    0.50459    1.98182
  3   2     1      1    2      1      3    39     41    33      42      41     .          .
  4   2     1      1    2      1      3    39     41    33      42      41    0.50459    1.98182
  5   3     0      1    1      1      2    63     65    48      63      56     .          .
  6   4     1      1    1      1      2    44     50    41      39      51     .          .
  7   4     1      1    1      1      2    44     50    41      39      51    0.50459    1.98182
  8   5     0      1    1      1      2    47     40    43      45      31     .          .
  9   5     0      1    1      1      2    47     40    43      45      31    0.50549    1.97826
 10   6     1      1    1      1      2    47     41    46      40      41     .          .
 11   6     1      1    1      1      2    47     41    46      40      41    0.50459    1.98182
 12   7     0      1    2      1      2    57     54    59      47      51     .          .
 13   7     0      1    2      1      2    57     54    59      47      51    0.50549    1.97826
 14   8     1      1    1      1      2    39     44    52      44      48     .          .
 15   9     0      1    2      1      3    48     49    52      44      51     .          .
 16  10     1      1    2      1      1    47     54    49      53      61     .          .
 17  10     1      1    2      1      1    47     54    49      53      61    0.50459    1.98182
 18  11     0      1    2      1      2    34     46    45      39      36     .          .
 19  11     0      1    2      1      2    34     46    45      39      36    0.50549    1.97826
 20  12     0      1    2      1      3    37     44    45      39      46     .          .
 21  13     1      1    2      1      3    47     46    39      47      61     .          .
 22  13     1      1    2      1      3    47     46    39      47      61    0.50459    1.98182
 23  14     0      1    3      1      2    47     41    54      42      56     .          .
 24  14     0      1    3      1      2    47     41    54      42      56    0.50549    1.97826
 25  15     0      1    3      1      3    39     39    44      26      42     .          .
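The difference between interleaving (set with by) and merging (merge with by) can be sketched in a few lines of Python. The tiny data sets and the helpers interleave_by_id and merge_by_id are hypothetical stand-ins for hsb2 and samp1, and the merge mimic only covers the one-to-one case that arises here, where every sampled id also occurs once in the original file.

```python
# Hypothetical miniature versions of the original file and samp1:
# ids 1-6, of which ids 1, 2, 4 and 6 were sampled.
original = [{"id": i} for i in range(1, 7)]
sampled = [{"id": i, "SelectionProb": 0.5, "SamplingWeight": 2.0} for i in (1, 2, 4, 6)]

def interleave_by_id(a, b):
    """Mimics SET a b; BY id;: every row of both inputs, ordered by id.
    sorted() is stable, so within an id the row from a comes first."""
    return sorted(a + b, key=lambda row: row["id"])

def merge_by_id(a, b):
    """Mimics MERGE a b; BY id; when every id in b also occurs once in a:
    one row per id, with SelectionProb and SamplingWeight None (missing)
    where b has no matching row."""
    b_by_id = {row["id"]: row for row in b}
    merged = []
    for row in a:
        out = {"SelectionProb": None, "SamplingWeight": None, **row}
        out.update(b_by_id.get(row["id"], {}))
        merged.append(out)
    return merged

interleaved = interleave_by_id(original, sampled)
merged = merge_by_id(original, sampled)
not_selected = [row["id"] for row in merged if row["SelectionProb"] is None]
print(len(interleaved), len(merged), not_selected)  # 10 6 [3, 5]
```

With the interleaved layout you find the unselected cases by looking for ids that occur only once; with the merged layout a single missing-value check does it.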

Example 2:  Using more than one strata variable

In this example, we will use two strata variables.  The variable female has two values and the variable ses has three levels, so together they define six strata.  As before, we will sort the original data set on the strata variables (the sort below also orders the cases by prog within each stratum), and then we will use proc means to see what the variables look like before we draw the sample.

proc sort data = "D:\hsb2";
by female ses prog;
run;

proc means data = "D:\hsb2";
by female ses;
run;
female=0 ses=1

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           15      79.2000000      57.7262010       3.0000000     169.0000000
race         15       3.1333333       1.2459458       1.0000000       4.0000000
schtyp       15       1.0000000               0       1.0000000       1.0000000
prog         15       1.8000000       0.8618916       1.0000000       3.0000000
read         15      49.3333333       9.0999738      36.0000000      63.0000000
write        15      46.6000000       9.0301084      31.0000000      65.0000000
math         15      47.6000000       6.7802233      39.0000000      63.0000000
science      15      49.8000000      12.9735996      31.0000000      69.0000000
socst        15      43.3333333       9.9618319      26.0000000      57.0000000
-------------------------------------------------------------------------------

female=0 ses=2

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           47     109.6808511      64.3256343       7.0000000     200.0000000
race         47       3.3829787       1.1142007       1.0000000       4.0000000
schtyp       47       1.2127660       0.4136881       1.0000000       2.0000000
prog         47       2.1063830       0.7293250       1.0000000       3.0000000
read         47      52.1702128      10.6185219      31.0000000      73.0000000
write        47      49.5531915      10.1570462      31.0000000      67.0000000
math         47      53.4680851      10.5662528      35.0000000      75.0000000
science      47      53.4042553      10.2780043      34.0000000      74.0000000
socst        47      50.7872340      10.8826471      26.0000000      71.0000000
-------------------------------------------------------------------------------

female=0 ses=3

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           29     113.9310345      52.4934901      14.0000000     199.0000000
race         29       3.6551724       0.9364012       1.0000000       4.0000000
schtyp       29       1.1379310       0.3509312       1.0000000       2.0000000
prog         29       2.0000000       0.5345225       1.0000000       3.0000000
read         29      55.6896552      10.6035824      34.0000000      76.0000000
write        29      52.8620690      10.7760453      33.0000000      67.0000000
math         29      54.8620690       8.6177729      38.0000000      71.0000000
science      29      54.7241379      10.1906699      26.0000000      69.0000000
socst        29      57.7931034       9.5595103      31.0000000      71.0000000
-------------------------------------------------------------------------------

female=1 ses=1

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           32      72.3750000      51.6444107       1.0000000     173.0000000
race         32       3.0312500       1.1495967       1.0000000       4.0000000
schtyp       32       1.0625000       0.2459347       1.0000000       2.0000000
prog         32       1.9687500       0.7398507       1.0000000       3.0000000
read         32      47.7812500       9.5570760      28.0000000      68.0000000
write        32      52.5000000       9.2387682      35.0000000      65.0000000
math         32      49.9062500       9.7164954      39.0000000      72.0000000
science      32      46.7187500       9.0419609      29.0000000      63.0000000
socst        32      49.1875000      10.8700165      26.0000000      71.0000000
-------------------------------------------------------------------------------

female=1 ses=2

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           48     105.6666667      55.6372496       2.0000000     193.0000000
race         48       3.5833333       0.9415545       1.0000000       4.0000000
schtyp       48       1.1875000       0.3944428       1.0000000       2.0000000
prog         48       2.1250000       0.7329625       1.0000000       3.0000000
read         48      51.0000000       8.1632284      36.0000000      71.0000000
write        48      54.2500000       7.3296251      39.0000000      67.0000000
math         48      50.9791667       7.9157521      33.0000000      72.0000000
science      48      50.0416667       7.0889736      36.0000000      66.0000000
socst        48      53.2500000       8.9407030      31.0000000      71.0000000
-------------------------------------------------------------------------------

female=1 ses=3

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           29     105.6896552      53.7720742      26.0000000     198.0000000
race         29       3.6206897       0.8200084       1.0000000       4.0000000
schtyp       29       1.2413793       0.4354942       1.0000000       2.0000000
prog         29       1.9310345       0.5298945       1.0000000       3.0000000
read         29      57.3103448      11.2348420      36.0000000      76.0000000
write        29      58.9655172       6.7901334      36.0000000      67.0000000
math         29      57.4827586       8.7162438      42.0000000      71.0000000
science      29      56.1724138       9.5058965      31.0000000      69.0000000
socst        29      56.4827586      10.4765749      31.0000000      71.0000000
-------------------------------------------------------------------------------

We use the same options as in Example 1; apart from the name of the output data set, only the strata statement changes.

proc surveyselect data = "D:\hsb2" out = samp2 method = srs samprate = .5 seed = 9876;
strata female ses;
run;

proc sort data = samp2;
by female ses prog;
run;

proc means data = samp2;
by female ses;
run;
female=0 ses=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                  8      74.0000000      43.0614179      16.0000000     134.0000000
race                8       3.2500000       1.1649647       1.0000000       4.0000000
schtyp              8       1.0000000               0       1.0000000       1.0000000
prog                8       1.6250000       0.9161254       1.0000000       3.0000000
read                8      49.2500000       7.5545634      42.0000000      63.0000000
write               8      42.8750000       6.3569422      31.0000000      52.0000000
math                8      45.6250000       6.2549980      39.0000000      59.0000000
science             8      47.3750000      10.1409706      34.0000000      65.0000000
socst               8      43.3750000      10.1409706      26.0000000      57.0000000
SelectionProb       8       0.5333333               0       0.5333333       0.5333333
SamplingWeight      8       1.8750000               0       1.8750000       1.8750000
-------------------------------------------------------------------------------------

female=0 ses=2

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 24     120.0000000      69.1626941       9.0000000     200.0000000
race               24       3.3750000       1.0959411       1.0000000       4.0000000
schtyp             24       1.2916667       0.4643056       1.0000000       2.0000000
prog               24       1.9166667       0.7172815       1.0000000       3.0000000
read               24      52.3750000       9.3799625      34.0000000      68.0000000
write              24      49.5833333       9.5549517      31.0000000      62.0000000
math               24      52.1666667      10.6185509      35.0000000      75.0000000
science            24      52.3333333       9.5492621      36.0000000      74.0000000
socst              24      51.4166667      11.2207752      26.0000000      71.0000000
SelectionProb      24       0.5106383               0       0.5106383       0.5106383
SamplingWeight     24       1.9583333               0       1.9583333       1.9583333
-------------------------------------------------------------------------------------

female=0 ses=3

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 15     112.4666667      58.7632862      15.0000000     199.0000000
race               15       3.5333333       1.0600988       1.0000000       4.0000000
schtyp             15       1.2000000       0.4140393       1.0000000       2.0000000
prog               15       2.0666667       0.4577377       1.0000000       3.0000000
read               15      56.5333333       9.8913141      39.0000000      76.0000000
write              15      54.1333333       9.4405407      38.0000000      67.0000000
math               15      55.2666667       7.4782224      39.0000000      64.0000000
science            15      54.6666667      10.8210553      26.0000000      66.0000000
socst              15      57.4666667       8.5345237      42.0000000      71.0000000
SelectionProb      15       0.5172414               0       0.5172414       0.5172414
SamplingWeight     15       1.9333333               0       1.9333333       1.9333333
-------------------------------------------------------------------------------------

female=1 ses=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 16      75.5000000      48.5166638       1.0000000     161.0000000
race               16       3.1250000       1.1474610       1.0000000       4.0000000
schtyp             16       1.0625000       0.2500000       1.0000000       2.0000000
prog               16       2.0625000       0.8539126       1.0000000       3.0000000
read               16      45.0000000       9.9866578      28.0000000      61.0000000
write              16      49.8125000       8.5496101      35.0000000      62.0000000
math               16      46.9375000       8.3224095      39.0000000      72.0000000
science            16      45.0000000       8.5634884      29.0000000      61.0000000
socst              16      47.7500000      11.9749739      26.0000000      66.0000000
SelectionProb      16       0.5000000               0       0.5000000       0.5000000
SamplingWeight     16       2.0000000               0       2.0000000       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=2

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 24     121.9583333      57.7404079      13.0000000     193.0000000
race               24       3.7083333       0.7506036       1.0000000       4.0000000
schtyp             24       1.2916667       0.4643056       1.0000000       2.0000000
prog               24       2.2916667       0.7506036       1.0000000       3.0000000
read               24      49.7500000       6.6217691      36.0000000      65.0000000
write              24      53.6666667       7.2090925      41.0000000      67.0000000
math               24      49.2500000       7.2306714      37.0000000      63.0000000
science            24      49.8750000       6.4626855      39.0000000      61.0000000
socst              24      52.0833333       8.9632907      31.0000000      71.0000000
SelectionProb      24       0.5000000               0       0.5000000       0.5000000
SamplingWeight     24       2.0000000               0       2.0000000       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=3

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 15     116.0000000      55.5504918      26.0000000     194.0000000
race               15       3.7333333       0.5936168       2.0000000       4.0000000
schtyp             15       1.2666667       0.4577377       1.0000000       2.0000000
prog               15       1.8666667       0.3518658       1.0000000       2.0000000
read               15      58.8000000       9.6599320      36.0000000      68.0000000
write              15      58.2000000       7.7015768      36.0000000      67.0000000
math               15      58.9333333       9.8522417      42.0000000      71.0000000
science            15      56.2000000       9.9871346      31.0000000      69.0000000
socst              15      57.5333333       8.8790819      39.0000000      71.0000000
SelectionProb      15       0.5172414               0       0.5172414       0.5172414
SamplingWeight     15       1.9333333               0       1.9333333       1.9333333
-------------------------------------------------------------------------------------
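As a check on the output above, the sample size in each of the six strata can be computed from the stratum sizes reported by proc means in this example. This is a Python sketch, assuming the sample size is .5 times the stratum size rounded up; the function allocate is hypothetical. Under that assumption it reproduces the sample sizes (8, 24, 15, 16, 24, 15) and the SelectionProb and SamplingWeight values shown above.

```python
from fractions import Fraction
from math import ceil

# Stratum sizes N for each (female, ses) combination, from the proc means output above.
stratum_sizes = {(0, 1): 15, (0, 2): 47, (0, 3): 29,
                 (1, 1): 32, (1, 2): 48, (1, 3): 29}

def allocate(sizes, rate):
    """Return (n, SelectionProb, SamplingWeight) for each stratum at one common rate.

    Assumption: n is rate * N rounded up.
    """
    r = Fraction(str(rate))
    design = {}
    for key in sorted(sizes):  # (female, ses) pairs in sorted order
        big_n = sizes[key]
        n = ceil(r * big_n)
        design[key] = (n, round(n / big_n, 7), round(big_n / n, 7))
    return design

for key, (n, prob, weight) in allocate(stratum_sizes, .5).items():
    print(f"female={key[0]} ses={key[1]}  n={n}  SelectionProb={prob}  SamplingWeight={weight}")
```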

Example 3:  Using different sampling rates for each stratum

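In the examples above, we sampled from each stratum at the same rate.  However, sometimes you want to sample more from one stratum than from another.  You can specify a different sampling rate for each stratum by listing the rates in parentheses on the samprate option.  The rates are matched to the strata one for one, in the order in which the strata appear in the sorted data set, so with strata female ses you need six rates: for female=0 ses=1, female=0 ses=2, female=0 ses=3, female=1 ses=1, female=1 ses=2 and female=1 ses=3.  You cannot give a separate rate for each level of each variable (for example, 20% of the males and, separately, 40% of the cases with ses=1); each rate applies to one combination of female and ses.  Below we sample 20% of the cases in each of the three male strata (female=0) and 80% of the cases in each of the three female strata (female=1).

proc surveyselect data = "D:\hsb2" out = samp3 method = srs samprate = (.2 .2 .2 .8 .8 .8) seed = 9876;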
strata female ses;
run;

proc sort data = samp3;
by female ses;
run;

proc means data = samp3;
by female ses;
run;
female=0 ses=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
prog                8       1.7500000       0.8864053       1.0000000       3.0000000
id                  8      82.3750000      55.2085332      29.0000000     169.0000000
race                8       3.5000000       0.7559289       2.0000000       4.0000000
schtyp              8       1.0000000               0       1.0000000       1.0000000
read                8      48.3750000       9.8552017      36.0000000      63.0000000
write               8      47.7500000       7.8330800      37.0000000      59.0000000
math                8      50.1250000       8.1141411      41.0000000      63.0000000
science             8      51.3750000      14.0197565      31.0000000      69.0000000
socst               8      44.1250000       8.8064506      32.0000000      57.0000000
SelectionProb       8       0.5357143       0.0381802       0.5000000       0.5714286
SamplingWeight      8       1.8750000       0.1336306       1.7500000       2.0000000
-------------------------------------------------------------------------------------

female=0 ses=2

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
prog               24       2.1250000       0.7408867       1.0000000       3.0000000
id                 24     105.9583333      65.5853106       7.0000000     200.0000000
race               24       3.2916667       1.2328534       1.0000000       4.0000000
schtyp             24       1.1666667       0.3806935       1.0000000       2.0000000
read               24      54.6666667      11.7498073      34.0000000      73.0000000
write              24      50.2083333      10.4589432      31.0000000      67.0000000
math               24      55.4583333      11.4625256      35.0000000      75.0000000
science            24      55.6250000      11.0682173      39.0000000      74.0000000
socst              24      51.6250000      13.2133414      26.0000000      71.0000000
SelectionProb      24       0.5111111       0.0160514       0.5000000       0.5333333
SamplingWeight     24       1.9583333       0.0601929       1.8750000       2.0000000
-------------------------------------------------------------------------------------

female=0 ses=3

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
prog               15       2.0000000       0.5345225       1.0000000       3.0000000
id                 15     112.2000000      43.3362270      20.0000000     196.0000000
race               15       3.8000000       0.7745967       1.0000000       4.0000000
schtyp             15       1.0666667       0.2581989       1.0000000       2.0000000
read               15      57.6000000      10.3978020      34.0000000      76.0000000
write              15      55.8666667       8.5009803      38.0000000      65.0000000
math               15      55.7333333       7.7962140      39.0000000      68.0000000
science            15      55.8000000       8.0993827      39.0000000      66.0000000
socst              15      58.6666667       6.5100655      46.0000000      66.0000000
SelectionProb      15       0.5174603       0.0108985       0.5000000       0.5238095
SamplingWeight     15       1.9333333       0.0416125       1.9090909       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
prog               17       1.9411765       0.7475450       1.0000000       3.0000000
id                 17      69.0588235      50.1715440       1.0000000     163.0000000
race               17       3.0000000       1.1180340       1.0000000       4.0000000
schtyp             17       1.1176471       0.3321056       1.0000000       2.0000000
read               17      47.8235294       8.4575062      34.0000000      63.0000000
write              17      51.0588235       9.3839663      35.0000000      65.0000000
math               17      50.8823529      11.4394184      39.0000000      72.0000000
science            17      46.7647059       9.1140648      29.0000000      61.0000000
socst              17      48.1764706      11.5338377      26.0000000      66.0000000
SelectionProb      17       0.5320261       0.0207433       0.5000000       0.5555556
SamplingWeight     17       1.8823529       0.0748774       1.8000000       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=2

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
prog               24       2.1250000       0.7408867       1.0000000       3.0000000
id                 24     105.3750000      50.7644282      10.0000000     182.0000000
race               24       3.5833333       0.9743076       1.0000000       4.0000000
schtyp             24       1.0833333       0.2823299       1.0000000       2.0000000
read               24      53.0000000       7.6214799      39.0000000      68.0000000
write              24      55.2500000       6.5756468      41.0000000      65.0000000
math               24      50.7916667       5.8827616      39.0000000      61.0000000
science            24      51.5000000       6.8588248      39.0000000      63.0000000
socst              24      56.9166667       6.8646332      41.0000000      71.0000000
SelectionProb      24       0.5000000               0       0.5000000       0.5000000
SamplingWeight     24       2.0000000               0       2.0000000       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=3

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
prog               16       1.9375000       0.5737305       1.0000000       3.0000000
id                 16      98.2500000      48.5516907      26.0000000     198.0000000
race               16       3.7500000       0.5773503       2.0000000       4.0000000
schtyp             16       1.1250000       0.3415650       1.0000000       2.0000000
read               16      56.3750000      11.1706461      42.0000000      76.0000000
write              16      57.5000000       8.3586283      36.0000000      67.0000000
math               16      56.2500000       8.2744587      42.0000000      67.0000000
science            16      55.2500000       9.3985815      31.0000000      69.0000000
socst              16      53.4375000      11.4015715      31.0000000      66.0000000
SelectionProb      16       0.5559524       0.0527261       0.5238095       0.6666667
SamplingWeight     16       1.8125000       0.1552938       1.5000000       1.9090909
-------------------------------------------------------------------------------------

Example 4:  Specifying the number of observations to be sampled

You can also specify the number of observations to be sampled from each stratum. Instead of the samprate = option, use the n = option and list the sample sizes in parentheses, in the order in which the strata appear in the sorted data set.
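Before choosing the values for n =, it can help to list every female-by-ses stratum with its population count. The frequency table below is a sketch that is not part of the original example. It assumes the data are sorted by female and then ses. PROC FREQ orders rows by the unformatted values by default, so the rows appear in the same order as the strata in the sorted data.

```sas
/* List each female-by-ses stratum and its population size (N_h).  */
/* Rows appear as female=0 ses=1, female=0 ses=2, ..., female=1    */
/* ses=3, which is the order in which the n = ( ) values are used.  */
proc freq data = "D:\hsb2";
  tables female*ses / list nopercent nocum;
run;
```

Because method = srs samples without replacement, each value in the n = list should not exceed the count shown for the corresponding stratum.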

proc surveyselect data = "D:\hsb2" out = samp4 method = srs n = (15 20 10 30 25) seed = 9876;
strata female ses;
run;

proc sort data = samp4;
by female ses;
run;

proc means data = samp4;
by female ses;
run;

female=0 ses=1

Variable           N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------------
id                 3      50.0000000      52.0480547       5.0000000     107.0000000
race               3       2.6666667       1.5275252       1.0000000       4.0000000
schtyp             3       1.0000000               0       1.0000000       1.0000000
prog               3       2.3333333       0.5773503       2.0000000       3.0000000
read               3      46.3333333       1.1547005      45.0000000      47.0000000
write              3      45.3333333      10.1159939      39.0000000      57.0000000
math               3      46.6666667       3.5118846      43.0000000      50.0000000
science            3      39.3333333       7.3711148      31.0000000      45.0000000
socst              3      37.6666667      16.0727513      26.0000000      56.0000000
SelectionProb      3       0.2000000               0       0.2000000       0.2000000
SamplingWeight     3       5.0000000               0       5.0000000       5.0000000
------------------------------------------------------------------------------------

female=0 ses=2

Variable           N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------------
id                38     110.1578947      62.9673988       7.0000000     200.0000000
race              38       3.4736842       1.0063808       1.0000000       4.0000000
schtyp            38       1.2105263       0.4131550       1.0000000       2.0000000
prog              38       2.0263158       0.7528986       1.0000000       3.0000000
read              38      52.2368421      10.4866304      31.0000000      73.0000000
write             38      49.0789474      10.6273019      31.0000000      67.0000000
math              38      53.1842105      10.2821158      35.0000000      75.0000000
science           38      52.5789474      10.1439146      34.0000000      74.0000000
socst             38      49.6842105      11.0088546      26.0000000      71.0000000
SelectionProb     38       0.8085106               0       0.8085106       0.8085106
SamplingWeight    38       1.2368421               0       1.2368421       1.2368421
------------------------------------------------------------------------------------

female=0 ses=3

Variable           N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------------
id                12     106.2500000      42.7978865      49.0000000     199.0000000
race              12       3.9166667       0.2886751       3.0000000       4.0000000
schtyp            12       1.0833333       0.2886751       1.0000000       2.0000000
prog              12       1.8333333       0.5773503       1.0000000       3.0000000
read              12      59.3333333      10.6030299      44.0000000      76.0000000
write             12      54.1666667      10.7435675      33.0000000      65.0000000
math              12      56.8333333       9.6844142      39.0000000      71.0000000
science           12      59.3333333       5.6461303      49.0000000      66.0000000
socst             12      59.4166667      11.0409019      31.0000000      71.0000000
SelectionProb     12       0.4137931               0       0.4137931       0.4137931
SamplingWeight    12       2.4166667               0       2.4166667       2.4166667
------------------------------------------------------------------------------------

female=1 ses=1

Variable           N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------------
id                 4      42.0000000       7.3484692      36.0000000      52.0000000
race               4       3.0000000               0       3.0000000       3.0000000
schtyp             4       1.0000000               0       1.0000000       1.0000000
prog               4       2.0000000       0.8164966       1.0000000       3.0000000
read               4      45.5000000       3.8729833      41.0000000      50.0000000
write              4      44.7500000       5.3150729      37.0000000      49.0000000
math               4      45.0000000       5.5976185      40.0000000      53.0000000
science            4      42.2500000       7.7190241      35.0000000      53.0000000
socst              4      53.5000000       8.6602540      46.0000000      66.0000000
SelectionProb      4       0.1250000               0       0.1250000       0.1250000
SamplingWeight     4       8.0000000               0       8.0000000       8.0000000
------------------------------------------------------------------------------------

female=1 ses=2

Variable           N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------------
id                24      95.9583333      58.9033690       2.0000000     190.0000000
race              24       3.4166667       1.0179548       1.0000000       4.0000000
schtyp            24       1.2083333       0.4148511       1.0000000       2.0000000
prog              24       2.0000000       0.7223151       1.0000000       3.0000000
read              24      50.4583333       7.3896471      39.0000000      71.0000000
write             24      53.0416667       8.3742834      39.0000000      67.0000000
math              24      52.4166667       8.5867272      33.0000000      72.0000000
science           24      48.9583333       7.4511695      39.0000000      66.0000000
socst             24      51.4583333      10.0994153      31.0000000      71.0000000
SelectionProb     24       0.5000000               0       0.5000000       0.5000000
SamplingWeight    24       2.0000000               0       2.0000000       2.0000000
------------------------------------------------------------------------------------
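The SelectionProb and SamplingWeight columns follow directly from the stratum sizes. With simple random sampling of n_h observations from a stratum containing N_h observations, every observation in stratum h has the same selection probability, and its sampling weight is the reciprocal of that probability:

```latex
\pi_h = \frac{n_h}{N_h}, \qquad w_h = \frac{1}{\pi_h} = \frac{N_h}{n_h}
% female=0 ses=2: \pi_h = 38/47 \approx 0.8085106, \quad w_h = 47/38 \approx 1.2368421
% female=1 ses=1: \pi_h = 4/32 = 0.125, \quad w_h = 32/4 = 8
```

Summing the weights within a stratum recovers its population size. For example, 38 × 1.2368421 ≈ 47 observations for female=0 ses=2.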

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California