SAS FAQ 
How can I take a stratified random sample of my data?

Sometimes you may want to take a random sample of your data while respecting the stratification that was used when the data set was created.  Other times you want the sampled data set to maintain certain proportions; for example, you might draw a sample in which the proportions of males and females correspond to the current census figures.  To draw these types of samples from your data set, you can use proc surveyselect.  We will use the hsb2 data set for our examples.  Note that the examples here are for SAS 9.1x.

Example 1:  Taking a 50% sample from each stratum using simple random sampling (srs)

Before we take our sample, let's look at the data set using proc means.  Because we will use a by statement, we need to sort the data first.  We will use the variable female as our stratification variable.  Also, we will use an options statement to suppress the showing of the variable labels in the output.

proc sort data = "D:\hsb2";
by female;
run;

options nolabel;
proc means data = "D:\hsb2";
by female;
run;

female=0

The MEANS Procedure

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           91     106.0109890      60.3122421       3.0000000     200.0000000
race         91       3.4285714       1.0867163       1.0000000       4.0000000
ses          91       2.1538462       0.6818777       1.0000000       3.0000000
schtyp       91       1.1538462       0.3628001       1.0000000       2.0000000
prog         91       2.0219780       0.6988566       1.0000000       3.0000000
read         91      52.8241758      10.5067105      31.0000000      76.0000000
write        91      50.1208791      10.3051607      31.0000000      67.0000000
math         91      52.9450549       9.6647845      35.0000000      75.0000000
science      91      53.2307692      10.7321707      26.0000000      74.0000000
socst        91      51.7912088      11.3338397      26.0000000      71.0000000
-------------------------------------------------------------------------------


female=1

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id          109      95.8990826      55.6275553       1.0000000     198.0000000
race        109       3.4311927       1.0033921       1.0000000       4.0000000
ses         109       1.9724771       0.7510328       1.0000000       3.0000000
schtyp      109       1.1651376       0.3730197       1.0000000       2.0000000
prog        109       2.0275229       0.6866278       1.0000000       3.0000000
read        109      51.7339450      10.0578348      28.0000000      76.0000000
write       109      54.9908257       8.1337152      35.0000000      67.0000000
math        109      52.3944954       9.1510153      33.0000000      72.0000000
science     109      50.6972477       9.0385026      29.0000000      69.0000000
socst       109      52.9174312      10.2344086      26.0000000      71.0000000
-------------------------------------------------------------------------------
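
If you would rather not sort the data just to look at it, a class statement gives the same by-group summary in one table.  This is only a sketch of an alternative and not part of the original example; it assumes the same file path used above.

```sas
/* class produces statistics for each level of female   */
/* and, unlike by, does not require the data to be sorted */
proc means data = "D:\hsb2";
  class female;
run;
```

The sort is still needed later, because the strata statement in proc surveyselect expects the data to be sorted by the stratification variables.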

In the command below we have used several options.  We have used the data = option to specify the data set from which we wish to draw the sample.  The method option indicates the method by which we would like the sample drawn.  SAS offers a wide range of options for this, including probability-proportional-to-size and systematic sampling.  The samprate option is used to specify the sampling rate.  Here, we have indicated .5, which means 50%.  We have used the seed option to set the seed so that our results will be replicable.  On the strata statement we specify the variable (or variables) that define the strata.

proc surveyselect data = "D:\hsb2" out = samp1 method = srs samprate = .5 seed = 9876;
strata female;
run;

proc sort data = samp1;
by female;
run;

proc means data = samp1;
by female;
run;

female=0

The MEANS Procedure

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 46      94.5869565      60.1141788       5.0000000     197.0000000
race               46       3.1956522       1.2582098       1.0000000       4.0000000
ses                46       2.1521739       0.6981688       1.0000000       3.0000000
schtyp             46       1.0652174       0.2496374       1.0000000       2.0000000
prog               46       2.2173913       0.7276459       1.0000000       3.0000000
read               46      50.6956522      10.5848310      31.0000000      73.0000000
write              46      47.4565217      10.2473986      31.0000000      65.0000000
math               46      53.0869565       9.5657400      38.0000000      75.0000000
science            46      51.3043478      11.7735477      26.0000000      74.0000000
socst              46      48.9565217      12.4185462      26.0000000      71.0000000
SelectionProb      46       0.5054945               0       0.5054945       0.5054945
SamplingWeight     46       1.9782609               0       1.9782609       1.9782609
-------------------------------------------------------------------------------------


female=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 55      82.2727273      54.4056964       1.0000000     194.0000000
race               55       3.2545455       1.1420933       1.0000000       4.0000000
ses                55       1.9636364       0.7444520       1.0000000       3.0000000
schtyp             55       1.1090909       0.3146266       1.0000000       2.0000000
prog               55       2.1636364       0.7139778       1.0000000       3.0000000
read               55      50.4545455      10.3705748      28.0000000      76.0000000
write              55      54.3636364       8.5729195      35.0000000      67.0000000
math               55      51.9818182       9.9712381      33.0000000      72.0000000
science            55      50.4727273      10.2791673      31.0000000      69.0000000
socst              55      52.3272727      10.2885311      31.0000000      71.0000000
SelectionProb      55       0.5045872               0       0.5045872       0.5045872
SamplingWeight     55       1.9818182               0       1.9818182       1.9818182
-------------------------------------------------------------------------------------
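
The SamplingWeight variable is the inverse of SelectionProb, so each sampled case stands in for about two cases in its stratum.  One quick check, sketched below under the assumption that samp1 was created as above, is to add up the weights within each stratum.  The sums should come back to the original stratum sizes (91 and 109), since 46 x 91/46 = 91 and 55 x 109/55 = 109.

```sas
/* Sum the sampling weights within each stratum.          */
/* The sum estimates the number of cases in that stratum. */
proc means data = samp1 n sum;
  class female;
  var SamplingWeight;
run;
```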

If you want to know which cases were not selected, or if you want to use the two samples for validation purposes, you have to combine the sampled data set with the original data set.  An example is given below.  Note that we need to sort both the original data set and the sampled data set on the same variable.  This variable must uniquely identify each case in the data set.  Because the data step below uses a set statement with a by statement, the two data sets are interleaved rather than merged: every case appears once from the original data set, and each selected case appears a second time from the sampled data set.  You can tell which cases were selected because their second row has values for SelectionProb and SamplingWeight (the columns headed Prob and Weight in the listing).  These variables were created by proc surveyselect, and hence are not in the original data file.  If you want to create three or more data sets from your original data set, you can use Enterprise Miner.

proc sort data = "D:\hsb2";
by id;
run;

proc sort data = samp1;
by id;
run;

data merge1;
set "D:\hsb2" samp1;
by id;
run;

proc print data = merge1 (obs = 25);
run;

Obs  id  female  race  ses  schtyp  prog  read  write  math  science  socst     Prob     Weight

  1   1     1      1    1      1      3    34     44    40      39      41     .          .
  2   1     1      1    1      1      3    34     44    40      39      41    0.50459    1.98182
  3   2     1      1    2      1      3    39     41    33      42      41     .          .
  4   2     1      1    2      1      3    39     41    33      42      41    0.50459    1.98182
  5   3     0      1    1      1      2    63     65    48      63      56     .          .
  6   4     1      1    1      1      2    44     50    41      39      51     .          .
  7   4     1      1    1      1      2    44     50    41      39      51    0.50459    1.98182
  8   5     0      1    1      1      2    47     40    43      45      31     .          .
  9   5     0      1    1      1      2    47     40    43      45      31    0.50549    1.97826
 10   6     1      1    1      1      2    47     41    46      40      41     .          .
 11   6     1      1    1      1      2    47     41    46      40      41    0.50459    1.98182
 12   7     0      1    2      1      2    57     54    59      47      51     .          .
 13   7     0      1    2      1      2    57     54    59      47      51    0.50549    1.97826
 14   8     1      1    1      1      2    39     44    52      44      48     .          .
 15   9     0      1    2      1      3    48     49    52      44      51     .          .
 16  10     1      1    2      1      1    47     54    49      53      61     .          .
 17  10     1      1    2      1      1    47     54    49      53      61    0.50459    1.98182
 18  11     0      1    2      1      2    34     46    45      39      36     .          .
 19  11     0      1    2      1      2    34     46    45      39      36    0.50549    1.97826
 20  12     0      1    2      1      3    37     44    45      39      46     .          .
 21  13     1      1    2      1      3    47     46    39      47      61     .          .
 22  13     1      1    2      1      3    47     46    39      47      61    0.50459    1.98182
 23  14     0      1    3      1      2    47     41    54      42      56     .          .
 24  14     0      1    3      1      2    47     41    54      42      56    0.50549    1.97826
 25  15     0      1    3      1      3    39     39    44      26      42     .          .
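
If you prefer one row per case with an explicit indicator of selection, a merge statement with the in = data set option is one way to get it.  The sketch below is an alternative to the listing above, not the original author's code; the names merge2, insamp and selected are made up for illustration.

```sas
/* Both data sets must be sorted by the unique identifier */
proc sort data = "D:\hsb2";
  by id;
run;

proc sort data = samp1;
  by id;
run;

data merge2;
  merge "D:\hsb2" samp1 (in = insamp);
  by id;
  selected = insamp;  /* 1 if the case is in samp1, 0 otherwise */
run;

/* Count selected and unselected cases within each stratum */
proc freq data = merge2;
  tables female*selected / nopercent norow nocol;
run;
```

In merge2, unselected cases have selected = 0 and missing values for SelectionProb and SamplingWeight, so the unselected half can be pulled out with a simple where selected = 0 condition.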

Example 2:  Using more than one strata variable

In this example, we will use two strata variables.  The variable female has two levels, and the variable ses has three levels, so together they define six strata.  As before, we will sort the original data set on the strata variables, and then we will run proc means to see what the variables look like before we draw the sample.

proc sort data = "D:\hsb2";
by female ses;
run;

proc means data = "D:\hsb2";
by female ses;
run;

female=0 ses=1

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           15      79.2000000      57.7262010       3.0000000     169.0000000
race         15       3.1333333       1.2459458       1.0000000       4.0000000
schtyp       15       1.0000000               0       1.0000000       1.0000000
prog         15       1.8000000       0.8618916       1.0000000       3.0000000
read         15      49.3333333       9.0999738      36.0000000      63.0000000
write        15      46.6000000       9.0301084      31.0000000      65.0000000
math         15      47.6000000       6.7802233      39.0000000      63.0000000
science      15      49.8000000      12.9735996      31.0000000      69.0000000
socst        15      43.3333333       9.9618319      26.0000000      57.0000000
-------------------------------------------------------------------------------

female=0 ses=2

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           47     109.6808511      64.3256343       7.0000000     200.0000000
race         47       3.3829787       1.1142007       1.0000000       4.0000000
schtyp       47       1.2127660       0.4136881       1.0000000       2.0000000
prog         47       2.1063830       0.7293250       1.0000000       3.0000000
read         47      52.1702128      10.6185219      31.0000000      73.0000000
write        47      49.5531915      10.1570462      31.0000000      67.0000000
math         47      53.4680851      10.5662528      35.0000000      75.0000000
science      47      53.4042553      10.2780043      34.0000000      74.0000000
socst        47      50.7872340      10.8826471      26.0000000      71.0000000
-------------------------------------------------------------------------------

female=0 ses=3

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           29     113.9310345      52.4934901      14.0000000     199.0000000
race         29       3.6551724       0.9364012       1.0000000       4.0000000
schtyp       29       1.1379310       0.3509312       1.0000000       2.0000000
prog         29       2.0000000       0.5345225       1.0000000       3.0000000
read         29      55.6896552      10.6035824      34.0000000      76.0000000
write        29      52.8620690      10.7760453      33.0000000      67.0000000
math         29      54.8620690       8.6177729      38.0000000      71.0000000
science      29      54.7241379      10.1906699      26.0000000      69.0000000
socst        29      57.7931034       9.5595103      31.0000000      71.0000000
-------------------------------------------------------------------------------

female=1 ses=1

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           32      72.3750000      51.6444107       1.0000000     173.0000000
race         32       3.0312500       1.1495967       1.0000000       4.0000000
schtyp       32       1.0625000       0.2459347       1.0000000       2.0000000
prog         32       1.9687500       0.7398507       1.0000000       3.0000000
read         32      47.7812500       9.5570760      28.0000000      68.0000000
write        32      52.5000000       9.2387682      35.0000000      65.0000000
math         32      49.9062500       9.7164954      39.0000000      72.0000000
science      32      46.7187500       9.0419609      29.0000000      63.0000000
socst        32      49.1875000      10.8700165      26.0000000      71.0000000
-------------------------------------------------------------------------------

female=1 ses=2

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           48     105.6666667      55.6372496       2.0000000     193.0000000
race         48       3.5833333       0.9415545       1.0000000       4.0000000
schtyp       48       1.1875000       0.3944428       1.0000000       2.0000000
prog         48       2.1250000       0.7329625       1.0000000       3.0000000
read         48      51.0000000       8.1632284      36.0000000      71.0000000
write        48      54.2500000       7.3296251      39.0000000      67.0000000
math         48      50.9791667       7.9157521      33.0000000      72.0000000
science      48      50.0416667       7.0889736      36.0000000      66.0000000
socst        48      53.2500000       8.9407030      31.0000000      71.0000000
-------------------------------------------------------------------------------

female=1 ses=3

Variable      N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------
id           29     105.6896552      53.7720742      26.0000000     198.0000000
race         29       3.6206897       0.8200084       1.0000000       4.0000000
schtyp       29       1.2413793       0.4354942       1.0000000       2.0000000
prog         29       1.9310345       0.5298945       1.0000000       3.0000000
read         29      57.3103448      11.2348420      36.0000000      76.0000000
write        29      58.9655172       6.7901334      36.0000000      67.0000000
math         29      57.4827586       8.7162438      42.0000000      71.0000000
science      29      56.1724138       9.5058965      31.0000000      69.0000000
socst        29      56.4827586      10.4765749      31.0000000      71.0000000
-------------------------------------------------------------------------------

The same options are used as above.

proc surveyselect data = "D:\hsb2" out = samp2 method = srs samprate = .5 seed = 9876;
strata female ses;
run;

proc sort data = samp2;
by female ses;
run;

proc means data = samp2;
by female ses;
run;

female=0 ses=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                  8      74.0000000      43.0614179      16.0000000     134.0000000
race                8       3.2500000       1.1649647       1.0000000       4.0000000
schtyp              8       1.0000000               0       1.0000000       1.0000000
prog                8       1.6250000       0.9161254       1.0000000       3.0000000
read                8      49.2500000       7.5545634      42.0000000      63.0000000
write               8      42.8750000       6.3569422      31.0000000      52.0000000
math                8      45.6250000       6.2549980      39.0000000      59.0000000
science             8      47.3750000      10.1409706      34.0000000      65.0000000
socst               8      43.3750000      10.1409706      26.0000000      57.0000000
SelectionProb       8       0.5333333               0       0.5333333       0.5333333
SamplingWeight      8       1.8750000               0       1.8750000       1.8750000
-------------------------------------------------------------------------------------

female=0 ses=2

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 24     120.0000000      69.1626941       9.0000000     200.0000000
race               24       3.3750000       1.0959411       1.0000000       4.0000000
schtyp             24       1.2916667       0.4643056       1.0000000       2.0000000
prog               24       1.9166667       0.7172815       1.0000000       3.0000000
read               24      52.3750000       9.3799625      34.0000000      68.0000000
write              24      49.5833333       9.5549517      31.0000000      62.0000000
math               24      52.1666667      10.6185509      35.0000000      75.0000000
science            24      52.3333333       9.5492621      36.0000000      74.0000000
socst              24      51.4166667      11.2207752      26.0000000      71.0000000
SelectionProb      24       0.5106383               0       0.5106383       0.5106383
SamplingWeight     24       1.9583333               0       1.9583333       1.9583333
-------------------------------------------------------------------------------------

female=0 ses=3

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 15     112.4666667      58.7632862      15.0000000     199.0000000
race               15       3.5333333       1.0600988       1.0000000       4.0000000
schtyp             15       1.2000000       0.4140393       1.0000000       2.0000000
prog               15       2.0666667       0.4577377       1.0000000       3.0000000
read               15      56.5333333       9.8913141      39.0000000      76.0000000
write              15      54.1333333       9.4405407      38.0000000      67.0000000
math               15      55.2666667       7.4782224      39.0000000      64.0000000
science            15      54.6666667      10.8210553      26.0000000      66.0000000
socst              15      57.4666667       8.5345237      42.0000000      71.0000000
SelectionProb      15       0.5172414               0       0.5172414       0.5172414
SamplingWeight     15       1.9333333               0       1.9333333       1.9333333
-------------------------------------------------------------------------------------

female=1 ses=1

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 16      75.5000000      48.5166638       1.0000000     161.0000000
race               16       3.1250000       1.1474610       1.0000000       4.0000000
schtyp             16       1.0625000       0.2500000       1.0000000       2.0000000
prog               16       2.0625000       0.8539126       1.0000000       3.0000000
read               16      45.0000000       9.9866578      28.0000000      61.0000000
write              16      49.8125000       8.5496101      35.0000000      62.0000000
math               16      46.9375000       8.3224095      39.0000000      72.0000000
science            16      45.0000000       8.5634884      29.0000000      61.0000000
socst              16      47.7500000      11.9749739      26.0000000      66.0000000
SelectionProb      16       0.5000000               0       0.5000000       0.5000000
SamplingWeight     16       2.0000000               0       2.0000000       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=2

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 24     121.9583333      57.7404079      13.0000000     193.0000000
race               24       3.7083333       0.7506036       1.0000000       4.0000000
schtyp             24       1.2916667       0.4643056       1.0000000       2.0000000
prog               24       2.2916667       0.7506036       1.0000000       3.0000000
read               24      49.7500000       6.6217691      36.0000000      65.0000000
write              24      53.6666667       7.2090925      41.0000000      67.0000000
math               24      49.2500000       7.2306714      37.0000000      63.0000000
science            24      49.8750000       6.4626855      39.0000000      61.0000000
socst              24      52.0833333       8.9632907      31.0000000      71.0000000
SelectionProb      24       0.5000000               0       0.5000000       0.5000000
SamplingWeight     24       2.0000000               0       2.0000000       2.0000000
-------------------------------------------------------------------------------------

female=1 ses=3

Variable            N            Mean         Std Dev         Minimum         Maximum
-------------------------------------------------------------------------------------
id                 15     116.0000000      55.5504918      26.0000000     194.0000000
race               15       3.7333333       0.5936168       2.0000000       4.0000000
schtyp             15       1.2666667       0.4577377       1.0000000       2.0000000
prog               15       1.8666667       0.3518658       1.0000000       2.0000000
read               15      58.8000000       9.6599320      36.0000000      68.0000000
write              15      58.2000000       7.7015768      36.0000000      67.0000000
math               15      58.9333333       9.8522417      42.0000000      71.0000000
science            15      56.2000000       9.9871346      31.0000000      69.0000000
socst              15      57.5333333       8.8790819      39.0000000      71.0000000
SelectionProb      15       0.5172414               0       0.5172414       0.5172414
SamplingWeight     15       1.9333333               0       1.9333333       1.9333333
-------------------------------------------------------------------------------------
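
To compare the sampled cell sizes with the original ones at a glance, a cross-tabulation of the sample can be handy.  This is a small sketch that only assumes samp2 was created as above; the cell counts it reports should match the N column in the proc means output (8, 24, 15, 16, 24 and 15).

```sas
/* Cell counts for the six strata in the sample */
proc freq data = samp2;
  tables female*ses / nopercent norow nocol;
run;
```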

Example 3:  Using different sampling rates from each of the strata

In the examples above, we sampled from each stratum at the same rate.  However, sometimes you want to sample more from one stratum than another.  You can specify a different sampling rate for each stratum by listing the rates in parentheses on the samprate option.  The rates must be listed in the order in which the strata appear in the sorted data set.  Let's first take a look at the cell counts for the strata variables female and ses.

proc freq data = "d:\hsb2";
table female*ses /nopercent norow nocol;
run;

Table of FEMALE by SES

FEMALE     SES

Frequency|       1|       2|       3|  Total
---------+--------+--------+--------+
       0 |     15 |     47 |     29 |     91
---------+--------+--------+--------+
       1 |     32 |     48 |     29 |    109
---------+--------+--------+--------+
Total          47       95       58      200

The table below gives sampling rates we will use for each of the cells above.

            ses=1   ses=2   ses=3
female=0     .70     .50     .70
female=1     .70     .50     .70

proc sort data = "D:\hsb2";
by female ses;
run;

proc surveyselect data = "D:\hsb2" out = samp3 method = srs
                  samprate = (.7 .5 .7 .7 .5 .7) seed = 9876;
strata female ses;
run;

proc sort data = samp3;
by female ses;
run;

proc means data = samp3;
by female ses;
run;

female=0 ses=1

The MEANS Procedure

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                11     65.0000000     57.5742998      3.0000000    169.0000000
race              11      2.8181818      1.3280197      1.0000000      4.0000000
schtyp            11      1.0000000              0      1.0000000      1.0000000
prog              11      1.5454545      0.6875517      1.0000000      3.0000000
read              11     52.3636364      8.3339394     42.0000000     63.0000000
write             11     48.1818182      9.7551851     31.0000000     65.0000000
math              11     48.3636364      7.0038950     41.0000000     63.0000000
science           11     52.4545455     12.4929071     31.0000000     69.0000000
socst             11     46.3636364      8.9361371     31.0000000     57.0000000
SelectionProb     11      0.7333333              0      0.7333333      0.7333333
SamplingWeight    11      1.3636364              0      1.3636364      1.3636364
--------------------------------------------------------------------------------

female=0 ses=2

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                24    100.2500000     66.1107437      7.0000000    195.0000000
race              24      3.2916667      1.1601786      1.0000000      4.0000000
schtyp            24      1.2083333      0.4148511      1.0000000      2.0000000
prog              24      2.0000000      0.7801895      1.0000000      3.0000000
read              24     50.6250000      8.5201169     34.0000000     63.0000000
write             24     48.5833333      9.8241790     31.0000000     65.0000000
math              24     51.0416667      8.3690900     35.0000000     66.0000000
science           24     50.7083333      8.3222027     36.0000000     66.0000000
socst             24     48.7083333     10.4235485     26.0000000     66.0000000
SelectionProb     24      0.5106383              0      0.5106383      0.5106383
SamplingWeight    24      1.9583333              0      1.9583333      1.9583333
--------------------------------------------------------------------------------

female=0 ses=3

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                21    125.9523810     49.3786150     20.0000000    199.0000000
race              21      3.8095238      0.6796358      1.0000000      4.0000000
schtyp            21      1.1904762      0.4023739      1.0000000      2.0000000
prog              21      1.9523810      0.4976134      1.0000000      3.0000000
read              21     55.0476190      8.6340963     39.0000000     73.0000000
write             21     52.1428571     10.8778937     33.0000000     67.0000000
math              21     54.5714286      7.8394606     38.0000000     71.0000000
science           21     55.6190476      8.4289750     36.0000000     69.0000000
socst             21     57.2380952     10.1828521     31.0000000     71.0000000
SelectionProb     21      0.7241379              0      0.7241379      0.7241379
SamplingWeight    21      1.3809524              0      1.3809524      1.3809524
--------------------------------------------------------------------------------

female=1 ses=1

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                23     77.5652174     52.5537180      1.0000000    173.0000000
race              23      3.1739130      1.0724727      1.0000000      4.0000000
schtyp            23      1.0869565      0.2881041      1.0000000      2.0000000
prog              23      2.1304348      0.7570486      1.0000000      3.0000000
read              23     48.9130435      9.0599671     34.0000000     65.0000000
write             23     53.7391304      9.7057499     35.0000000     65.0000000
math              23     50.9565217     10.7765666     40.0000000     72.0000000
science           23     47.4782609      9.4670220     29.0000000     63.0000000
socst             23     49.2608696     11.7055441     26.0000000     71.0000000
SelectionProb     23      0.7187500              0      0.7187500      0.7187500
SamplingWeight    23      1.3913043              0      1.3913043      1.3913043
--------------------------------------------------------------------------------

female=1 ses=2

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                24    108.2083333     53.9234394     13.0000000    193.0000000
race              24      3.7083333      0.8064504      1.0000000      4.0000000
schtyp            24      1.2083333      0.4148511      1.0000000      2.0000000
prog              24      2.0833333      0.7172815      1.0000000      3.0000000
read              24     51.0833333      7.1439647     42.0000000     71.0000000
write             24     55.1666667      7.2751314     41.0000000     67.0000000
math              24     51.8333333      8.8251945     38.0000000     72.0000000
science           24     49.0416667      7.4043417     36.0000000     66.0000000
socst             24     53.5000000      9.2077472     31.0000000     71.0000000
SelectionProb     24      0.5000000              0      0.5000000      0.5000000
SamplingWeight    24      2.0000000              0      2.0000000      2.0000000
--------------------------------------------------------------------------------

female=1 ses=3

The MEANS Procedure

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                21    109.5238095     60.1361946     26.0000000    198.0000000
race              21      3.5238095      0.9283883      1.0000000      4.0000000
schtyp            21      1.3333333      0.4830459      1.0000000      2.0000000
prog              21      1.9047619      0.4364358      1.0000000      3.0000000
read              21     57.8571429     12.0842282     36.0000000     76.0000000
write             21     60.7619048      4.7634521     52.0000000     67.0000000
math              21     58.5238095      9.0808537     42.0000000     71.0000000
science           21     57.0952381      8.5726586     34.0000000     69.0000000
socst             21     57.4761905      9.9880881     31.0000000     71.0000000
SelectionProb     21      0.7241379              0      0.7241379      0.7241379
SamplingWeight    21      1.3809524              0      1.3809524      1.3809524
--------------------------------------------------------------------------------

Example 4:  Specifying the number of observations to be sampled

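If you prefer, you can specify the number of observations to be sampled from each stratum.  Instead of using the samprate option, use the n = option and list the sample sizes in parentheses, one per stratum, in the order in which the strata appear in the sorted data set (here, female=0 ses=1 through female=1 ses=3).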

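/* Sort by the strata variables so the strata appear in a known order. */
proc sort data = "D:\hsb2";
by female ses;
run;

/* n = gives one sample size per stratum, in sorted strata order. */
proc surveyselect data = "D:\hsb2" out = samp4 method = srs
                   n = (11 24 21 23 24 21) seed = 9876;
strata female ses;
run;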
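
Because the values in n = are matched to the strata by position, it can help to list the strata and their sizes before choosing the sample sizes.  The following is a minimal sketch, assuming the same hsb2 file as above.  The data set names strata_n and samp4b are made up for illustration.  The _NSIZE_ variable name is the one the SAS documentation describes for supplying sample sizes in a data set, and the strata variables in that data set are assumed to match hsb2 in type and length; please verify both for your release.

```sas
/* List each female-by-ses stratum and its size, in sorted order. */
proc freq data = "D:\hsb2";
tables female*ses / list;
run;

/* Hypothetical alternative: keep the sample sizes in a data set so that
   each size sits next to the stratum it belongs to. */
data strata_n;
input female ses _NSIZE_;
datalines;
0 1 11
0 2 24
0 3 21
1 1 23
1 2 24
1 3 21
;
run;

proc surveyselect data = "D:\hsb2" out = samp4b method = srs
                   n = strata_n seed = 9876;
strata female ses;
run;
```

With the same seed and the same sample sizes, this should select the same kind of sample as the parenthesized list; the main benefit is that a misordered list is easier to spot.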

proc sort data = samp4;
by female ses;
run;

proc means data = samp4;
by female ses;
run;

female=0 ses=1

The MEANS Procedure

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                11     65.0000000     57.5742998      3.0000000    169.0000000
race              11      2.8181818      1.3280197      1.0000000      4.0000000
schtyp            11      1.0000000              0      1.0000000      1.0000000
prog              11      1.5454545      0.6875517      1.0000000      3.0000000
read              11     52.3636364      8.3339394     42.0000000     63.0000000
write             11     48.1818182      9.7551851     31.0000000     65.0000000
math              11     48.3636364      7.0038950     41.0000000     63.0000000
science           11     52.4545455     12.4929071     31.0000000     69.0000000
socst             11     46.3636364      8.9361371     31.0000000     57.0000000
SelectionProb     11      0.7333333              0      0.7333333      0.7333333
SamplingWeight    11      1.3636364              0      1.3636364      1.3636364
--------------------------------------------------------------------------------

female=0 ses=2

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                24    100.2500000     66.1107437      7.0000000    195.0000000
race              24      3.2916667      1.1601786      1.0000000      4.0000000
schtyp            24      1.2083333      0.4148511      1.0000000      2.0000000
prog              24      2.0000000      0.7801895      1.0000000      3.0000000
read              24     50.6250000      8.5201169     34.0000000     63.0000000
write             24     48.5833333      9.8241790     31.0000000     65.0000000
math              24     51.0416667      8.3690900     35.0000000     66.0000000
science           24     50.7083333      8.3222027     36.0000000     66.0000000
socst             24     48.7083333     10.4235485     26.0000000     66.0000000
SelectionProb     24      0.5106383              0      0.5106383      0.5106383
SamplingWeight    24      1.9583333              0      1.9583333      1.9583333
--------------------------------------------------------------------------------

female=0 ses=3

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                21    125.9523810     49.3786150     20.0000000    199.0000000
race              21      3.8095238      0.6796358      1.0000000      4.0000000
schtyp            21      1.1904762      0.4023739      1.0000000      2.0000000
prog              21      1.9523810      0.4976134      1.0000000      3.0000000
read              21     55.0476190      8.6340963     39.0000000     73.0000000
write             21     52.1428571     10.8778937     33.0000000     67.0000000
math              21     54.5714286      7.8394606     38.0000000     71.0000000
science           21     55.6190476      8.4289750     36.0000000     69.0000000
socst             21     57.2380952     10.1828521     31.0000000     71.0000000
SelectionProb     21      0.7241379              0      0.7241379      0.7241379
SamplingWeight    21      1.3809524              0      1.3809524      1.3809524
--------------------------------------------------------------------------------

female=1 ses=1

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                23     77.5652174     52.5537180      1.0000000    173.0000000
race              23      3.1739130      1.0724727      1.0000000      4.0000000
schtyp            23      1.0869565      0.2881041      1.0000000      2.0000000
prog              23      2.1304348      0.7570486      1.0000000      3.0000000
read              23     48.9130435      9.0599671     34.0000000     65.0000000
write             23     53.7391304      9.7057499     35.0000000     65.0000000
math              23     50.9565217     10.7765666     40.0000000     72.0000000
science           23     47.4782609      9.4670220     29.0000000     63.0000000
socst             23     49.2608696     11.7055441     26.0000000     71.0000000
SelectionProb     23      0.7187500              0      0.7187500      0.7187500
SamplingWeight    23      1.3913043              0      1.3913043      1.3913043
--------------------------------------------------------------------------------

female=1 ses=2

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                24    108.2083333     53.9234394     13.0000000    193.0000000
race              24      3.7083333      0.8064504      1.0000000      4.0000000
schtyp            24      1.2083333      0.4148511      1.0000000      2.0000000
prog              24      2.0833333      0.7172815      1.0000000      3.0000000
read              24     51.0833333      7.1439647     42.0000000     71.0000000
write             24     55.1666667      7.2751314     41.0000000     67.0000000
math              24     51.8333333      8.8251945     38.0000000     72.0000000
science           24     49.0416667      7.4043417     36.0000000     66.0000000
socst             24     53.5000000      9.2077472     31.0000000     71.0000000
SelectionProb     24      0.5000000              0      0.5000000      0.5000000
SamplingWeight    24      2.0000000              0      2.0000000      2.0000000
--------------------------------------------------------------------------------

female=1 ses=3

The MEANS Procedure

Variable           N           Mean        Std Dev        Minimum        Maximum
--------------------------------------------------------------------------------
id                21    109.5238095     60.1361946     26.0000000    198.0000000
race              21      3.5238095      0.9283883      1.0000000      4.0000000
schtyp            21      1.3333333      0.4830459      1.0000000      2.0000000
prog              21      1.9047619      0.4364358      1.0000000      3.0000000
read              21     57.8571429     12.0842282     36.0000000     76.0000000
write             21     60.7619048      4.7634521     52.0000000     67.0000000
math              21     58.5238095      9.0808537     42.0000000     71.0000000
science           21     57.0952381      8.5726586     34.0000000     69.0000000
socst             21     57.4761905      9.9880881     31.0000000     71.0000000
SelectionProb     21      0.7241379              0      0.7241379      0.7241379
SamplingWeight    21      1.3809524              0      1.3809524      1.3809524
--------------------------------------------------------------------------------
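
In each stratum, SelectionProb is the sample size divided by the stratum size, and SamplingWeight is its reciprocal.  In the first stratum, for example, 11 times 1.3636364 is about 15, so 11 of 15 observations were selected.  As a rough check, assuming the samp4 data set created above, summing SamplingWeight within each stratum should recover the stratum sizes, which from the output above work out to 15, 47, 29, 32, 48, and 29.

```sas
/* Sum the sampling weights within each stratum of the sample.
   Under simple random sampling each sum should equal the stratum size. */
proc means data = samp4 n sum;
by female ses;
var SamplingWeight;
run;
```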

These pages are Copyrighted (c) by UCLA Academic Technology Services