SAS Textbook Examples
Practical Methods for Design and Analysis of Complex Surveys, Revised Edition
by Lehtonen and Pahkinen
Chapter 3: Further use of auxiliary information

Stratified simple random sampling
page 74 Table 3.3  Estimates from an optimally allocated stratified simple random sample (n = 8); the Province'91 population.
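NOTE:  In this data set, the fpc differs between strata (7 in stratum 1 and 25 in stratum 2), which is different from the previous examples.  The sampling rates are therefore supplied per stratum through the second74 data set (0.57 is approximately 4/7 and 0.16 = 4/25).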
data page74;
  input id str clu wt ue91 lab91 fpc;
  cards;
  1 1 1 1.75 4123 33786 7
  2 1 2 1.75 666 6016 7
  3 1 4 1.75 760 5919 7
  4 1 6 1.75 457 3022 7
  5 2 21 6.25 61 573 25
  6 2 25 6.25 262 1737 25
  7 2 26 6.25 331 2543 25
  8 2 27 6.25 98 545 25
  ;
run;
data second74;
  input id str _RATE_;
  cards;
  1 1 0.57
  2 1 0.57
  3 1 0.57
  4 1 0.57
  5 2 0.16
  6 2 0.16
  7 2 0.16
  8 2 0.16
  ;
run;

proc surveymeans data = page74 r = second74 sum std;
  weight wt;
  strata str;
  cluster clu;
  var ue91;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   2
Number of Clusters                 8
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               15211     4285.724888
----------------------------------------
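The rates in second74 are rounded versions of the stratum sampling fractions 4/7 and 4/25.  As a minimal sketch (not part of the original example), the stratum population counts from the fpc column could instead be supplied through the TOTAL= option, so that SAS works out the unrounded rates itself.  The data set name totals74 is made up for this illustration.  The Sum should be unchanged, and the Std Dev should come out slightly smaller than the one above, because 4/7 is a little larger than 0.57.

```sas
* Assumed helper data set with one row per stratum and its population count;
data totals74;
  input str _TOTAL_;
  cards;
  1 7
  2 25
  ;
run;

* Same design as above with TOTAL= used in place of R=;
proc surveymeans data = page74 total = totals74 sum std;
  weight wt;
  strata str;
  cluster clu;
  var ue91;
run;
```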
One-stage cluster sampling
page 83 Table 3.6  Estimates from a one-stage CLU sample (n = 8); the Province'91 population.
data page83;
  input id str clu wt ue91 lab91;
  fpc = 32;
  cards;
  1 1 2 4 666 6016 
  2 1 2 4 528 3818 
  3 1 2 4 760 5919 
  4 1 2 4 187 1448 
  5 1 8 4 129 927 
  6 1 8 4 128 819 
  7 1 8 4 331 2543 
  8 1 8 4 568 4011 
  ;
run;
proc surveymeans data = page83 r = .25 sum std;
  weight wt;
  strata str;
  cluster clu;
  var ue91 lab91;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   1
Number of Clusters                 2
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               13188     3412.140091
lab91             102004           30834
----------------------------------------
Two-stage cluster sampling
page 88 Table 3.8  Estimates from a two-stage CLU sample (n = 8); the Province'91 population.
data page88;
  input id str clu wt ue91 lab91 fpc1 fpc2 smplrat;
  cards;
  1 1 2 4 760 5919 8 4 .5
  2 1 2 4 187 1448 8 4 .5
  3 1 3 4 767 5823 8 4 .5
  4 1 3 4 142 675 8 4 .5
  5 1 4 4 94 831 8 4 .5
  6 1 4 4 98 545 8 4 .5
  7 1 7 4 262 1737 8 4 .5
  8 1 7 4 219 1330 8 4 .5
  ;
run;
proc surveymeans data = page88 r = .5 sum std;
  weight wt;
  cluster clu;
  strata str;
  var ue91 lab91;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   1
Number of Clusters                 4
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               10116     2045.755932
lab91              73232           16000
----------------------------------------
Post-stratified weights
page 97 Table 3.10  A simple random sample drawn without replacement from the Province'91 population with poststratum weights.
data page97;
  input id str clu wt ue91 lab91 poststr gwt postwt sruv srcvs ;
  fpc = 32;
  cards;
  1 1 1 4 4123 33786 1 .5833 2.333 .25 .43
  2 1 4 4 760 5919 1 .5833 2.333 .25 .43
  3 1 5 4 721 4930 1 .5833 2.333 .25 .43
  4 1 15 4 142 675 2 1.2500 5.0000 .25 .20
  5 1 18 4 187 1448 2 1.2500 5.0000 .25 .20
  6 1 26 4 331 2543 2 1.2500 5.0000 .25 .20
  7 1 30 4 127 1084 2 1.2500 5.0000 .25 .20
  8 1 31 4 219 1330 2 1.2500 5.0000 .25 .20
  ;
run;
data second97;
  input id str _RATE_ poststr _TOTAL_;
  cards;
  1 1 0.43 1 7
  2 1 0.43 1 7
  3 1 0.43 1 7
  4 2 0.20 1 25
  5 2 0.20 1 25
  6 2 0.20 1 25
  7 2 0.20 1 25
  8 2 0.20 1 25
  ;
run;
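The postwt column in page97 already holds the poststratum weights (about 7/3 in poststratum 1 and 25/5 in poststratum 2), so the poststratified point estimate of the ue91 total can be sketched as a postwt-weighted sum.  This is only an illustration of the point estimate and says nothing about the poststratified standard errors.  With the rounded weights the sum should be roughly 18104.

```sas
* Weighted sum of ue91 using the poststratum weights, point estimate only;
proc means data = page97 sum;
  weight postwt;
  var ue91;
run;
```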
poststratified conditional estimates
This has been skipped for now.
poststratified unconditional estimates
This has been skipped for now.
pure design-based estimates under srs
proc surveymeans data = page97 r = .25 sum std;
  weight wt;
  cluster clu;
  strata str;
  var ue91 lab91;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   1
Number of Clusters                 8
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               26440           13282
lab91             206860          109763
----------------------------------------
The code below gives the numbers that are shown in the calculations on page 102.
data page102;
  input id str clu wt ue91 hou85 gwt adjwt smplrat;
  fpc = 32;
  cards;
  1 1 1 4 4123 26881 .5562 2.2248 .25
  2 1 4 4 760 4896 .5562 2.2248 .25
  3 1 5 4 721 3730 .5562 2.2248 .25
  4 1 15 4 142 556 .5562 2.2248 .25
  5 1 18 4 187 1463 .5562 2.2248 .25
  6 1 26 4 331 1946 .5562 2.2248 .25
  7 1 30 4 127 834 .5562 2.2248 .25
  8 1 31 4 219 932 .5562 2.2248 .25
  ;
run;
NOTE:  6610 and 41238 are the unweighted sample sums of ue91 and hou85.  Their ratio, 6610/41238 = .16028905, agrees with the .1603 shown on page 102.
proc surveymeans data = page102 r = .25 sum std;
  weight wt;
  cluster clu;
  strata str;
  var ue91 hou85;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   1
Number of Clusters                 8
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               26440           13282
hou85             164952           87299
----------------------------------------
The goal is to reproduce the .1603 shown in the upper middle of page 102.  It is the ratio of the two estimated totals above, 26440/164952.  Multiplying this ratio by the known population total of the auxiliary variable (hou85) gives the ratio estimate of the total of the variable of interest (ue91).
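A minimal sketch of that calculation, assuming the RATIO statement of PROC SURVEYMEANS is available in your SAS release.  The value 91753 for the population total of hou85 is borrowed from the ESTIMATE statement in the regression example further down this page; check it against the book before relying on it.  The DATA step writes the ratio and the resulting total, roughly 14707, to the log.

```sas
* Ratio of the estimated totals of ue91 and hou85;
proc surveymeans data = page102 r = .25;
  weight wt;
  strata str;
  cluster clu;
  var ue91 hou85;
  ratio ue91 / hou85;
run;

* Scale the ratio by the assumed population total of hou85;
data _null_;
  ratio = 6610 / 41238;
  hou85_total = 91753;
  ue91_ratio_total = ratio * hou85_total;
  put ratio= ue91_ratio_total=;
run;
```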
Simple random sample without replacement for regression estimation
page 107 Table 3.14  Model-assisted estimation results for the population total of ue91 from an SRS sample of eight elements drawn from the Province'91 population.
data page106;
  input id str clu wt ue91 meanz hou85 diffhou85 smplrat;
  fpc = 32;
  cards;
  1 1 1 4 4123 2867 26881 -24014 .25
  2 1 4 4 760 2867 4896 -2029 .25
  3 1 5 4 721 2867 3730 -863 .25
  4 1 15 4 142 2867 556 2311 .25
  5 1 18 4 187 2867 1463 1404 .25
  6 1 26 4 331 2867 1946 921 .25
  7 1 30 4 127 2867 834 2033 .25
  8 1 31 4 219 2867 932 1935 .25
  ;
run;
strategy:  design-based estimator with srs
proc surveymeans data = page106 r = .25 sum std;
  weight wt;
  strata str;
  cluster clu;
  var ue91;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   1
Number of Clusters                 8
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               26440           13282
----------------------------------------
strategy:  poststratified estimator with srs*pos
This has been skipped for now.
strategy:  ratio estimator with srs*rat
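The ratio estimator itself has been skipped for now.  The code below only gives the estimated totals of ue91 and hou85 that enter the ratio.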
proc surveymeans data = page106 r = .25 sum std;
  weight wt;
  strata str;
  cluster clu;
  var ue91 hou85;
run;
The SURVEYMEANS Procedure

            Data Summary

Number of Strata                   1
Number of Clusters                 8
Number of Observations             8
Sum of Weights                    32

               Statistics

Variable             Sum         Std Dev
----------------------------------------
ue91               26440           13282
hou85             164952           87299
----------------------------------------
strategy:  regression estimator with srs*reg
The code below produces the estimate of b-hat, 0.152, shown in the middle of page 106.  The estimate statement evaluates the fitted model at Intercept = 32 and hou85 = 91753 (apparently the population size and the population total of hou85), which gives the regression estimate of 15312 and the correct standard error of 648, as shown in Table 3.14 on page 107.
proc surveyreg data = page106 r = .25;
  weight wt;
  strata str;
  cluster clu;
  model ue91 = hou85;
  estimate "UE91 Total" Intercept 32 hou85 91753 / E;
run;
The SURVEYREG Procedure

Regression Analysis for Dependent Variable ue91

             Estimated Regression Coefficients

                             Standard
Parameter      Estimate         Error    t Value    Pr > |t|

Intercept    42.6546808    22.1860968       1.92      0.0960
hou85         0.1520142     0.0007745     196.29      <.0001

NOTE: The denominator degrees of freedom for the t tests is 7.

Coefficients of Estimate "UE91 Total"

Effect              Row 1

Intercept              32

hou85               91753

               Analysis of Estimable Functions

                              Standard
Parameter       Estimate         Error    t Value    Pr > |t|

UE91 Total    15312.7108    648.160289      23.62      <.0001

NOTE: The denominator degrees of freedom for the t tests is 7.
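As a quick check, the "UE91 Total" estimate can be rebuilt by hand from the printed coefficients and the multipliers in the estimate statement.  This is a sketch only; reading 32 as the population size and 91753 as the population total of hou85 is an interpretation of that statement.  The value written to the log should be about 15312.71, differing from the SAS output only through rounding of the printed coefficients.

```sas
* Rebuild the UE91 Total estimate from the printed coefficients;
data _null_;
  b0 = 42.6546808;
  b1 = 0.1520142;
  total = 32 * b0 + 91753 * b1;
  put total=;
run;
```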

These pages are Copyrighted (c) by UCLA Academic Technology Services

