UCLA Academic Technology Services

Stata Textbook Examples
Practical Methods for Design and Analysis of Complex Surveys, Second Edition
by Lehtonen and Pahkinen
Chapter 3:  Further use of auxiliary information

The examples below use Stata 9.  If you are using Stata versions 7 or 8, please see this page.

NOTE:  If you want to see the design effect or the misspecification effect, run estat effects immediately after the estimation command.

Stratified simple random sampling

page 74 table 3.3  Estimates from an optimally allocated stratified simple random sample (n = 8); the Province'91 population. 

NOTE:  In this data set, the fpc changes with the strata.  This is different from all of the previous examples.
input id str clu wt ue91 lab91 fpc
1 1 1 1.75 4123 33786 7
2 1 2 1.75 666 6016 7
3 1 4 1.75 760 5919 7
4 1 6 1.75 457 3022 7
5 2 21 6.25 61 573 25
6 2 25 6.25 262 1737 25
7 2 26 6.25 331 2543 25
8 2 27 6.25 98 545 25
end
svyset clu [pweight=wt], fpc(fpc) strata(str)

      pweight: wt
          VCE: linearized
     Strata 1: str
         SU 1: clu
        FPC 1: fpc

svy: total ue91
(running total on estimation sample)

Survey: Total estimation

Number of strata =       2          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
                                    Design df        =       6

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ue91 |    15210.5   4279.452      4739.059    25681.94
--------------------------------------------------------------

estat effects

----------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.       Deff      Deft
-------------+--------------------------------------------
        ue91 |    15210.5   4279.452      .20649   .393532
----------------------------------------------------------
Note: Weights must represent population totals for deff to be correct when using an FPC; however, deft is
      invariant to the scale of weights.

svy: ratio ue91 lab91
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       2          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
                                    Design df        =       6

     _ratio_1: ue91/lab91

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .1277788   .0031736      .1200134    .1355442
--------------------------------------------------------------

estat effects

     _ratio_1: ue91/lab91

----------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.       Deff      Deft
-------------+--------------------------------------------
    _ratio_1 |   .1277788   .0031736     .380341   .534093
----------------------------------------------------------
Note: Weights must represent population totals for deff to be correct when using an FPC; however, deft is
      invariant to the scale of weights.
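
The Deff and Deft columns above are not simply a square and its root. The reported values are consistent with Deft being measured against simple random sampling with replacement and Deff against simple random sampling without replacement, so that Deff = Deft^2 / (1 - n/N). The sketch below checks this using the overall sampling fraction n/N = 8/32; this reading of the output is an interpretation, not something stated in the original example.

```stata
* Sketch: recover Deff from Deft and the overall sampling fraction n/N = 8/32.
* The Deft values are copied from the two estat effects tables above.
display .393532^2 / (1 - 8/32)   // total of ue91; compare with Deff = .20649
display .534093^2 / (1 - 8/32)   // ratio ue91/lab91; compare with Deff = .380341
```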

page 83 table 3.6  Estimates from a one-stage CLU sample (n = 8); the Province'91 population.

input id str clu wt ue91 lab91 
1 1 2 4 666 6016 
2 1 2 4 528 3818 
3 1 2 4 760 5919 
4 1 2 4 187 1448 
5 1 8 4 129 927 
6 1 8 4 128 819 
7 1 8 4 331 2543 
8 1 8 4 568 4011 
end
gen fpc = 32
svyset clu [pweight=wt], strata(str)

      pweight: wt
          VCE: linearized
     Strata 1: str
         SU 1: clu
        FPC 1: <zero>

svy: total ue91
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       2          Population size  =      32
                                    Design df        =       1

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ue91 |      13188       3940     -36874.45    63250.45
--------------------------------------------------------------

svy: ratio ue91 lab91
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       2          Population size  =      32
                                    Design df        =       1

     _ratio_1: ue91/lab91

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |    .129289   .0065018      .0466761     .211902
--------------------------------------------------------------

Two-stage cluster sampling

page 88 table 3.8  Estimates from a two-stage CLU sample (n = 8); the Province'91 population.

input id str clu wt ue91 lab91 fpc1 fpc2 
1 1 2 4 760 5919 8 4 
2 1 2 4 187 1448 8 4 
3 1 3 4 767 5823 8 4 
4 1 3 4 142 675 8 4 
5 1 4 4 94 831 8 4 
6 1 4 4 98 545 8 4 
7 1 7 4 262 1737 8 4 
8 1 7 4 219 1330 8 4 
end
svyset clu [pweight=wt], fpc(fpc2) strata(str)

      pweight: wt
          VCE: linearized
     Strata 1: str
         SU 1: clu
        FPC 1: fpc2

svy: total ue91
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       4          Population size  =      32
                                    Design df        =       3

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ue91 |      10116          0             .           .
--------------------------------------------------------------
Note: Zero standard error due to 100% sampling rate detected for FPC in the first stage.

svy: ratio ue91 lab91
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       4          Population size  =      32
                                    Design df        =       3

     _ratio_1: ue91/lab91

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .1381363          0             .           .
--------------------------------------------------------------
Note: Zero standard error due to 100% sampling rate detected for FPC in the first stage.
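
The zero standard errors above arise because fpc2 = 4 equals the number of sampled PSUs (4 clusters), so Stata treats the first stage as a 100% sample. A possible alternative, assuming fpc1 = 8 is the number of clusters in the population and fpc2 = 4 is the number of elements in each cluster (8 x 4 = 32), is to declare both stages in svyset. This is only a sketch under that assumption; its output is not reproduced here and has not been compared with Table 3.8.

```stata
* Sketch only: two-stage design declaration.
* Assumes fpc1 = clusters in the population (stage 1)
* and fpc2 = elements per cluster (stage 2).
svyset clu [pweight=wt], strata(str) fpc(fpc1) || _n, fpc(fpc2)
svy: total ue91
svy: ratio ue91 lab91
```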

Post-stratified weights

page 97 Table 3.10  A simple random sample drawn without replacement from the Province'91 population with poststratum weights.
input id str clu wt ue91 lab91 poststr gwt postwt sruv srcvs
  1 1 1 4 4123 33786 1 .5833 2.333 .25 .43
  2 1 4 4 760 5919 1 .5833 2.333 .25 .43
  3 1 5 4 721 4930 1 .5833 2.333 .25 .43
  4 1 15 4 142 675 2 1.2500 5.0000 .25 .20
  5 1 18 4 187 1448 2 1.2500 5.0000 .25 .20
  6 1 26 4 331 2543 2 1.2500 5.0000 .25 .20
  7 1 30 4 127 1084 2 1.2500 5.0000 .25 .20
  8 1 31 4 219 1330 2 1.2500 5.0000 .25 .20
end
poststratified conditional estimates

Note that you cannot get the deff when the poststrata() and postweight() options are used.  The values supplied through postweight() are the poststratum population totals and must be integers (i.e., whole numbers).
gen fpc = 32

gen postw = .
(8 missing values generated)

replace postw = 7 if poststr == 1
(3 real changes made)

replace postw = 25 if poststr == 2
(5 real changes made)

svyset [pw=wt], fpc(fpc) poststrata(poststr) postweight(postw)

      pweight: wt
          VCE: linearized
   Poststrata: poststr
   Postweight: postw
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: fpc

svy: total ue91
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
N. of poststrata =       2          Design df        =       7

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ue91 |      18106   6013.646      3885.986    32326.01
--------------------------------------------------------------

svy: ratio ue91/lab91
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
N. of poststrata =       2          Design df        =       7

     _ratio_1: ue91/lab91

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .1297472    .004386       .119376    .1401184
--------------------------------------------------------------
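
The postwt and gwt columns in the data appear to follow directly from the poststratum totals: postwt = N_g / n_g (7/3 and 25/5) and gwt = postwt / wt. The sketch below rebuilds them from postw and then sums postwt times ue91, which should reproduce the poststratified total of 18106 reported above. The variable names n_g, postwt_check, gwt_check, and wy are hypothetical.

```stata
* Sketch: rebuild the Table 3.10 weights from the poststratum totals in postw.
* n_g, postwt_check, gwt_check, and wy are hypothetical names.
bysort poststr: gen n_g = _N              // sample size in each poststratum
gen double postwt_check = postw / n_g     // N_g / n_g
gen double gwt_check = postwt_check / wt  // g-weight relative to the design weight
list id poststr postwt postwt_check gwt gwt_check

* Weighted sum of ue91 using the poststratified weights
gen double wy = postwt_check * ue91
quietly summarize wy
display r(sum)
```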

poststratified unconditional estimates
This has been skipped for now.

pure design-based estimates under SRS

svyset [pw=wt], fpc(fpc)

      pweight: wt
          VCE: linearized
     Strata 1: <one>
         SU 1: <observations>
        FPC 1: fpc

svy: total ue91
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
                                    Design df        =       7

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        ue91 |      26440   13282.26     -4967.551    57847.55
--------------------------------------------------------------

svy: ratio ue91/lab91
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
                                    Design df        =       7

     _ratio_1: ue91/lab91

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .1278159   .0040873      .1181511    .1374808
--------------------------------------------------------------

page 102 Table 3.12  A simple random sample drawn without replacement from the Province'91 population prepared for ratio estimation.

input id str clu wt ue91 hou85 gwt adjwt smplrat
1 1 1 4 4123 26881 .5562 2.2248 .25
2 1 4 4 760 4896 .5562 2.2248 .25
3 1 5 4 721 3730 .5562 2.2248 .25
4 1 15 4 142 556 .5562 2.2248 .25
5 1 18 4 187 1463 .5562 2.2248 .25
6 1 26 4 331 1946 .5562 2.2248 .25
7 1 30 4 127 834 .5562 2.2248 .25
8 1 31 4 219 932 .5562 2.2248 .25
end

gen fpc = 32
svyset clu [pweight=wt], fpc(fpc)

      pweight: wt
          VCE: linearized
     Strata 1: <one>
         SU 1: clu
        FPC 1: fpc

svy: ratio ue91 hou85
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
                                    Design df        =       7

     _ratio_1: ue91/hou85

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .1602891   .0055256      .1472232     .173355
--------------------------------------------------------------
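
The adjwt column in the data equals wt times gwt (4 x .5562 = 2.2248). Under the usual ratio-estimation setup, summing adjwt times ue91 gives the ratio-estimated total of ue91. The following is a minimal sketch of that calculation; the variable names adjwt_check and wy_adj are hypothetical, and the result is not compared here with any figure from the book.

```stata
* Sketch: ratio-estimated total of ue91 from the adjusted weights.
* adjwt_check and wy_adj are hypothetical names.
gen double adjwt_check = wt * gwt     // should agree with adjwt (2.2248)
gen double wy_adj = adjwt * ue91
quietly summarize wy_adj
display r(sum)
```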

pages 106-107  Regression estimator

This example provides the numbers needed for the formula in the middle of page 106.  svy: reg is run to get the coefficient of hou85, and svy: total is run to get the estimated total of hou85.  Substituting these numbers into the formula gives the result (15312) shown in the last line of Table 3.14 on page 107.

svy: reg ue91 hou85
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1                  Number of obs      =         8
Number of PSUs     =         8                  Population size    =        32
                                                Design df          =         7
                                                F(   1,      7)    =  44949.18
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9982

------------------------------------------------------------------------------
             |             Linearized
        ue91 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       hou85 |   .1520142    .000717   212.01   0.000     .1503188    .1537097
       _cons |   42.65468   20.54033     2.08   0.076    -5.915492    91.22485
------------------------------------------------------------------------------

svy: total hou85
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =       8
Number of PSUs   =       8          Population size  =      32
                                    Design df        =       7

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hou85 |     164952   87298.57     -41476.32    371380.3
--------------------------------------------------------------
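
The arithmetic can be sketched as follows, assuming the formula is the usual regression estimator of a total, t_reg = t_y + b*(T_x - t_x). The value T_x = 91753 for the known population total of hou85 is an assumption that does not appear on this page and should be checked against the book; it is consistent with the gwt value of .5562 in Table 3.12 (91753/164952). The other numbers are copied from the output above.

```stata
* Sketch of the regression estimator of the total of ue91.
* T_x is an ASSUMPTION (known population total of hou85, not shown on this page).
scalar T_x = 91753
* Estimated total of ue91 under SRS (svy: total ue91 output above)
scalar t_y = 26440
* Estimated total of hou85 (svy: total hou85 output above)
scalar t_x = 164952
* Coefficient of hou85 from svy: reg
scalar b = .1520142
display t_y + b*(T_x - t_x)
```

With these inputs the result is about 15312.7, which is close to the 15312 reported in Table 3.14.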

These pages are Copyrighted (c) by UCLA Academic Technology Services