UCLA Academic Technology Services

Stata FAQ 
How do I analyze survey data with a probability proportional to size sampling design?

The examples below use Stata 9. 

NOTE: To see the design effect or the misspecification effect, run estat effects after the svy estimation command.
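As a rough guide to what estat effects reports: the design effect (DEFF) is the variance of an estimate under the actual design divided by the variance that simple random sampling of the same size would give, and DEFT is its square root. The misspecification effect (MEFF) has the same form, but its denominator is the variance you would compute by (wrongly) treating the data as a simple random sample. The Python sketch below only illustrates these ratios from two variances you already have. The function names are hypothetical, and Stata computes the denominators internally; this code does not reproduce that step.

```python
import math


def design_effect(var_design: float, var_reference: float) -> float:
    """DEFF (or MEFF): design-based variance divided by a reference variance.

    For DEFF the reference is the SRS variance of the same estimator;
    for MEFF it is the naive variance that treats the data as an SRS.
    """
    if var_reference <= 0:
        raise ValueError("reference variance must be positive")
    return var_design / var_reference


def deft(var_design: float, var_reference: float) -> float:
    """DEFT: square root of DEFF, i.e. a ratio of standard errors."""
    return math.sqrt(design_effect(var_design, var_reference))


# Hypothetical variances, not taken from the output on this page.
print(design_effect(4.0, 2.0))  # 2.0
print(deft(9.0, 1.0))           # 3.0
```

A DEFF above 1 means the clustered design is less precise than an SRS of the same size would be.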

The examples below are taken from Levy and Lemeshow's Sampling of Populations.

Page 350: Cluster sampling with unequal probabilities (probability proportional to size sampling)
This example uses the hospslct data set.
svyset drawing [pw=wstar]

      pweight: wstar
          VCE: linearized
     Strata 1: <one>
         SU 1: drawing
        FPC 1: <zero>
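The svyset output declares one stratum, drawing as the sampling unit, wstar as the probability weight, and no finite population correction. Under those declarations, a weighted total is the sum of w*y, the estimated population size is the sum of the weights, and the linearized variance of a total treats the PSU-level weighted totals as if they were drawn with replacement: v = n/(n-1) * sum over PSUs of (z_j - zbar)^2, where z_j is the weighted total in PSU j. The Python sketch below illustrates that single-stratum formula on hypothetical toy data (not the hospslct values); it is a sketch of the textbook formula, not Stata's own code.

```python
from collections import defaultdict
import math


def weighted_total_se(psu, w, y):
    """Weighted total and its linearized SE for one stratum with no FPC."""
    if not (len(psu) == len(w) == len(y)):
        raise ValueError("psu, w and y must have the same length")
    # Sum w*y within each PSU: these are the PSU-level weighted totals z_j.
    z = defaultdict(float)
    for p, wi, yi in zip(psu, w, y):
        z[p] += wi * yi
    n = len(z)
    if n < 2:
        raise ValueError("need at least two PSUs to estimate a variance")
    total = sum(z.values())
    zbar = total / n
    # With-replacement variance of the total across PSUs.
    var = n / (n - 1) * sum((zj - zbar) ** 2 for zj in z.values())
    return total, math.sqrt(var)


# Hypothetical toy data: 3 PSUs ("draws"), 2 observations each.
psu = [1, 1, 2, 2, 3, 3]
w = [10.0, 10.0, 20.0, 20.0, 30.0, 30.0]
y = [1, 0, 1, 1, 0, 0]
total, se = weighted_total_se(psu, w, y)
n_hat = sum(w)  # estimated population size = sum of the weights
print(total, round(se, 4), n_hat)  # 50.0 36.0555 120.0
```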

svy: total lifethrt dxdead
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =      50
Number of PSUs   =       5          Population size  =   50056
                                    Design df        =       4

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    lifethrt |    6006.72    1001.12      3227.165    8786.275
      dxdead |    2002.24   1226.117     -1402.005    5406.485
--------------------------------------------------------------
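The design degrees of freedom are the number of PSUs minus the number of strata (5 - 1 = 4), and each confidence interval is the estimate plus or minus the 97.5th percentile of Student's t with those df times the standard error. The Python sketch below reproduces the lifethrt row of the table above from its total and standard error. The t quantile is hard-coded (an assumption made to avoid a SciPy dependency), and the helper names are hypothetical.

```python
# 97.5th percentile of Student's t with 4 df, hard-coded.
T_975_DF4 = 2.7764451051977987


def design_df(n_psu: int, n_strata: int) -> int:
    """Design degrees of freedom: number of PSUs minus number of strata."""
    return n_psu - n_strata


def t_interval(estimate: float, se: float, t_crit: float):
    """Two-sided interval: estimate plus or minus t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


df = design_df(n_psu=5, n_strata=1)  # 4, matching "Design df = 4"
lower, upper = t_interval(6006.72, 1001.12, T_975_DF4)  # lifethrt row
print(df, round(lower, 3), round(upper, 3))  # 4 3227.165 8786.275
```

Because the design has only 4 df, the t multiplier (about 2.78) is noticeably larger than the normal value of 1.96, which is why the dxdead interval extends below zero.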

svy: mean lifethrt dxdead
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      50
Number of PSUs   =       5          Population size  =   50056
                                    Design df        =       4

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    lifethrt |        .12        .02      .0644711    .1755289
      dxdead |        .04   .0244949     -.0280087    .1080087
--------------------------------------------------------------
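A survey mean is the weighted total divided by the sum of the weights, so each mean above equals the matching total from svy: total divided by the reported population size of 50056. The Python sketch below checks that arithmetic using only numbers printed on this page; the function name is hypothetical. In this first design the standard errors also equal the total's SE divided by 50056 (1001.12/50056 = .02). This is consistent with every draw carrying the same weighted size, so the estimated population size does not vary across PSUs. In the second design below that no longer holds, and the mean's SE differs from the total's SE divided by the population size.

```python
def weighted_mean(total: float, pop_size: float) -> float:
    """Survey mean as weighted total / sum of weights."""
    return total / pop_size


POP_SIZE = 50056  # "Population size" reported in the output

print(round(weighted_mean(6006.72, POP_SIZE), 7))  # 0.12 (lifethrt)
print(round(weighted_mean(2002.24, POP_SIZE), 7))  # 0.04 (dxdead)
print(round(1001.12 / POP_SIZE, 7))                # 0.02, lifethrt Std. Err.
```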

svy: ratio dxdead/lifethrt
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =      50
Number of PSUs   =       5          Population size  =   50056
                                    Design df        =       4

     _ratio_1: dxdead/lifethrt

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .3333333   .2324056      -.311928    .9785946
--------------------------------------------------------------
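The ratio estimate is the ratio of the two weighted totals (2002.24/6006.72 = .3333333). Its linearized standard error comes from the scores w*(y - R*x)/X, summed within each PSU and then treated like the PSU totals of a single-stratum, with-replacement design. The Python sketch below shows that standard Taylor-linearization formula on hypothetical toy data. It is not a reproduction of Stata's internal code, and the function name is an assumption.

```python
from collections import defaultdict
import math


def ratio_linearized(psu, w, y, x):
    """Ratio of weighted totals Y/X with a Taylor-linearized SE
    (one stratum, PSUs treated as sampled with replacement, no FPC)."""
    Y = sum(wi * yi for wi, yi in zip(w, y))
    X = sum(wi * xi for wi, xi in zip(w, x))
    if X == 0:
        raise ValueError("denominator total is zero")
    R = Y / X
    # Linearized scores w*(y - R*x)/X, summed within each PSU.
    z = defaultdict(float)
    for p, wi, yi, xi in zip(psu, w, y, x):
        z[p] += wi * (yi - R * xi) / X
    n = len(z)
    if n < 2:
        raise ValueError("need at least two PSUs to estimate a variance")
    zbar = sum(z.values()) / n  # zero up to rounding
    var = n / (n - 1) * sum((zj - zbar) ** 2 for zj in z.values())
    return R, math.sqrt(var)


# Point estimate from the totals printed above.
print(round(2002.24 / 6006.72, 7))  # 0.3333333

# Hypothetical toy data: 3 PSUs, unit weights.
R, se = ratio_linearized(
    psu=[1, 1, 2, 2, 3, 3],
    w=[1.0] * 6,
    y=[1, 0, 1, 1, 0, 0],
    x=[1, 1, 1, 1, 1, 1],
)
print(R, round(se, 6))  # 0.5 0.288675
```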

Page 351: Cluster sampling with unequal probabilities (probability proportional to size sampling)

gen tl = .
(50 missing values generated)

replace tl = 785 if hospno == 2
(10 real changes made)

replace tl = 3404 if hospno == 5
(30 real changes made)

replace tl = 778 if hospno == 9
(10 real changes made)

gen w2star = (admiss/50)*(7087/tl)
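The gen and replace steps attach a value of tl to each of the three sampled hospitals (hospno 2, 5 and 9), covering all 50 observations (10 + 30 + 10), and then build the new weight w2star = (admiss/50)*(7087/tl). The Python sketch below mirrors that bookkeeping. The tl values and the constants 50 and 7087 are copied from the commands above. Because the FAQ does not list the data, admiss is only a placeholder argument, and the function name is hypothetical.

```python
import math

# tl values by hospital number, copied from the replace commands above.
TL_BY_HOSPNO = {2: 785, 5: 3404, 9: 778}


def w2star(admiss: float, hospno: int) -> float:
    """Mirror of: gen w2star = (admiss/50)*(7087/tl)."""
    tl = TL_BY_HOSPNO.get(hospno)
    if tl is None:
        # In Stata, tl would still be missing (.), so w2star would be missing too.
        return math.nan
    return (admiss / 50) * (7087 / tl)


# Placeholder admiss value, for illustration only.
print(round(w2star(50, 2), 4))  # 9.0280
```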

svyset drawing [pw=w2star]

      pweight: w2star
          VCE: linearized
     Strata 1: <one>
         SU 1: drawing
        FPC 1: <zero>

svy: total lifethrt dxdead
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =      50
Number of PSUs   =       5          Population size  =   51345
                                    Design df        =       4

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    lifethrt |   6259.176   1277.322      2712.762    9805.591
      dxdead |   1760.471   1079.043     -1235.433    4756.376
--------------------------------------------------------------

svy: mean lifethrt dxdead
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      50
Number of PSUs   =       5          Population size  =   51345
                                    Design df        =       4

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    lifethrt |   .1219043   .0214156      .0624452    .1813634
      dxdead |   .0342871   .0230032       -.02958    .0981542
--------------------------------------------------------------

svy: ratio dxdead/lifethrt
(running ratio on estimation sample)

Survey: Ratio estimation

Number of strata =       1          Number of obs    =      50
Number of PSUs   =       5          Population size  =   51345
                                    Design df        =       4

     _ratio_1: dxdead/lifethrt

--------------------------------------------------------------
             |             Linearized
             |      Ratio   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
    _ratio_1 |   .2812625   .2114835     -.3059098    .8684347
--------------------------------------------------------------

These pages are Copyrighted (c) by UCLA Academic Technology Services