UCLA Academic Technology Services

WesVar FAQ
How do I analyze survey data with a stratified design with certainty PSUs?

This example is taken from Lehtonen and Pahkinen's Practical Methods for Design and Analysis of Complex Surveys, page 60, Table 2.8: Estimates under a PPSSYS design (n = 8); the Province'91 population.
The SAS data file for this example can be downloaded by clicking here.  Note that a modification was made to this data set:  variables called id and cons were added.  The variable id starts at one and numbers the observations within each stratum.  WesVar needs it as the VarUnit variable (in other words, this variable is the PSU) because the PSUs within each stratum must be numbered consecutively starting with one.  The variable cons, which is equal to one for all cases, was added for use in making the table.
A "movie" showing how to convert the data file to a WesVar data file can be viewed by clicking here.  A "movie" showing how to analyze the data in WesVar can be viewed by clicking here.
Please note that the method shown in the "movie" is not the only way to analyze this data.  Another way would be to copy the observation in stratum 2 (the certainty or self-representing PSU), so that there are two observations in stratum 2.  The weight of each of these observations would be reduced from one to one-half, so that together they still carry the original weight of one.  When analyzing the data in WesVar, you would then not specify stratum 2 as having a self-representing PSU.
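The alternative described above (duplicating the certainty PSU and halving its weight) can be sketched in Python. This is only an illustration under stated assumptions: the records are hypothetical toy values, not the Province'91 data, the variable y and the helper names split_certainty_psu and weighted_total are made up for the sketch, and the variable names str, id and wt simply mirror the ones used on this page.

```python
# Hypothetical toy records (NOT the Province'91 data): one row per PSU.
records = [
    {"str": 1, "id": 1, "wt": 4.0, "y": 10.0},
    {"str": 1, "id": 2, "wt": 6.0, "y": 20.0},
    {"str": 2, "id": 1, "wt": 1.0, "y": 500.0},  # certainty PSU, weight one
]


def split_certainty_psu(rows, stratum):
    """Duplicate each row in `stratum`, halve its weight, renumber id from 1."""
    out = []
    next_id = 1
    for row in rows:
        if row["str"] != stratum:
            out.append(dict(row))  # other strata are copied unchanged
            continue
        for _ in range(2):
            out.append({**row, "id": next_id, "wt": row["wt"] / 2})
            next_id += 1
    return out


def weighted_total(rows, var):
    """Weighted total of `var` over all rows."""
    return sum(r["wt"] * r[var] for r in rows)


split = split_certainty_psu(records, stratum=2)
```

Because the two copies share the original weight, the weighted total of y and the sum of the weights are the same before and after the split; only the number of PSUs in stratum 2 changes.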
In this example, the variable wt is used as the weight variable, the variable id is used as the VarUnit and the variable str is used as the StratVar (the variable that defines the strata).  The jackknife-n (jkn) method of creating the replicate weights is used because it allows more than two PSUs per stratum (as we have in stratum 1).  Also, we will need to use a finite population correction (FPC) of 0.75, because 8 of the 32 population units are sampled:  1 - (8/32) = 0.75.
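To make the FPC arithmetic and the jackknife-n idea concrete, here is a minimal Python sketch. It assumes the usual delete-one-PSU form of the method: each replicate drops one PSU, the remaining PSUs in the same stratum are reweighted by n_h/(n_h - 1), and each squared deviation is scaled by the FPC times (n_h - 1)/n_h. The data are hypothetical toy values, the function names are made up, and letting a single-PSU stratum contribute no variance is an assumption of this sketch; it is not a description of WesVar's internals.

```python
from collections import defaultdict

# Hypothetical toy data (NOT Province'91): one row per PSU.
rows = [
    {"str": 1, "id": 1, "wt": 2.0, "y": 1.0},
    {"str": 1, "id": 2, "wt": 2.0, "y": 2.0},
    {"str": 1, "id": 3, "wt": 2.0, "y": 3.0},
    {"str": 2, "id": 1, "wt": 1.0, "y": 10.0},  # certainty PSU
]


def jkn_replicate_weights(rows):
    """Return (n_h, weights) pairs, one replicate per dropped PSU."""
    by_stratum = defaultdict(list)
    for i, r in enumerate(rows):
        by_stratum[r["str"]].append(i)
    replicates = []
    for members in by_stratum.values():
        n_h = len(members)
        if n_h < 2:
            continue  # assumption: a lone PSU adds no variance
        for dropped in members:
            w = [r["wt"] for r in rows]
            for i in members:
                w[i] = 0.0 if i == dropped else w[i] * n_h / (n_h - 1)
            replicates.append((n_h, w))
    return replicates


def jkn_total_and_variance(rows, var, fpc=1.0):
    """Weighted total of `var` and its jackknife-n variance estimate."""
    full = sum(r["wt"] * r[var] for r in rows)
    variance = 0.0
    for n_h, w in jkn_replicate_weights(rows):
        rep = sum(wi * r[var] for wi, r in zip(w, rows))
        variance += fpc * (n_h - 1) / n_h * (rep - full) ** 2
    return full, variance


fpc = 1 - 8 / 32  # 8 sampled units out of 32, as in the text
total, variance = jkn_total_and_variance(rows, "y", fpc=fpc)
```

For an estimated total with equal weights inside a stratum, this reproduces the familiar stratified variance formula, which is one way to check the replicate factors by hand.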
The output (shown at the end of the analysis "movie") is given below.
The marginal sum_wts value of 33.8680 is incorrect because of the addition of the second observation in stratum 2 (the correct value is 32).  The marginal ue91 value of 15077.4280 is the estimated total of the variable ue91, and its standard error is 521.1212.  The marginal ratio value of 0.1285 is the estimated ratio of ue91 to lab91 (ue91/lab91), and its standard error is 0.0022.  Note that the cell totals listed in the right-most column are incorrect because of the modification that was made to the data set.

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.