Stata takes these characteristics into account through its survey procedures. Before issuing any survey commands, you must use svyset to set one or more of the following items: the sampling weight (pweight), the strata, the primary sampling unit (psu), and the finite population correction (fpc).
Analyzing data from a survey sampling design without taking these characteristics into account can result in inaccurate point estimates, inaccurate standard errors, or both.
In this unit we will be using data from the book Sampling of Populations by Levy and Lemeshow (1999) with permission of the authors.
Of course, in the normal course of events you wouldn't actually have access to data from the whole population. We were lucky in this instance that California collects and releases these data.
Let's try several computations on the population data.
use http://www.ats.ucla.edu/stat/stata/library/apipop
tabulate stype
stype | Freq. Percent Cum.
------------+-----------------------------------
E | 4421 71.38 71.38
H | 755 12.19 83.56
M | 1018 16.44 100.00
------------+-----------------------------------
Total | 6194 100.00
summarize api00
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
api00 | 6194 664.7126 128.2441 346 969
quietly summarize enroll
display %10.0fc r(sum)
3,811,472
regress api00 meals ell avg_ed
Source | SS df MS Number of obs = 6016
---------+------------------------------ F( 3, 6012) = 5837.12
Model | 73775065.7 3 24591688.6 Prob > F = 0.0000
Residual | 25328472.8 6012 4212.98616 R-squared = 0.7444
---------+------------------------------ Adj R-squared = 0.7443
Total | 99103538.5 6015 16476.0662 Root MSE = 64.908
------------------------------------------------------------------------------
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
meals | -1.672069 .0568866 -29.393 0.000 -1.783587 -1.560551
ell | -.6775632 .0616073 -10.998 0.000 -.7983355 -.5567908
avg_ed | 72.30502 2.09055 34.587 0.000 68.20679 76.40325
_cons | 558.443 7.969069 70.076 0.000 542.8207 574.0652
------------------------------------------------------------------------------
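In this example, the sampling frame contains all 6,194 schools, so fpc = 6194 and the sampling weight (pw) = 6194/200 = 30.97. A simple random sample of 200 schools can be drawn from the population file by sorting on a random number and keeping the first 200 cases:
generate i = uniform()
sort i
keep in 1/200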
Of course, in the real world you probably wouldn't take a sample of 200 schools from a computer file of 6,194; you would just analyze the entire dataset. But suppose you had to visit each school to collect the data you needed. It would then take much less time and cost much less money to go to 200 schools than to more than 6,000.
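The sampling step and the weight arithmetic described above can also be sketched outside Stata. The Python snippet below is only an illustration under stated assumptions: the school identifiers 1 to 6194 and the seed are hypothetical stand-ins, not values from apipop.dta.

```python
import random

POPULATION_SIZE = 6194   # schools in the sampling frame (from the text)
SAMPLE_SIZE = 200        # schools to draw

# Hypothetical school IDs standing in for the rows of the population file.
school_ids = list(range(1, POPULATION_SIZE + 1))

rng = random.Random(12345)  # arbitrary seed so the draw is reproducible
sample = rng.sample(school_ids, SAMPLE_SIZE)  # simple random sample without replacement

# Every sampled school represents N/n schools in the population.
pw = POPULATION_SIZE / SAMPLE_SIZE
# fpc holds the population size, as in the text.
fpc = POPULATION_SIZE

print(f"pw = {pw:.2f}, fpc = {fpc}, sample size = {len(sample)}")
```

Because every school has the same chance of selection, a single weight applies to all 200 sampled schools.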
The file apisrs.dta has a simple random sample of 200 cases.
use http://www.ats.ucla.edu/stat/stata/library/apisrs
tabulate stype
stype | Freq. Percent Cum.
------------+-----------------------------------
E | 145 72.50 72.50
H | 25 12.50 85.00
M | 30 15.00 100.00
------------+-----------------------------------
Total | 200 100.00
tabulate dnum
district |
number | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 0.50 0.50
40 | 1 0.50 1.00
41 | 1 0.50 1.50
43 | 1 0.50 2.00
46 | 3 1.50 3.50
48 | 1 0.50 4.00
55 | 1 0.50 4.50
56 | 2 1.00 5.50
57 | 1 0.50 6.00
60 | 1 0.50 6.50
67 | 1 0.50 7.00
80 | 1 0.50 7.50
90 | 2 1.00 8.50
98 | 1 0.50 9.00
103 | 1 0.50 9.50
105 | 1 0.50 10.00
108 | 2 1.00 11.00
124 | 1 0.50 11.50
131 | 1 0.50 12.00
135 | 2 1.00 13.00
148 | 2 1.00 14.00
154 | 1 0.50 14.50
159 | 1 0.50 15.00
162 | 1 0.50 15.50
166 | 3 1.50 17.00
175 | 1 0.50 17.50
176 | 1 0.50 18.00
184 | 1 0.50 18.50
190 | 1 0.50 19.00
209 | 1 0.50 19.50
217 | 1 0.50 20.00
222 | 1 0.50 20.50
229 | 1 0.50 21.00
231 | 1 0.50 21.50
238 | 1 0.50 22.00
248 | 2 1.00 23.00
253 | 3 1.50 24.50
255 | 1 0.50 25.00
259 | 1 0.50 25.50
266 | 1 0.50 26.00
272 | 1 0.50 26.50
274 | 1 0.50 27.00
278 | 2 1.00 28.00
293 | 1 0.50 28.50
301 | 1 0.50 29.00
304 | 1 0.50 29.50
335 | 1 0.50 30.00
351 | 1 0.50 30.50
352 | 1 0.50 31.00
353 | 1 0.50 31.50
358 | 1 0.50 32.00
360 | 1 0.50 32.50
379 | 1 0.50 33.00
390 | 1 0.50 33.50
393 | 1 0.50 34.00
395 | 2 1.00 35.00
401 | 18 9.00 44.00
416 | 1 0.50 44.50
418 | 2 1.00 45.50
436 | 1 0.50 46.00
444 | 1 0.50 46.50
445 | 1 0.50 47.00
451 | 1 0.50 47.50
457 | 2 1.00 48.50
459 | 1 0.50 49.00
460 | 1 0.50 49.50
470 | 1 0.50 50.00
473 | 1 0.50 50.50
479 | 1 0.50 51.00
491 | 1 0.50 51.50
495 | 1 0.50 52.00
498 | 1 0.50 52.50
503 | 2 1.00 53.50
507 | 5 2.50 56.00
509 | 1 0.50 56.50
513 | 2 1.00 57.50
529 | 2 1.00 58.50
532 | 1 0.50 59.00
533 | 1 0.50 59.50
536 | 1 0.50 60.00
537 | 2 1.00 61.00
539 | 3 1.50 62.50
541 | 1 0.50 63.00
542 | 1 0.50 63.50
547 | 1 0.50 64.00
556 | 2 1.00 65.00
564 | 1 0.50 65.50
570 | 1 0.50 66.00
579 | 1 0.50 66.50
590 | 1 0.50 67.00
600 | 1 0.50 67.50
602 | 1 0.50 68.00
605 | 1 0.50 68.50
614 | 2 1.00 69.50
620 | 3 1.50 71.00
623 | 1 0.50 71.50
627 | 3 1.50 73.00
629 | 1 0.50 73.50
630 | 2 1.00 74.50
632 | 5 2.50 77.00
633 | 1 0.50 77.50
635 | 1 0.50 78.00
636 | 2 1.00 79.00
637 | 1 0.50 79.50
640 | 1 0.50 80.00
642 | 1 0.50 80.50
643 | 1 0.50 81.00
644 | 1 0.50 81.50
645 | 1 0.50 82.00
648 | 1 0.50 82.50
651 | 1 0.50 83.00
653 | 1 0.50 83.50
658 | 1 0.50 84.00
665 | 1 0.50 84.50
688 | 1 0.50 85.00
689 | 1 0.50 85.50
702 | 1 0.50 86.00
711 | 1 0.50 86.50
716 | 1 0.50 87.00
720 | 1 0.50 87.50
731 | 1 0.50 88.00
739 | 1 0.50 88.50
744 | 3 1.50 90.00
745 | 1 0.50 90.50
750 | 1 0.50 91.00
751 | 1 0.50 91.50
754 | 1 0.50 92.00
756 | 1 0.50 92.50
761 | 1 0.50 93.00
779 | 2 1.00 94.00
780 | 1 0.50 94.50
782 | 1 0.50 95.00
788 | 1 0.50 95.50
796 | 4 2.00 97.50
797 | 1 0.50 98.00
803 | 1 0.50 98.50
815 | 1 0.50 99.00
830 | 1 0.50 99.50
834 | 1 0.50 100.00
------------+-----------------------------------
Total | 200 100.00
svyset
pweight is pw
fpc is fpc
svymean api00
Survey mean estimation
pweight: pw Number of obs = 200
Strata: <one> Number of strata = 1
PSU: <observations> Number of PSUs = 200
FPC: fpc Population size = 6193.9999
------------------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
api00 | 660.165 9.186887 642.0489 678.2811 1
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
Weights must represent population totals for deff to be correct when
using an FPC. Note: deft is invariant to the scale of weights.
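For a simple random sample drawn without replacement, the standard error of the mean follows the textbook formula sqrt((1 - n/N) * s^2 / n), where s^2 is the sample variance. A minimal Python sketch of that formula follows; the function name and the five toy values are hypothetical and are not taken from apisrs.dta.

```python
import math
import statistics

def srs_mean_and_se(values, population_size):
    """Mean and standard error under simple random sampling without replacement."""
    n = len(values)
    mean = statistics.fmean(values)
    s2 = statistics.variance(values)        # sample variance (n - 1 denominator)
    fpc_factor = 1 - n / population_size    # finite population correction
    se = math.sqrt(fpc_factor * s2 / n)
    return mean, se

toy_values = [1, 2, 3, 4, 5]  # hypothetical data, not from the API files
mean, se = srs_mean_and_se(toy_values, population_size=10)
print(mean, se)
```

With n = 200 and N = 6194 the correction factor 1 - 200/6194 is about 0.968, so the FPC shrinks the standard error by less than 2 percent in this example.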
svytotal enroll
Survey total estimation
pweight: pw Number of obs = 200
Strata: <one> Number of strata = 1
PSU: <observations> Number of PSUs = 200
FPC: fpc Population size = 6193.9999
------------------------------------------------------------------------------
Total | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
enroll | 3924828 220705.4 3489607 4360049 1
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
Weights must represent population totals for deff to be correct when
using an FPC. Note: deft is invariant to the scale of weights.
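The estimated total is the weighted sum of the observations. Since every school in this simple random sample carries the same weight (6194/200), the total equals the population size times the sample mean. The Python sketch below uses hypothetical toy values; only the figures 3924828 and 6194 come from the output above.

```python
def weighted_total(values, weights):
    """Estimate a population total as the sum of weight * value."""
    return sum(w * y for y, w in zip(values, weights))

def weighted_mean(values, weights):
    """Estimate a population mean as weighted total / sum of weights."""
    return weighted_total(values, weights) / sum(weights)

# Hypothetical toy sample: three schools, each representing two schools.
toy_enroll = [10, 20, 30]
toy_weights = [2, 2, 2]
toy_total = weighted_total(toy_enroll, toy_weights)
toy_mean = weighted_mean(toy_enroll, toy_weights)

# Mean enrollment implied by the svytotal output (total / population size).
implied_mean_enroll = 3924828 / 6194

print(toy_total, toy_mean, round(implied_mean_enroll, 2))
```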
svyreg api00 meals ell avg_ed
Survey linear regression
pweight: pw Number of obs = 200
Strata: <one> Number of strata = 1
PSU: <observations> Number of PSUs = 200
FPC: fpc Population size = 6193.9999
F( 3, 197) = 217.11
Prob > F = 0.0000
R-squared = 0.7640
------------------------------------------------------------------------------
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
meals | -1.367668 .3544273 -3.86 0.000 -2.066583 -.6687524
ell | -1.266818 .3895673 -3.25 0.001 -2.035028 -.4986079
avg_ed | 75.49145 14.28649 5.28 0.000 47.31912 103.6638
_cons | 544.7082 56.15402 9.70 0.000 433.9749 655.4414
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
In this example, there are three sampling frames: 4,421 elementary schools, 755 high schools, and 1,018 middle schools.
The file apistrat.dta contains the data for the stratified random sample.
use http://www.ats.ucla.edu/stat/stata/library/apistrat
tabulate stype
stype | Freq. Percent Cum.
------------+-----------------------------------
E | 100 50.00 50.00
H | 50 25.00 75.00
M | 50 25.00 100.00
------------+-----------------------------------
Total | 200 100.00
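The stratum sample counts above, combined with the population counts for each school type, imply a different sampling weight in each stratum. The following Python sketch assumes that each weight is the stratum population size divided by the stratum sample size. The actual pw values stored in apistrat.dta are not listed in this unit, so treat this as an illustration.

```python
# Population counts per school type (from the population tabulation).
population = {"E": 4421, "H": 755, "M": 1018}
# Sample counts per school type (from the stratified sample tabulation).
sample = {"E": 100, "H": 50, "M": 50}

# Assumed weight: schools in the population per sampled school, by stratum.
weights = {s: population[s] / sample[s] for s in population}

# Summing the weights over the sample should recover the population size.
implied_population = sum(weights[s] * sample[s] for s in sample)

for s in sorted(weights):
    print(s, round(weights[s], 2))
print("implied population size:", round(implied_population))
```

Under this assumption, high schools and middle schools are sampled at a higher rate than elementary schools, so each sampled elementary school stands for more schools in the population.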
tabulate dnum
district |
number | Freq. Percent Cum.
------------+-----------------------------------
19 | 1 0.50 0.50
20 | 1 0.50 1.00
25 | 1 0.50 1.50
27 | 1 0.50 2.00
40 | 1 0.50 2.50
41 | 1 0.50 3.00
64 | 1 0.50 3.50
69 | 1 0.50 4.00
105 | 1 0.50 4.50
108 | 1 0.50 5.00
114 | 1 0.50 5.50
135 | 1 0.50 6.00
140 | 1 0.50 6.50
148 | 2 1.00 7.50
153 | 5 2.50 10.00
155 | 1 0.50 10.50
158 | 2 1.00 11.50
160 | 1 0.50 12.00
162 | 1 0.50 12.50
176 | 1 0.50 13.00
182 | 1 0.50 13.50
185 | 2 1.00 14.50
196 | 1 0.50 15.00
202 | 1 0.50 15.50
208 | 1 0.50 16.00
214 | 1 0.50 16.50
215 | 2 1.00 17.50
216 | 1 0.50 18.00
223 | 1 0.50 18.50
225 | 1 0.50 19.00
226 | 1 0.50 19.50
233 | 1 0.50 20.00
238 | 2 1.00 21.00
247 | 1 0.50 21.50
253 | 4 2.00 23.50
259 | 4 2.00 25.50
266 | 2 1.00 26.50
270 | 2 1.00 27.50
273 | 1 0.50 28.00
275 | 1 0.50 28.50
279 | 1 0.50 29.00
284 | 1 0.50 29.50
294 | 1 0.50 30.00
308 | 1 0.50 30.50
316 | 1 0.50 31.00
324 | 1 0.50 31.50
333 | 1 0.50 32.00
339 | 1 0.50 32.50
348 | 1 0.50 33.00
349 | 1 0.50 33.50
351 | 1 0.50 34.00
358 | 1 0.50 34.50
364 | 1 0.50 35.00
376 | 1 0.50 35.50
382 | 2 1.00 36.50
390 | 1 0.50 37.00
394 | 1 0.50 37.50
395 | 3 1.50 39.00
401 | 16 8.00 47.00
419 | 1 0.50 47.50
423 | 1 0.50 48.00
432 | 1 0.50 48.50
439 | 1 0.50 49.00
448 | 1 0.50 49.50
450 | 1 0.50 50.00
457 | 1 0.50 50.50
459 | 1 0.50 51.00
460 | 1 0.50 51.50
465 | 1 0.50 52.00
473 | 3 1.50 53.50
475 | 1 0.50 54.00
478 | 1 0.50 54.50
484 | 1 0.50 55.00
492 | 1 0.50 55.50
495 | 1 0.50 56.00
497 | 1 0.50 56.50
498 | 1 0.50 57.00
499 | 1 0.50 57.50
501 | 1 0.50 58.00
507 | 4 2.00 60.00
509 | 1 0.50 60.50
512 | 1 0.50 61.00
513 | 2 1.00 62.00
514 | 1 0.50 62.50
515 | 1 0.50 63.00
531 | 2 1.00 64.00
532 | 1 0.50 64.50
537 | 1 0.50 65.00
541 | 3 1.50 66.50
550 | 1 0.50 67.00
554 | 1 0.50 67.50
569 | 1 0.50 68.00
575 | 2 1.00 69.00
590 | 2 1.00 70.00
596 | 1 0.50 70.50
602 | 2 1.00 71.50
605 | 1 0.50 72.00
620 | 2 1.00 73.00
621 | 3 1.50 74.50
627 | 1 0.50 75.00
630 | 2 1.00 76.00
632 | 4 2.00 78.00
635 | 2 1.00 79.00
636 | 2 1.00 80.00
639 | 2 1.00 81.00
650 | 1 0.50 81.50
653 | 2 1.00 82.50
655 | 1 0.50 83.00
656 | 1 0.50 83.50
662 | 1 0.50 84.00
685 | 1 0.50 84.50
689 | 5 2.50 87.00
702 | 1 0.50 87.50
706 | 1 0.50 88.00
722 | 1 0.50 88.50
725 | 2 1.00 89.50
735 | 1 0.50 90.00
738 | 1 0.50 90.50
751 | 1 0.50 91.00
756 | 1 0.50 91.50
760 | 1 0.50 92.00
766 | 1 0.50 92.50
767 | 2 1.00 93.50
774 | 1 0.50 94.00
780 | 2 1.00 95.00
781 | 1 0.50 95.50
784 | 1 0.50 96.00
787 | 1 0.50 96.50
796 | 1 0.50 97.00
797 | 1 0.50 97.50
802 | 1 0.50 98.00
806 | 1 0.50 98.50
813 | 1 0.50 99.00
819 | 1 0.50 99.50
825 | 1 0.50 100.00
------------+-----------------------------------
Total | 200 100.00
svyset
pweight is pw
strata is stype
fpc is fpc
svymean api00
Survey mean estimation
pweight: pw Number of obs = 200
Strata: stype Number of strata = 3
PSU: <observations> Number of PSUs = 200
FPC: fpc Population size = 6194
------------------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
api00 | 662.2874 9.408941 643.7322 680.8425 1.204457
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
Weights must represent population totals for deff to be correct when
using an FPC. Note: deft is invariant to the scale of weights.
svytotal enroll
Survey total estimation
pweight: pw Number of obs = 200
Strata: stype Number of strata = 3
PSU: <observations> Number of PSUs = 200
FPC: fpc Population size = 6194
------------------------------------------------------------------------------
Total | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
enroll | 3687178 114641.7 3461095 3913260 .3620181
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
Weights must represent population totals for deff to be correct when
using an FPC. Note: deft is invariant to the scale of weights.
svyreg api00 meals ell avg_ed
Survey linear regression
pweight: pw Number of obs = 200
Strata: stype Number of strata = 3
PSU: <observations> Number of PSUs = 200
FPC: fpc Population size = 6194
F( 3, 195) = 190.97
Prob > F = 0.0000
R-squared = 0.7125
------------------------------------------------------------------------------
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
meals | -1.818234 .4076227 -4.46 0.000 -2.622098 -1.01437
ell | -.0191524 .3890413 -0.05 0.961 -.7863726 .7480679
avg_ed | 77.47879 16.93665 4.57 0.000 44.07838 110.8792
_cons | 534.4453 65.57342 8.15 0.000 405.1294 663.7613
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
In this example, the sampling frame contains the 757 school districts.
The file apiclus1.dta contains the data for the one-stage cluster sampling design.
use http://www.ats.ucla.edu/stat/stata/library/apiclus1
tabulate stype
stype | Freq. Percent Cum.
------------+-----------------------------------
E | 144 78.69 78.69
H | 14 7.65 86.34
M | 25 13.66 100.00
------------+-----------------------------------
Total | 183 100.00
tabulate dnum
district |
number | Freq. Percent Cum.
------------+-----------------------------------
61 | 13 7.10 7.10
135 | 34 18.58 25.68
178 | 4 2.19 27.87
197 | 13 7.10 34.97
255 | 16 8.74 43.72
406 | 2 1.09 44.81
413 | 1 0.55 45.36
437 | 4 2.19 47.54
448 | 12 6.56 54.10
510 | 21 11.48 65.57
568 | 9 4.92 70.49
637 | 11 6.01 76.50
716 | 37 20.22 96.72
778 | 2 1.09 97.81
815 | 4 2.19 100.00
------------+-----------------------------------
Total | 183 100.00
svyset
pweight is pw
psu is dnum
fpc is fpc
/* list fpc pw dnum -- to see the values for these items */
svymean api00
Survey mean estimation
pweight: pw Number of obs = 183
Strata: <one> Number of strata = 1
PSU: dnum Number of PSUs = 15
FPC: fpc Population size = 6194.0003
------------------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
api00 | 644.1694 23.54224 593.6763 694.6625 9.345869
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
Weights must represent population totals for deff to be correct when
using an FPC. Note: deft is invariant to the scale of weights.
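The Deff column is the ratio of the design-based variance to the variance that a simple random sample of the same size would give. Under that definition, the output above implies the quantities computed in this Python sketch. The effective sample size n/deff is a common rule of thumb rather than something Stata reports here, and the variable names are illustrative.

```python
import math

n_obs = 183           # observations in the one-stage cluster sample
se_design = 23.54224  # design-based standard error of mean api00
deff = 9.345869       # design effect reported by svymean

# deff = Var(design) / Var(SRS), so the implied SRS standard error is se / sqrt(deff).
se_srs = se_design / math.sqrt(deff)

# Rule-of-thumb effective sample size: how many independently sampled
# schools would give the same precision.
n_effective = n_obs / deff

print(f"implied SRS se = {se_srs:.2f}, effective n = {n_effective:.1f}")
```

By this rule of thumb, the 183 clustered schools carry about as much information about mean api00 as roughly 20 schools sampled independently.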
svytotal enroll
Survey total estimation
pweight: pw Number of obs = 183
Strata: <one> Number of strata = 1
PSU: dnum Number of PSUs = 15
FPC: fpc Population size = 6194.0003
------------------------------------------------------------------------------
Total | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
enroll | 3404940 932235 1405495 5404385 31.31066
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
Weights must represent population totals for deff to be correct when
using an FPC. Note: deft is invariant to the scale of weights.
svyreg api00 meals ell avg_ed
Survey linear regression
pweight: pw Number of obs = 157
Strata: <one> Number of strata = 1
PSU: dnum Number of PSUs = 15
FPC: fpc Population size = 5313.9784
F( 3, 12) = 54.36
Prob > F = 0.0000
R-squared = 0.6978
------------------------------------------------------------------------------
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
meals | -2.948702 .3266161 -9.028 0.000 -3.649224 -2.24818
ell | -.2227005 .3938377 -0.565 0.581 -1.067398 .6219974
avg_ed | 16.42832 15.32151 1.072 0.302 -16.43304 49.28968
_cons | 755.4386 55.61202 13.584 0.000 636.1626 874.7145
------------------------------------------------------------------------------
Finite population correction (FPC) assumes simple random sampling without
replacement of PSUs within each stratum with no subsampling within PSUs.
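The denominator degrees of freedom in the F statistics follow from the design rather than from the number of observations. The Python sketch below assumes the usual rule that the design degrees of freedom d equal the number of PSUs minus the number of strata, and that the adjusted Wald F test for k coefficients uses d - k + 1 denominator degrees of freedom. The function names are illustrative.

```python
def design_df(n_psu, n_strata):
    """Design degrees of freedom: number of PSUs minus number of strata."""
    return n_psu - n_strata

def wald_denominator_df(n_psu, n_strata, n_coefs):
    """Denominator df of the adjusted Wald F test for n_coefs coefficients."""
    return design_df(n_psu, n_strata) - n_coefs + 1

# (PSUs, strata) as reported in the svyreg headers; each model has 3 predictors.
designs = {
    "simple random sample": (200, 1),
    "stratified sample": (200, 3),
    "one-stage cluster sample": (15, 1),
}

denominator_df = {
    name: wald_denominator_df(n_psu, n_strata, n_coefs=3)
    for name, (n_psu, n_strata) in designs.items()
}
for name, df in denominator_df.items():
    print(name, df)
```

These values agree with F(3, 197), F(3, 195), and F(3, 12) in the regression output. With only 15 PSUs, the cluster design leaves few degrees of freedom for inference, however many schools were observed.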
Once again, the sampling frame contains the 757 school districts.
The file apiclus2.dta contains the data for the two-stage cluster sampling design.
use http://www.ats.ucla.edu/stat/stata/library/apiclus2
tabulate stype
stype | Freq. Percent Cum.
------------+-----------------------------------
E | 83 65.87 65.87
H | 20 15.87 81.75
M | 23 18.25 100.00
------------+-----------------------------------
Total | 126 100.00
tabulate dnum
district |
number | Freq. Percent Cum.
------------+-----------------------------------
15 | 1 0.79 0.79
63 | 1 0.79 1.59
83 | 3 2.38 3.97
117 | 1 0.79 4.76
132 | 3 2.38 7.14
152 | 3 2.38 9.52
173 | 4 3.17 12.70
176 | 1 0.79 13.49
198 | 4 3.17 16.67
200 | 5 3.97 20.63
228 | 2 1.59 22.22
264 | 1 0.79 23.02
295 | 5 3.97 26.98
302 | 4 3.17 30.16
403 | 5 3.97 34.13
452 | 4 3.17 37.30
456 | 1 0.79 38.10
480 | 5 3.97 42.06
523 | 2 1.59 43.65
534 | 5 3.97 47.62
549 | 5 3.97 51.59
552 | 2 1.59 53.17
570 | 5 3.97 57.14
574 | 1 0.79 57.94
575 | 5 3.97 61.90
596 | 5 3.97 65.87
620 | 5 3.97 69.84
638 | 5 3.97 73.81
639 | 5 3.97 77.78
674 | 2 1.59 79.37
679 | 4 3.17 82.54
687 | 3 2.38 84.92
701 | 2 1.59 86.51
711 | 2 1.59 88.10
719 | 1 0.79 88.89
731 | 5 3.97 92.86
742 | 1 0.79 93.65
768 | 2 1.59 95.24
781 | 5 3.97 99.21
795 | 1 0.79 100.00
------------+-----------------------------------
Total | 126 100.00
svyset
pweight is pw
psu is dnum
/* list pw dnum -- to see the values for these items */
svymean api00
Survey mean estimation
pweight: pw Number of obs = 126
Strata: <one> Number of strata = 1
PSU: dnum Number of PSUs = 40
Population size = 5128.6749
------------------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
api00 | 670.8118 30.71158 608.6918 732.9318 6.347638
------------------------------------------------------------------------------
svytotal enroll
Survey total estimation
pweight: pw Number of obs = 120
Strata: <one> Number of strata = 1
PSU: dnum Number of PSUs = 38
Population size = 5015.1249
------------------------------------------------------------------------------
Total | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------------
enroll | 2639273 815060.9 987802.5 4290743 24.53485
------------------------------------------------------------------------------
svyreg api00 meals ell avg_ed
Survey linear regression
pweight: pw Number of obs = 126
Strata: <one> Number of strata = 1
PSU: dnum Number of PSUs = 40
Population size = 5128.6749
F( 3, 37) = 200.50
Prob > F = 0.0000
R-squared = 0.7405
------------------------------------------------------------------------------
api00 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
meals | .4461714 .4670056 0.96 0.345 -.4984366 1.390779
ell | -1.922801 .8664354 -2.22 0.032 -3.675332 -.1702702
avg_ed | 134.7738 19.193 7.02 0.000 95.95232 173.5953
_cons | 306.6302 79.29093 3.87 0.000 146.2492 467.0112
------------------------------------------------------------------------------
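Because the full population is available, the four estimates of total enrollment can be checked against the true total of 3,811,472 computed from the population file. The Python sketch below only rearranges numbers copied from the svytotal outputs in this unit; the dictionary layout and names are illustrative.

```python
TRUE_TOTAL = 3_811_472  # population total of enroll

# design: (estimate, lower 95% limit, upper 95% limit) from svytotal enroll
estimates = {
    "simple random": (3924828, 3489607, 4360049),
    "stratified": (3687178, 3461095, 3913260),
    "one-stage cluster": (3404940, 1405495, 5404385),
    "two-stage cluster": (2639273, 987802.5, 4290743),
}

results = {}
for design, (est, lower, upper) in estimates.items():
    rel_error = (est - TRUE_TOTAL) / TRUE_TOTAL   # signed relative error
    covered = lower <= TRUE_TOTAL <= upper        # does the CI contain the truth?
    results[design] = (rel_error, covered)
    print(f"{design:18s} relative error {rel_error:+.1%}  CI covers truth: {covered}")
```

All four intervals contain the true total, but the intervals from the cluster samples are far wider than those from the simple random and stratified samples. Note that the two-stage estimate is based on only 120 of the 126 sampled schools.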
These pages are Copyrighted (c) by UCLA Academic Technology Services