UCLA Academic Technology Services

Textbook Examples
Sampling: Design and Analysis by Sharon L. Lohr
Chapter 5: Cluster Sampling with Equal Probabilities

The examples below use Stata 9.  If you are using Stata versions 7 or 8, please see this page.

NOTE: To see the design effect or the misspecification effect, run estat effects immediately after the svy estimation command.

Page 137 in the middle

clear
input person_num cluster gpa
1 1 3.08
2 1 2.60
3 1 3.44
4 1 3.04
1 2 2.36
2 2 3.04
3 2 3.28
4 2 2.68
1 3 2.00
2 3 2.56
3 3 2.52
4 3 1.88
1 4 3.00
2 4 2.88
3 4 3.44
4 4 3.64
1 5 2.68
2 5 1.92
3 5 3.28
4 5 3.20
end

list

     +---------------------------+
     | person~m   cluster    gpa |
     |---------------------------|
  1. |        1         1   3.08 |
  2. |        2         1    2.6 |
  3. |        3         1   3.44 |
  4. |        4         1   3.04 |
  5. |        1         2   2.36 |
     |---------------------------|
  6. |        2         2   3.04 |
  7. |        3         2   3.28 |
  8. |        4         2   2.68 |
  9. |        1         3      2 |
 10. |        2         3   2.56 |
     |---------------------------|
 11. |        3         3   2.52 |
 12. |        4         3   1.88 |
 13. |        1         4      3 |
 14. |        2         4   2.88 |
 15. |        3         4   3.44 |
     |---------------------------|
 16. |        4         4   3.64 |
 17. |        1         5   2.68 |
 18. |        2         5   1.92 |
 19. |        3         5   3.28 |
 20. |        4         5    3.2 |
     +---------------------------+
     
tabstat gpa, s(sum) by(cluster)

Summary for variables: gpa
     by categories of: cluster 
     
 cluster |       sum
---------+----------
       1 |     12.16
       2 |     11.36
       3 |      8.96
       4 |     12.96
       5 |     11.08
---------+----------
   Total |     56.52
--------------------

gen pwt = 100/5
svyset cluster [pweight = pwt]

      pweight: pwt
          VCE: linearized
     Strata 1: <one>
         SU 1: cluster
        FPC 1: <zero>

svy: total gpa
(running total on estimation sample)

Survey: Total estimation

Number of strata =       1          Number of obs    =      20
Number of PSUs   =       5          Population size  =     400
                                    Design df        =       4

--------------------------------------------------------------
             |             Linearized
             |      Total   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         gpa |     1130.4   67.16666      943.9154    1316.885
--------------------------------------------------------------
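
As the note at the top of the page says, the design effect can be requested once a svy estimation command has run. A minimal sketch, assuming the GPA data and the svyset declaration from this example are still in memory; no output is shown because it was not part of the original page:

```stata
* Re-run the estimation so that its results are the active ones
svy: total gpa

* Design effects (DEFF and DEFT are reported by default)
estat effects

* Misspecification effects instead
estat effects, meff meft
```

estat effects reads the results left behind by the most recent svy command, so it has to come directly after the estimation it refers to.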

Page 135 at the top

di (100/5)*56.52
1130.4
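
The hand calculation above is the unbiased estimator of the population total for one-stage cluster sampling, with N = 100 clusters in the population and n = 5 in the sample. The worked equations below also show where the standard error of 67.16666 in the svy: total output comes from. The variance of the cluster totals, s_t^2 = 2.25568, is not printed on this page; it is computed here from the five cluster sums in the tabstat output, so treat it as a derived figure. Because svyset was given no finite population correction (FPC 1: <zero>), the factor (1 - n/N) is left out.

```latex
\begin{align*}
\hat{t} &= \frac{N}{n}\sum_{i \in S} t_i = \frac{100}{5}\,(56.52) = 1130.4 \\
s_t^2 &= \frac{1}{n-1}\sum_{i \in S}\Bigl(t_i - \frac{\hat{t}}{N}\Bigr)^2
       = \frac{9.02272}{4} = 2.25568 \\
\mathrm{SE}(\hat{t}) &= N\sqrt{\frac{s_t^2}{n}}
       = 100\sqrt{\frac{2.25568}{5}} \approx 67.17
\end{align*}
```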

Page 141 at the top

Population A

clear
input cluster value
1 10
1 20
1 30
2 11
2 20
2 32
3 9
3 17
3 31
end

tabstat value, s(mean var)

    variable |      mean  variance
-------------+--------------------
       value |        20      84.5
----------------------------------

tabstat value, s(mean var) by(cluster)

Summary for variables: value
     by categories of: cluster
 cluster |      mean  variance
---------+--------------------
       1 |        20       100
       2 |        21       111
       3 |        19       124
---------+--------------------
   Total |        20      84.5
------------------------------

anova value cluster, regress

      Source |       SS       df       MS              Number of obs =       9
-------------+------------------------------           F(  2,     6) =    0.03
       Model |           6     2           3           Prob > F      =  0.9736
    Residual |         670     6  111.666667           R-squared     =  0.0089
-------------+------------------------------           Adj R-squared = -0.3215
       Total |         676     8        84.5           Root MSE      =  10.567
------------------------------------------------------------------------------
       value        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons                  19   6.101002     3.11   0.021     4.071387    33.92861
cluster
           1            1   8.628119     0.12   0.912    -20.11225    22.11225
           2            2   8.628119     0.23   0.824    -19.11225    23.11225
           3    (dropped)
------------------------------------------------------------------------------

Population B

clear
input cluster value
1 9
1 10
1 11
2 17
2 20
2 20
3 31
3 32
3 30
end

tabstat value, s(mean var)

    variable |      mean  variance
-------------+--------------------
       value |        20      84.5
----------------------------------

tabstat value, s(mean var) by(cluster)

Summary for variables: value
     by categories of: cluster
 cluster |      mean  variance
---------+--------------------
       1 |        10         1
       2 |        19         3
       3 |        31         1
---------+--------------------
   Total |        20      84.5
------------------------------

anova value cluster, regress

      Source |       SS       df       MS              Number of obs =       9
-------------+------------------------------           F(  2,     6) =  199.80
       Model |         666     2         333           Prob > F      =  0.0000
    Residual |          10     6  1.66666667           R-squared     =  0.9852
-------------+------------------------------           Adj R-squared =  0.9803
       Total |         676     8        84.5           Root MSE      =   1.291
------------------------------------------------------------------------------
       value        Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons                  31    .745356    41.59   0.000     29.17618    32.82382
cluster
           1          -21   1.054093   -19.92   0.000    -23.57927   -18.42073
           2          -12   1.054093   -11.38   0.000    -14.57927   -9.420728
           3    (dropped)
------------------------------------------------------------------------------
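
The two populations share the same overall mean and variance but differ sharply in how alike the elements within a cluster are. One way to quantify this from the ANOVA tables is the intraclass correlation coefficient, ICC = 1 - [M/(M-1)](SSW/SSTO), where M = 3 is the common cluster size, SSW is the Residual sum of squares, and SSTO is the Total sum of squares. The formula is the usual equal-cluster-size definition and is an assumption here, since this page does not state it. A minimal sketch using the values printed above:

```stata
* Population A: SSW = 670, SSTO = 676, M = 3
display 1 - (3/(3 - 1))*(670/676)

* Population B: SSW = 10, SSTO = 676, M = 3
display 1 - (3/(3 - 1))*(10/676)

* Same calculation from stored results; run it right after -anova value cluster-
display 1 - (3/(3 - 1))*e(rss)/(e(mss) + e(rss))
```

The results are roughly -0.49 for Population A and 0.98 for Population B. The Adj R-squared values in the ANOVA output (-0.3215 and 0.9803) are the related quantity 1 - MSW/S^2.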

Page 142 at the top

clear
input person_num cluster gpa
1 1 3.08
2 1 2.60
3 1 3.44
4 1 3.04
1 2 2.36
2 2 3.04
3 2 3.28
4 2 2.68
1 3 2.00
2 3 2.56
3 3 2.52
4 3 1.88
1 4 3.00
2 4 2.88
3 4 3.44
4 4 3.64
1 5 2.68
2 5 1.92
3 5 3.28
4 5 3.20
end

anova gpa cluster

                           Number of obs =      20     R-squared     =  0.4483
                           Root MSE      = .430163     Adj R-squared =  0.3012
                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  2.25568025     4  .563920063       3.05     0.0504
                         |
                 cluster |  2.25568025     4  .563920063       3.05     0.0504
                         |
                Residual |  2.77560022    15  .185040014  
              -----------+----------------------------------------------------
                   Total |  5.03128047    19  .264804235     

Page 149, figure 5.3

use http://www.ats.ucla.edu/stat/stata/examples/lohr/coots.dta, clear
graph scatter volume clutch, ylabel( , nogrid angle(0)) ytitle(Egg Volume) xtitle(Clutch Number)

Page 149, figure 5.4

sort clutch volume
by clutch: egen mean = mean(volume)
by clutch: gen n = _n
drop csize breadth length
reshape wide volume, i(clutch) j(n)
sort mean
gen rank = _n
graph twoway rspike volume1 volume2 rank, ylabel( , nogrid angle(0)) ///
  xtitle(Clutch Ranked by Means) ytitle(Egg Volume)

Page 150, figure 5.5

sort clutch
egen sd_dev = rowsd(volume1 volume2)
graph scatter sd_dev mean, ylabel( , nogrid angle(0)) xlabel(1(1)5) ///
  xtitle(Mean Egg Volume for Clutch) ytitle(Standard Deviation for Clutch)

Page 150, table 5.2

use http://www.ats.ucla.edu/stat/stata/examples/lohr/coots.dta, clear
sort clutch
by clutch: gen n = _n
by clutch: egen mean_vol = mean(volume)
by clutch: egen sd_vol = sd(volume)
gen var_vol = sd_vol*sd_vol
by clutch: gen sum_vol = csize*mean_vol
gen mi = 2
gen a = (1 - (2/csize))*csize^2*(var_vol/mi)
gen yr = 4375.947/1757
gen b = (sum_vol - (csize*yr))^2
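NOTE: Each clutch has two records, so only one record per clutch is kept before summing; otherwise every clutch would be counted twice. (tabstat does accept an if qualifier, so adding if n == 1 to the tabstat commands below would work as well.)
drop if n == 2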
list clutch csize mean_vol var_vol sum_vol a b in 1/20

     +-----------------------------------------------------------------------+
     | clutch   csize   mean_vol    var_vol    sum_vol          a          b |
     |-----------------------------------------------------------------------|
  1. |      1      13   3.864303   .0093972   50.23594   .6719013   318.9229 |
  2. |      2      13   4.194183   .0009177   54.52438   .0656166    490.483 |
  3. |      3       6   .9162504   .0004814   5.497502   .0057766   89.22637 |
  4. |      4      11   2.998335    .000795   32.98168   .0393539   31.19573 |
  5. |      5      10   2.495708   .0001574   24.95708   .0062977   .0026303 |
     |-----------------------------------------------------------------------|
  6. |      6      13    3.98426   .0003304   51.79538   .0236219   377.0529 |
  7. |      7       9   1.927069   .0050616   17.34362   .1594406   25.72102 |
  8. |      8      11   2.961526    .005123   32.57679   .2535884   26.83677 |
  9. |      9      12   3.460579   .0001066   41.52695   .0063963   135.4897 |
 10. |     10      11   2.961526   .0223972   32.57679   1.108663   26.83677 |
     |-----------------------------------------------------------------------|
 11. |     11      12   3.498909   .0035246    41.9869   .2114751   146.4089 |
 12. |     12      11   2.999868   .0001694   32.99855   .0083832   31.38445 |
 13. |     13      12   3.566441    .012899   42.79729   .7739403   166.6769 |
 14. |     14      11   2.986065   .0029402   32.84672   .1455399   29.70632 |
 15. |     15      11   2.982998   .0010585   32.81298   .0523944   29.33965 |
     |-----------------------------------------------------------------------|
 16. |     16      10   2.406982   .0020082   24.06982   .0803278   .6988369 |
 17. |     17       9    2.01023   .0005397   18.09207   .0169998   18.68958 |
 18. |     18      10   2.437402   .0000289   24.37403   .0011567   .2827725 |
 19. |     19      11   2.926252   .0000753   32.18877   .0037259   22.96712 |
 20. |     20      11   2.947723   .0415674   32.42496   2.057586   25.28671 |
     +-----------------------------------------------------------------------+
    
list clutch csize mean_vol var_vol sum_vol a b in -10/l

     +-----------------------------------------------------------------------+
     | clutch   csize   mean_vol    var_vol    sum_vol          a          b |
     |-----------------------------------------------------------------------|
175. |    175      10     2.4336   .0008226     24.336   .0329023    .324661 |
176. |    176      12   3.752611   .0026651   45.03133   .1599053   229.3525 |
177. |    177      10   2.527395   .0003213   25.27395   .0128525    .135543 |
178. |    178      11   3.101091   .0127205     34.112   .6296641   45.09971 |
179. |    179       9   1.940416   .0014251   17.46374   .0448905   24.51704 |
     |-----------------------------------------------------------------------|
180. |    180       9   1.946576   .0000759   17.51918   .0023906   23.97109 |
181. |    181      12   3.453279   .0017057   41.43934   .1023399   133.4579 |
182. |    182      13   4.219888   .0000367   54.85854   .0026246    505.396 |
183. |    183      13   4.414816   .0088191   57.39261   .6305637   625.7546 |
184. |    184      12   3.484307   6.66e-06   41.81168   .0003997   142.1994 |
     +-----------------------------------------------------------------------+
    
tabstat csize sum_vol a b, s(sum)

   stats |     csize   sum_vol         a         b
---------+----------------------------------------
     sum |      1757  4375.947  42.17445  11439.58
--------------------------------------------------

tabstat sum_vol, s(var)

    variable |  variance
-------------+----------
     sum_vol |  149.5648
------------------------
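
The column sums feed the ratio estimate of mean egg volume and an approximate standard error. The total number of clutches in the population is not given on this page, so the sketch below assumes that the finite population correction is close to 1 and that the within-clutch term (the sum of column a divided by nN) is negligible. Both are assumptions, and n = 184 is the number of sampled clutches.

```stata
* Ratio estimate: sum of estimated clutch totals / sum of clutch sizes
display 4375.947/1757

* s_r^2: sum of column b divided by (n - 1)
display 11439.58/(184 - 1)

* Approximate SE: sqrt(s_r^2/n) divided by the mean clutch size 1757/184
display sqrt((11439.58/(184 - 1))/184)/(1757/184)
```

Under these assumptions the estimated mean volume is about 2.49 with a standard error of about 0.061.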

Page 154, table 5.3

NOTE: There was an error in the data set. It has been corrected in the data available from the ATS website. If you use the data from the CD that comes with the book or from the publisher's website, your answers may differ slightly. (The error is in csize for clutch 88.)
use http://www.ats.ucla.edu/stat/stata/examples/lohr/coots.dta, clear
gen relwt = csize/2
gen wtvol = relwt*volume
list clutch csize volume relwt wtvol in 1/8

     +----------------------------------------------+
     | clutch   csize     volume   relwt      wtvol |
     |----------------------------------------------|
  1. |      1      13   3.795757     6.5   24.67242 |
  2. |      1      13    3.93285     6.5   25.56352 |
  3. |      2      13   4.215604     6.5   27.40142 |
  4. |      2      13   4.172762     6.5   27.12295 |
  5. |      3       6   .9317646       3   2.795294 |
     |----------------------------------------------|
  6. |      3       6   .9007362       3   2.702209 |
  7. |      4      11   3.018272     5.5    16.6005 |
  8. |      4      11   2.978397     5.5   16.38118 |
     +----------------------------------------------+
    
list clutch csize volume relwt wtvol in -4/l

     +----------------------------------------------+
     | clutch   csize     volume   relwt      wtvol |
     |----------------------------------------------|
365. |    183      13   4.481221     6.5   29.12794 |
366. |    183      13   4.348412     6.5   28.26468 |
367. |    184      12   3.486132       6   20.91679 |
368. |    184      12   3.482482       6   20.89489 |
     +----------------------------------------------+
    
tabstat csize relwt wtvol, s(sum)

   stats |     csize     relwt     wtvol
---------+------------------------------
     sum |      3514      1757  4375.947
----------------------------------------
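
Because two eggs are measured in every clutch, relwt = csize/2 acts as a relative weight, and the weighted mean sum(wtvol)/sum(relwt) = 4375.947/1757 reproduces the ratio estimate of mean egg volume. A minimal sketch of getting the same estimate, with a design-based standard error, from the survey commands. It assumes coots.dta is in memory with relwt generated as above, treats clutch as the primary sampling unit, and supplies no finite population correction; none of this appears on the original page.

```stata
* Declare the design: clutches are the PSUs, relwt is the relative weight
svyset clutch [pweight = relwt]

* Weighted mean of egg volume with a linearized standard error
svy: mean volume
```

Relative weights are enough for means and ratios, but a total estimated with them would be on the wrong scale.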

Page 158 at the top

NOTE: You need to increase the matsize (matrix size) above the default value or you will get an error message.
set matsize 200
anova volume clutch

                           Number of obs =     368     R-squared     =  0.9958
                           Root MSE      =  .07697     Adj R-squared =  0.9916
                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  257.417531   183  1.40665317     237.44     0.0000
                         |
                  clutch |  257.417531   183  1.40665317     237.44     0.0000
                         |
                Residual |  1.09007809   184  .005924337  
              -----------+----------------------------------------------------
                   Total |  258.507609   367  .704380405     

Page 167 in the middle

sort clutch
by clutch: gen n = _n
by clutch: egen mean_vol = mean(volume)
by clutch: gen sum_vol = csize*mean_vol
corr sum_vol csize

(obs=368)
             |  sum_vol    csize
-------------+------------------
     sum_vol |   1.0000
       csize |   0.9693   1.0000

Page 167, figure 5.11

graph scatter sum_vol csize, ylabel(0(10)60, nogrid angle(0)) ytitle(Estimated Total of Egg Volumes for Clutch) ///
  xlabel(6(2)14) xtitle(Clutch Size) msymbol(Oh)

These pages are Copyrighted (c) by UCLA Academic Technology Services

