UCLA Academic Technology Services

SUDAAN FAQ 
How can I use the subpopn statement in SUDAAN?

Below is an example of the subpopn statement. Use this statement whenever you want to analyze only a subpopulation of your data. You should NOT subset your data in a data step before running the analysis, as this can cause a wide variety of problems, from incorrect results to difficulties running the procedure at all. See pages 166-169 of the SUDAAN manual for more information on the subpopn statement, how to use it, and how missing values are handled. See especially the note in the middle of page 169 for a fuller explanation of why the subpopn statement should be used instead of subsetting the data first. Other references on this topic are Cochran (1977, Section 2.13, pages 35-38), the Stata 8 Survey Manual, pages 50-52, and the Stata 9 Survey Manual, page 38.
There are a few basic reasons why you should not subset your data in order to look at just a subpopulation. One is that the standard errors of the estimates may be incorrect. Another is that the sampling information for observations outside the subpopulation is still used in the calculations; if you delete these observations before making the calculations, that information is no longer available. Finally, depending on how you subset, you may find that you have strata with too few PSUs to run the procedure.
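To make the warning concrete, here is a minimal sketch of the data-step subsetting that should be avoided. It borrows the data set name temp1 and the variable srsex from the examples on this page; the output data set name males_only is hypothetical and used only for illustration.

```sas
/* Discouraged: a subsetting IF in a data step drops every observation
   outside the subpopulation, along with its sampling information. */
data males_only;       /* hypothetical data set name */
  set temp1;
  if srsex = 1;        /* keeps only the males */
run;

/* Preferred: leave temp1 intact and put the restriction inside the
   SUDAAN procedure instead, with a statement such as
       subpopn srsex = 1;                                             */
```

If a procedure were then run on males_only rather than on temp1, the problems described above (possibly incorrect standard errors, or strata with too few PSUs) could arise.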
The example below shows a regression for only the males in the data set (srsex = 1). The line in the output that begins "For Subpopulation" indicates the subpopulation used. The subgroup and levels statements indicate that racehpra is a categorical variable with four levels. In SUDAAN 9, you could use the class statement instead of these two statements.
proc regress data=temp1 filetype=sas design = jackknife;
weight rakedw0;  
jackwgts rakedw1--rakedw80 / adjjack=1;  
model ae13 = ae14 racehpra;
subpopn srsex = 1;
subgroup racehpra;
levels 4;
run;
                                  S U D A A N
            Software for the Statistical Analysis of Correlated Data
           Copyright      Research Triangle Institute      January 2003
                                Release 8.0.2

Number of observations read       :  55428    Weighted count: 23847415
Observations in subpopulation     :  23002    Weighted count: 11631728
Observations used in the analysis :   3744    Weighted count:  2522055
Denominator degrees of freedom    :     80

Maximum number of estimable parameters for the model is  5
Weighted mean response is 3.133033

Multiple R-Square for the dependent variable AE13: 0.231226
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AE13: Number of drinks on the days drinking alcohol
For Subpopulation: SRSEX = 1
----------------------------------------------------------------------
Independent                                                   P-value
  Variables and        Beta                                   T-Test
  Effects              Coeff.          SE Beta   T-Test B=0   B=0
----------------------------------------------------------------------
Intercept                    1.71         0.07        24.92     0.0000
Number of times
  having 5 or more
  drinks in past
  month                      0.38         0.04         9.67     0.0000
Race - UCLA CHPR
  Definition
  LATINO                     1.29         0.11        12.31     0.0000
  PACIFIC ISLANDER           0.84         0.59         1.44     0.1543
  AIAN                       0.54         0.24         2.20     0.0307
  ASIAN                      0.00         0.00          .        .
----------------------------------------------------------------------
-------------------------------------------------------

Contrast               Degrees
                       of                      P-value
                       Freedom        Wald F   Wald F
-------------------------------------------------------
OVERALL MODEL                 5       618.86     0.0000
MODEL MINUS
  INTERCEPT                   4        63.04     0.0000
INTERCEPT                     .          .        .
AE14                          1        93.52     0.0000
RACEHPRA                      3        50.72     0.0000
-------------------------------------------------------
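As noted above, SUDAAN 9 allows the class statement in place of the subgroup and levels statements. A sketch of how the same regression might then be written follows; it assumes that class racehpra; is the only change needed and that every other statement carries over unchanged from the SUDAAN 8 version. It has not been run here, so check it against the SUDAAN 9 documentation.

```sas
proc regress data=temp1 filetype=sas design=jackknife;
weight rakedw0;
jackwgts rakedw1--rakedw80 / adjjack=1;
class racehpra;   /* assumed SUDAAN 9 replacement for subgroup racehpra; levels 4; */
model ae13 = ae14 racehpra;
subpopn srsex = 1;
run;
```

The subpopn statement is unchanged, so the analysis is still restricted to males without subsetting the data. The second example on this page, which follows, keeps the SUDAAN 8 syntax.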
The next example has two conditions on the subpopn statement. Hence, the regression results apply only to those cases where both srsex = 1 and racehpra = 2 are true.
proc regress data=temp1 filetype=sas design = jackknife;
weight rakedw0;  
jackwgts rakedw1--rakedw80 / adjjack=1;  
model ae13 =  ae14 ;
subpopn srsex = 1 and racehpra = 2;
run;
                                  S U D A A N
            Software for the Statistical Analysis of Correlated Data
           Copyright      Research Triangle Institute      January 2003
                                Release 8.0.2

Number of observations read       :  55428    Weighted count: 23847415
Observations in subpopulation     :    101    Weighted count:    30282
Observations used in the analysis :     69    Weighted count:    17998
Denominator degrees of freedom    :     80

Maximum number of estimable parameters for the model is  2
Weighted mean response is 3.607368

Multiple R-Square for the dependent variable AE13: 0.068544
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AE13: Number of drinks on the days drinking alcohol
For Subpopulation: SRSEX = 1 AND RACEHPRA = 2
----------------------------------------------------------------------
Independent                                                   P-value
  Variables and        Beta                                   T-Test
  Effects              Coeff.          SE Beta   T-Test B=0   B=0
----------------------------------------------------------------------
Intercept                    3.05         0.63         4.86     0.0000
Number of times
  having 5 or more
  drinks in past
  month                      0.20         0.13         1.60     0.1145
----------------------------------------------------------------------
-------------------------------------------------------

Contrast               Degrees
                       of                      P-value
                       Freedom        Wald F   Wald F
-------------------------------------------------------
OVERALL MODEL                 2        19.02     0.0000
MODEL MINUS
  INTERCEPT                   1         2.55     0.1145
INTERCEPT                     1        23.64     0.0000
AE14                          1         2.55     0.1145
-------------------------------------------------------

These pages are Copyrighted (c) by UCLA Academic Technology Services

