While this is the standard way of coding dummy variables, SUDAAN treats cases coded 0 as missing for any variable listed on the subgroup statement. In other words, SUDAAN considers non-positive values of a variable used as a categorical independent variable to be missing. Hence, when SUDAAN does a listwise deletion of missing data, a large portion of your cases may be deleted, possibly to the point of making the model inestimable. (Please see the section on the subgroup statement in the Features and Functions chapter of the SUDAAN manual for a complete description of its use, including valid values for subgroups, and see below for an example that uses the subgroup statement.)

You have several ways of dealing with this problem. Perhaps the easiest is to not list the 0/1 variable on the subgroup statement. In many ways the subgroup statement in SUDAAN is like the class statement in SAS: just as you would not list a 0/1 variable on the class statement in SAS, you do not list a 0/1 variable on the subgroup statement in SUDAAN. Another solution is to recode the 0/1 variable to be a 1/2 variable. A variable coded 0/1/2 needs to be recoded in the same way, so that every category has a positive value. You can do this in a data step before running the procedure.

Consider the example below, in which srsex is coded 1/2 and newvar1 is coded 0/1. As you can see, a warning is printed in the log and the number of cases used in the analysis is about 4050 fewer than there should be (the 4050 cases that are coded 0 in the data step).
data temp01;
  set temp1;
  newvar1 = 0;
  if _n_ > 4050 then newvar1 = 1;
run;

proc regress data=temp01 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab1 = srsex newvar1 ;
  subgroup srsex newvar1;
  levels 2 2;
run;

The following message is displayed in the log.

Opened SAS data file TEMP01 for reading.

DATA WARNING: The matrix for estimable parameters is singular. The model may be
   overspecified. You should reduce the number of variables on the right-hand
   side and refit the model before attempting to draw any conclusions.

DATA WARNING : Degrees of freedom for OVERALL contrast are less than maximum
   number of estimable parameters
   Computational instability when deriving TEST statistics can result.
   The model is likely over-parameterized; you may wish to rerun your job
   and reduce the number of parameters in your model by dropping
   variables or collapsing classes for categorical variables.

The erroneous output is shown below.

Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 51378    Weighted count: 22083940
Denominator degrees of freedom    :    80

Maximum number of estimable parameters for the model is 3

Weighted mean response is 2.496445

Multiple R-Square for the dependent variable AB1: 0.001289

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: General health condition
by: Independent Variables and Effects.

------------------------------------------------------------------------------------------------
Independent                                                                            P-value
  Variables and         Beta                  Lower 95%    Upper 95%                   T-Test
  Effects               Coeff.     SE Beta    Limit Beta   Limit Beta   T-Test B=0     B=0
------------------------------------------------------------------------------------------------
Intercept                 2.54        0.01         2.52         2.55       296.12      0.0000
Self-reported gender
  1                      -0.08        0.01        -0.10        -0.05        -6.34      0.0000
  2                       0.00        0.00         0.00         0.00          .          .
NEWVAR1
  1                       0.00        0.00         0.00         0.00          .          .
  2                       0.00        0.00         0.00         0.00          .          .
------------------------------------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of              P-value
                         Freedom      Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL                  2    66696.39     0.0000
MODEL MINUS INTERCEPT          1       40.14     0.0000
INTERCEPT                      .         .          .
SRSEX                          1       40.14     0.0000
NEWVAR1                        .         .          .
-------------------------------------------------------

This problem is caused by the dummy variable newvar1. If you compare the number of cases used by SUDAAN for the analysis above, 51378, with the number of observations read, 55428, you will see that the 4050 cases coded as 0 in the data step above are missing. Although in the example below we have recoded the problem variable in a data step, you could also use the recode statement in SUDAAN to temporarily recode the variable. If you have many variables that need to be recoded, you may want to use an array in a data step. These options are perhaps most useful when you really want to have the dummy variable listed on the subgroup statement, such as when you are using proc crosstabs. As mentioned above, you could also list only the categorical variables coded with non-zero values on the subgroup statement.
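As a sketch of the simplest workaround mentioned above, the call below keeps newvar1 in the model but leaves it off the subgroup statement, so that only the 1/2-coded srsex is declared as categorical. It reuses the data set and variable names from the example on this page. No output is shown for it, so treat it as an illustration under those assumptions rather than a verified run.

```sas
/* Sketch only: newvar1 (coded 0/1) stays in the model but is NOT listed on
   the subgroup statement, so it enters as a numeric predictor and its 0
   values are not treated as a missing subgroup level. */
proc regress data=temp01 filetype=sas design=jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab1 = srsex newvar1;
  subgroup srsex;    /* only the variable coded 1/2 */
  levels 2;          /* one entry per variable on the subgroup statement */
run;
```

Because newvar1 enters as a 0/1 numeric predictor here, its coefficient describes the difference between cases coded 1 and cases coded 0. This is the reverse of the comparison made when the variable is treated as categorical with its last level as the reference category.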
Let's recode newvar1 into a 1/2 variable called newvar2.
data temp01a;
  set temp01;
  if newvar1 = 0 then newvar2 = 1;
  if newvar1 = 1 then newvar2 = 2;
run;

proc regress data=temp01a filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab1 = srsex newvar2 ;
  subgroup srsex newvar2;
  levels 2 2;
run;

DESIGN SUMMARY: Variances will be computed using the Replicate Weight Jackknife (JACKKNIFE) Method
  Sample Weight: RAKEDW0
  Replicate Sample Weights:
    RAKEDW1  RAKEDW2  RAKEDW3  RAKEDW4  RAKEDW5  RAKEDW6  RAKEDW7  RAKEDW8
    RAKEDW9  RAKEDW10 RAKEDW11 RAKEDW12 RAKEDW13 RAKEDW14 RAKEDW15 RAKEDW16
    RAKEDW17 RAKEDW18 RAKEDW19 RAKEDW20 RAKEDW21 RAKEDW22 RAKEDW23 RAKEDW24
    RAKEDW25 RAKEDW26 RAKEDW27 RAKEDW28 RAKEDW29 RAKEDW30 RAKEDW31 RAKEDW32
    RAKEDW33 RAKEDW34 RAKEDW35 RAKEDW36 RAKEDW37 RAKEDW38 RAKEDW39 RAKEDW40
    RAKEDW41 RAKEDW42 RAKEDW43 RAKEDW44 RAKEDW45 RAKEDW46 RAKEDW47 RAKEDW48
    RAKEDW49 RAKEDW50 RAKEDW51 RAKEDW52 RAKEDW53 RAKEDW54 RAKEDW55 RAKEDW56
    RAKEDW57 RAKEDW58 RAKEDW59 RAKEDW60 RAKEDW61 RAKEDW62 RAKEDW63 RAKEDW64
    RAKEDW65 RAKEDW66 RAKEDW67 RAKEDW68 RAKEDW69 RAKEDW70 RAKEDW71 RAKEDW72
    RAKEDW73 RAKEDW74 RAKEDW75 RAKEDW76 RAKEDW77 RAKEDW78 RAKEDW79 RAKEDW80
  Multiplier Associated with Replicate Weights: 1

Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 55428    Weighted count: 23847415
Denominator degrees of freedom    :    80

Maximum number of estimable parameters for the model is 3

Weighted mean response is 2.494799

Multiple R-Square for the dependent variable AB1: 0.001206

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: General health condition
by: Independent Variables and Effects.

------------------------------------------------------------------------------------------------
Independent                                                                            P-value
  Variables and         Beta                  Lower 95%    Upper 95%                   T-Test
  Effects               Coeff.     SE Beta    Limit Beta   Limit Beta   T-Test B=0     B=0
------------------------------------------------------------------------------------------------
Intercept                 2.53        0.01         2.52         2.55       300.21      0.0000
Self-reported gender
  1                      -0.08        0.01        -0.10        -0.05        -6.25      0.0000
  2                       0.00        0.00         0.00         0.00          .          .
NEWVAR2
  1                      -0.02        0.02        -0.07         0.03        -0.91      0.3666
  2                       0.00        0.00         0.00         0.00          .          .
------------------------------------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of              P-value
                         Freedom      Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL                  3    54078.77     0.0000
MODEL MINUS INTERCEPT          2       19.81     0.0000
INTERCEPT                      .         .          .
SRSEX                          1       39.06     0.0000
NEWVAR2                        1        0.82     0.3666
-------------------------------------------------------
As you can see, all of the observations are used in this analysis.
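When many dummy variables need the same 0/1 to 1/2 recode, an array in a data step can apply it to all of them at once, as mentioned earlier. The sketch below is only an illustration: the variable names dum1, dum2 and dum3, the new names dum1r, dum2r and dum3r, and the output data set temp02 are hypothetical placeholders, not variables from the example data.

```sas
/* Hypothetical input: data set temp01 containing 0/1 dummies dum1-dum3.
   These names are placeholders; substitute your own variables. */
data temp02;
  set temp01;
  array zero_one{3} dum1 dum2 dum3;        /* existing 0/1 variables */
  array one_two{3}  dum1r dum2r dum3r;     /* new 1/2 variables      */
  do i = 1 to dim(zero_one);
    if zero_one{i} = 0 then one_two{i} = 1;
    else if zero_one{i} = 1 then one_two{i} = 2;
    /* any other value, including missing, is left missing */
  end;
  drop i;
run;

/* Cross-check each recode before listing the new variables on subgroup */
proc freq data=temp02;
  tables dum1*dum1r dum2*dum2r dum3*dum3r / missing list;
run;
```

Creating new variables rather than overwriting the originals keeps the 0/1 versions available for procedures where they are not listed on the subgroup statement. A 0/1/2 variable could be handled with the same pattern by adding a third branch.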
If you are using SUDAAN 9 or later, you can use the class statement instead of recoding the 0/1 variable. As you can see, you get the same results as above.
proc regress data=temp01 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  class srsex newvar1;
  model ab1 = srsex newvar1 ;
run;

DESIGN SUMMARY: Variances will be computed using the Replicate Weight Jackknife (JACKKNIFE) Method
  Sample Weight: RAKEDW0
  Replicate Sample Weights:
    RAKEDW1  RAKEDW2  RAKEDW3  RAKEDW4  RAKEDW5  RAKEDW6  RAKEDW7  RAKEDW8
    RAKEDW9  RAKEDW10 RAKEDW11 RAKEDW12 RAKEDW13 RAKEDW14 RAKEDW15 RAKEDW16
    RAKEDW17 RAKEDW18 RAKEDW19 RAKEDW20 RAKEDW21 RAKEDW22 RAKEDW23 RAKEDW24
    RAKEDW25 RAKEDW26 RAKEDW27 RAKEDW28 RAKEDW29 RAKEDW30 RAKEDW31 RAKEDW32
    RAKEDW33 RAKEDW34 RAKEDW35 RAKEDW36 RAKEDW37 RAKEDW38 RAKEDW39 RAKEDW40
    RAKEDW41 RAKEDW42 RAKEDW43 RAKEDW44 RAKEDW45 RAKEDW46 RAKEDW47 RAKEDW48
    RAKEDW49 RAKEDW50 RAKEDW51 RAKEDW52 RAKEDW53 RAKEDW54 RAKEDW55 RAKEDW56
    RAKEDW57 RAKEDW58 RAKEDW59 RAKEDW60 RAKEDW61 RAKEDW62 RAKEDW63 RAKEDW64
    RAKEDW65 RAKEDW66 RAKEDW67 RAKEDW68 RAKEDW69 RAKEDW70 RAKEDW71 RAKEDW72
    RAKEDW73 RAKEDW74 RAKEDW75 RAKEDW76 RAKEDW77 RAKEDW78 RAKEDW79 RAKEDW80
  Multiplier Associated with Replicate Weights: 1

Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 55428    Weighted count: 23847415
Denominator degrees of freedom    :    80

Maximum number of estimable parameters for the model is 3

Weighted mean response is 2.494799

Multiple R-Square for the dependent variable AB1: 0.001206

Frequencies and Values for CLASS Variables
by: Self-reported gender.

----------------------------------
Self-reported
gender               Frequency  Value
----------------------------------
Ordered Position: 1      23002  1
Ordered Position: 2      32426  2
----------------------------------

Frequencies and Values for CLASS Variables
by: NEWVAR1.

----------------------------------
NEWVAR1              Frequency  Value
----------------------------------
Ordered Position: 1       4049  0
Ordered Position: 2      51379  1
----------------------------------

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: General health condition
by: Independent Variables and Effects.

------------------------------------------------------------------------------------------------
Independent                                                                            P-value
  Variables and         Beta                  Lower 95%    Upper 95%                   T-Test
  Effects               Coeff.     SE Beta    Limit Beta   Limit Beta   T-Test B=0     B=0
------------------------------------------------------------------------------------------------
Intercept                 2.53        0.01         2.52         2.55       300.21      0.0000
Self-reported gender
  1                      -0.08        0.01        -0.10        -0.05        -6.25      0.0000
  2                       0.00        0.00         0.00         0.00          .          .
NEWVAR1
  0                      -0.02        0.02        -0.07         0.03        -0.91      0.3666
  1                       0.00        0.00         0.00         0.00          .          .
------------------------------------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of              P-value
                         Freedom      Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL                  3    54078.77     0.0000
MODEL MINUS INTERCEPT          2       19.81     0.0000
INTERCEPT                      .         .          .
SRSEX                          1       39.06     0.0000
NEWVAR1                        1        0.82     0.3666
-------------------------------------------------------
In this example we have a 0/1 variable (newvar1), and we are not using it on the subgroup statement.

proc descript data=temp01 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var srsex racehpra newvar1;
  subgroup srsex racehpra;
  levels 2 2;
run;

Number of observations read    : 55428    Weighted count : 23847415
Denominator degrees of freedom :    80

Variance Estimation Method: Replicate Weight Jackknife
by: Variable, Self-reported gender.

-----------------------------------------------------------------------------------
|                 |               |                                               |
| Variable        |               | Self-reported gender                          |
|                 |               | Total         | MALE          | FEMALE        |
-----------------------------------------------------------------------------------
|                 |               |               |               |               |
| Self-reported   | Sample Size   |         55428 |         23002 |         32426 |
| gender          | Weighted Size |   23847415.32 |   11631728.37 |   12215686.95 |
|                 | Total         |   36063102.27 |   11631728.37 |   24431373.90 |
|                 | Mean          |          1.51 |          1.00 |          2.00 |
|                 | SE Mean       |          0.00 |          0.00 |          0.00 |
-----------------------------------------------------------------------------------
|                 |               |               |               |               |
| Race - UCLA     | Sample Size   |          9677 |          4084 |          5593 |
| CHPR Definition | Weighted Size |    5705917.88 |    2866894.01 |    2839023.87 |
|                 | Total         |    5767889.98 |    2897175.85 |    2870714.13 |
|                 | Mean          |          1.01 |          1.01 |          1.01 |
|                 | SE Mean       |          0.00 |          0.00 |          0.00 |
-----------------------------------------------------------------------------------
|                 |               |               |               |               |
| NEWVAR1         | Sample Size   |         55428 |         23002 |         32426 |
|                 | Weighted Size |   23847415.32 |   11631728.37 |   12215686.95 |
|                 | Total         |   22084052.10 |   10772176.06 |   11311876.04 |
|                 | Mean          |          0.93 |          0.93 |          0.93 |
|                 | SE Mean       |          0.00 |          0.00 |          0.00 |
-----------------------------------------------------------------------------------

-----------------------------------------------------------------------------------
|                 |               |                                               |
| Variable        |               | Race - UCLA CHPR Definition                   |
|                 |               | Total         | LATINO        | PACIFIC       |
|                 |               |               |               | ISLANDER      |
-----------------------------------------------------------------------------------
|                 |               |               |               |               |
| Self-reported   | Sample Size   |          9677 |          9458 |           219 |
| gender          | Weighted Size |    5705917.88 |    5643945.79 |      61972.10 |
|                 | Total         |    8544941.75 |    8451279.40 |      93662.35 |
|                 | Mean          |          1.50 |          1.50 |          1.51 |
|                 | SE Mean       |          0.01 |          0.01 |          0.04 |
-----------------------------------------------------------------------------------
|                 |               |               |               |               |
| Race - UCLA     | Sample Size   |          9677 |          9458 |           219 |
| CHPR Definition | Weighted Size |    5705917.88 |    5643945.79 |      61972.10 |
|                 | Total         |    5767889.98 |    5643945.79 |     123944.19 |
|                 | Mean          |          1.01 |          1.00 |          2.00 |
|                 | SE Mean       |          0.00 |          0.00 |          0.00 |
-----------------------------------------------------------------------------------
|                 |               |               |               |               |
| NEWVAR1         | Sample Size   |          9677 |          9458 |           219 |
|                 | Weighted Size |    5705917.88 |    5643945.79 |      61972.10 |
|                 | Total         |    5275702.50 |    5218781.91 |      56920.60 |
|                 | Mean          |          0.92 |          0.92 |          0.92 |
|                 | SE Mean       |          0.00 |          0.00 |          0.03 |
-----------------------------------------------------------------------------------

If you want to have the table broken out by the values of newvar1, then you need to recode it to be a 1/2 variable, include it on the subgroup statement, and include the number of levels on the levels statement. Note that the recode statement in SUDAAN will create a variable with the first category coded as 0, which means that you cannot use this variable on the subgroup statement.

An alternative is to use the catlevel statement in proc descript. You will need to list the variable(s) on the var statement as many times as the number of levels for it that you have on the catlevel statement. The output below gives the total, percent and standard error of the percent for each of the levels of the recoded variable ab23.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  recode ab23 = (50);
  var ab23 ab23;
  catlevel 0 1;
run;

Number of observations read    : 55428    Weighted count : 23847415
Denominator degrees of freedom :    80

Variance Estimation Method: Replicate Weight Jackknife
by: Variable, One.

-----------------------------------------------------
|                   |               |               |
| Variable          |               | One           |
|                   |               | 1             |
-----------------------------------------------------
|                   |               |               |
| AB23: 0 - HIGH    | Sample Size   |          3709 |
|                   | Weighted Size |    1380250.55 |
|                   | Total         |     709750.87 |
|                   | Percent       |         51.42 |
|                   | SE Percent    |          1.03 |
-----------------------------------------------------
|                   |               |               |
| AB23: 0 - HIGH    | Sample Size   |          3709 |
|                   | Weighted Size |    1380250.55 |
|                   | Total         |     670499.68 |
|                   | Percent       |         48.58 |
|                   | SE Percent    |          1.03 |
-----------------------------------------------------
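To illustrate the other option described above (recoding newvar1 to a 1/2 variable and listing it on the subgroup statement so that the table is broken out by its values), a minimal sketch follows. It assumes the data set temp01a and the 1/2-coded variable newvar2 created earlier on this page, and it uses ab1 as the analysis variable purely for illustration. Output for this call is not shown.

```sas
/* Sketch only: newvar2 is the 1/2 recode of newvar1 stored in temp01a.
   Listing it on the subgroup statement requires a matching entry on the
   levels statement. */
proc descript data=temp01a filetype=sas design=jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var ab1;                   /* analysis variable, chosen for illustration */
  subgroup srsex newvar2;    /* both variables are coded 1/2 */
  levels 2 2;                /* number of levels for srsex and newvar2 */
run;
```

The order of the values on the levels statement follows the order of the variables on the subgroup statement, as in the regression examples above.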
These pages are Copyrighted (c) by UCLA Academic Technology Services