We will illustrate this using the hsb2 dataset, pretending that the variable socst is the sampling weight (pweight) and that the sample is stratified on ses. Suppose that we wish to do a t-test for write by gender. In our dataset, the variable female is coded 1 for females and 0 for males.
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
svyset [pw=socst], strata(ses)
pweight: socst
VCE: linearized
Strata 1: ses
SU 1:
FPC 1:
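Before estimating anything, it can help to confirm that the design was declared as intended. The following is only a sketch, assuming the svyset command above has just been run in the same session; both commands are standard Stata, and their output is not reproduced here.

```stata
* Redisplay the survey settings currently stored with the data
svyset
* Summarize the declared design (strata, sampling units, observations)
svydescribe
```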
First, we use the svy: mean command with the over option to get the means for each gender. Next, we use the test command to test the null hypothesis that these two means are equal.
svy: mean write, over(female)
(running mean on estimation sample)
Survey: Mean estimation
Number of strata =       3          Number of obs      =       200
Number of PSUs   =     200          Population size    =     10481
                                    Design df          =       197
male: female = male
female: female = female
--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
write        |
        male |   51.65351   1.041066      49.60045    53.70658
      female |   55.81467    .721354       54.3921    57.23723
--------------------------------------------------------------
test [write]male = [write]female
Adjusted Wald test
( 1) [write]male - [write]female = 0
F( 1, 197) = 10.45
Prob > F = 0.0014
We can see from the output above that the two means are significantly different (F(1, 197) = 10.45, p = 0.0014).
We could also use the lincom command to test the two means. This command should be run after the svy: mean command shown above. The lincom command gives us the difference between the means (51.65351 - 55.81467 = -4.161156), the standard error of the difference, as well as the t-value and the p-value. Notice that the p-value agrees with the one above (0.001 is 0.0014 rounded to three decimal places), and that squaring the t-value yields the F-value shown above ( (-3.23)^2 = 10.45, up to rounding).
lincom [write]male - [write]female
( 1) [write]male - [write]female = 0
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -4.161156     1.2871    -3.23   0.001    -6.699419   -1.622892
------------------------------------------------------------------------------
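The relationship between the t-value, the F-value and the p-value can be checked by hand with display. This is a rough sketch that types in the rounded coefficient and standard error printed by lincom, so the results agree with the output only to the displayed precision; 197 is the design degrees of freedom reported by svy: mean.

```stata
* t-value: estimated difference divided by its standard error
display -4.161156 / 1.2871
* Squaring the t-value gives the F-value reported by test
display (-4.161156 / 1.2871)^2
* Two-sided p-value from the t distribution with 197 design df
display 2*ttail(197, abs(-4.161156 / 1.2871))
* The same p-value from the F distribution with (1, 197) df
display Ftail(1, 197, (-4.161156 / 1.2871)^2)
```

The first two lines should print values that round to -3.23 and 10.45, and the last two should both round to 0.0014.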
The svy: regress command can also be used to compute the t-test. To do this, simply include the dichotomous grouping variable as the single predictor. The t-test of the coefficient for female is the t-test of the difference between the two means. As you can see, you get the same coefficient (in absolute value) and p-value that we did when we used the lincom command. The sign of the coefficient is different because above, the mean of the females was subtracted from the mean of the males. Below, because female is coded 1 for females and 0 for males, the coefficient is the mean of the females minus the mean of the males.
svy: regress write female
(running regress on estimation sample)
Survey: Linear regression
Number of strata =       3          Number of obs      =       200
Number of PSUs   =     200          Population size    =     10481
                                    Design df          =       197
                                    F(   1,    197)    =     10.45
                                    Prob > F           =    0.0014
                                    R-squared          =    0.0519
------------------------------------------------------------------------------
             |             Linearized
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   4.161156     1.2871     3.23   0.001     1.622892    6.699419
       _cons |   51.65351   1.041066    49.62   0.000     49.60045    53.70658
------------------------------------------------------------------------------
We can use the test command after svy: regress if we would like to get the F-ratio.
test female
Adjusted Wald test
( 1) female = 0
F( 1, 197) = 10.45
Prob > F = 0.0014
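Because female is coded 0 and 1, the regression coefficients also reproduce the two group means reported earlier by svy: mean: _cons is the mean of write for males, and _cons plus the coefficient for female is the mean for females. A minimal sketch, assuming svy: regress write female is still the most recent estimation command:

```stata
* Mean of write for males (female == 0)
lincom _cons
* Mean of write for females (female == 1)
lincom _cons + female
```

From the coefficients shown above, the second combination is 51.65351 + 4.161156 = 55.81467, which matches the female mean in the svy: mean output.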
Regardless of the method that we use, we obtain an F-ratio of 10.45, or equivalently a t-value of 3.23 in absolute value, with a p-value of 0.0014.
Note: This FAQ was inspired by several responses to a question on the Statalist.