Stata Class Notes
Analyzing Data


1.0 Stata commands in this unit

ttest          t-test
anova          Analysis of variance
xi             Creates dummy variables during model estimation
regress        Regression
      predict        Predicts after model estimation
      kdensity       Kernel density estimates and graphs
      pnorm          Graphs a standardized normal plot
      qnorm          Graphs a quantile plot
      rvfplot        Graphs a residual versus fitted plot
      test           Test linear hypotheses after model estimation
logit          Logistic regression
tabulate       Crosstabs with chi-square test
signtest       Tests the equality of matched pairs of data
signrank       Wilcoxon matched-pairs signed rank test
ranksum        Mann-Whitney two-sample test
kwallis        Nonparametric analog to the one-way anova

2.0 Demonstration and explanation

use hs1, clear

2.1 chi-square test of frequencies

Here is the tabulate command for a crosstabulation. The all option requests the chi-square test of independence along with several measures of association.
tabulate prgtype ses, all
Here is the same command with the expected option added, which displays the expected frequencies so that one can check for cells with very small expected values.
tabulate prgtype ses, all expected
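After tabulate runs, the chi-square statistic and its p-value remain available as stored results, which is handy when only those two numbers are needed. The lines below are a minimal sketch, assuming the same two variables as above and that the chi2 option alone (rather than all) is enough for the purpose.

```stata
* Re-run the crosstab, requesting only the Pearson chi-square test
tabulate prgtype ses, chi2 expected

* r(chi2) and r(p) are stored results left behind by tabulate
display "chi-square = " %6.3f r(chi2) "   p = " %5.4f r(p)
```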

2.2 t-tests

This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.
ttest write = 50
This is the paired t-test, testing whether or not the mean of write equals the mean of read.
ttest write = read
This is the two-sample independent t-test with pooled (equal) variances.
ttest write, by(female)
This is the two-sample independent t-test with separate (unequal) variances.
ttest write, by(female) unequal
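Choosing between the pooled and the unequal-variance t-test depends on whether the two groups have similar variances. One way to look at this, sketched here as a suggestion rather than a required step, is to compare the group variances directly before running ttest.

```stata
* Variance-ratio (F) test of equal standard deviations across groups
sdtest write, by(female)

* Levene's robust test, which is less sensitive to non-normality
robvar write, by(female)
```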

2.3 Analysis of Variance

The anova command, unsurprisingly, performs analysis of variance (ANOVA). Here is an example of a one-way analysis of variance.
anova write prog 
In this example the anova command is used to perform a two-way factorial analysis of variance (ANOVA).
anova write prog female prog*female
Here is an example of an analysis of covariance (ANCOVA) using the anova command.
anova write prog female prog*female read, continuous(read)
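The prog*female and continuous() forms shown above are the older anova syntax. Assuming Stata 11 or later, the same three models can be written with factor-variable notation. This is a sketch of the equivalent commands, not a replacement for the examples above.

```stata
* One-way ANOVA (same as before)
anova write prog

* Two-way factorial: ## expands to both main effects plus the interaction
anova write prog##female

* ANCOVA: the c. prefix marks read as a continuous covariate
anova write prog##female c.read
```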

2.4 Regression

Plain vanilla OLS linear regression.
regress write read female
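In the example below, we run the regression with robust standard errors. This is very useful when there is heteroskedasticity (unequal error variance). The robust option changes the standard errors, and therefore the t-statistics and p-values, but it does not affect the estimates of the regression coefficients.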
regress write read female, robust
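Whether robust standard errors are needed can be explored with a test for heteroskedasticity. The sketch below assumes a version of Stata that provides the estat postestimation commands. The test is run after the ordinary fit rather than after the robust one.

```stata
* Fit the model with conventional standard errors first
regress write read female

* Breusch-Pagan / Cook-Weisberg test; H0: constant error variance
estat hettest

* vce(robust) is the newer spelling of the robust option
regress write read female, vce(robust)
```

The coefficients from the last command match those of the robust regression above, so the predict commands that follow are unaffected.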
The predict command calculates predictions, residuals, influence statistics, and the like after an estimation command. The default shown here is to calculate the predicted scores.
predict p
When using the resid option the predict command calculates the residual.
predict r, resid
The list command displays the values of the variables that we have generated. The in 1/20 qualifier restricts the display to the first 20 observations.
list write p r in 1/20
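The residual is simply the observed score minus the predicted score. As a quick check, and assuming p and r were created by the predict commands above, the residual can be rebuilt by hand and compared with r. The name r_check is introduced here only for illustration.

```stata
* Rebuild the residual by hand: observed minus predicted
generate double r_check = write - p

* The two variables should have the same summary statistics
summarize r r_check

* Stop with an error if they differ by more than rounding error
assert abs(r - r_check) < 1e-4 if !missing(r)
```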
The kdensity command with the normal option displays a density graph of the residuals with a normal distribution superimposed on the graph. This is particularly useful for checking whether the residuals are normally distributed, an assumption underlying the t- and F-tests in regression.
kdensity r, normal
The pnorm command produces a standardized normal probability plot. It is another method of checking whether the residuals from the regression are normally distributed.
pnorm r
The qnorm command produces a normal quantile plot. It is yet another method for checking whether the residuals are normally distributed. The qnorm plot is more sensitive to deviations from normality in the tails of the distribution, whereas the pnorm plot is more sensitive to deviations in the middle of the distribution.
qnorm r
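The graphs above are visual checks. If a formal test is also wanted, Stata offers several. The two below are a sketch, assuming the residual variable r created earlier. With large samples these tests can flag departures too small to matter, so they are best read alongside the plots.

```stata
* Shapiro-Wilk W test; H0: r is normally distributed
swilk r

* Skewness and kurtosis test for normality
sktest r
```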
rvfplot is a convenience command that generates a plot of the residual versus the fitted values; it is used after regress or anova.
rvfplot
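A horizontal reference line at zero often makes the plot easier to read, since the residuals should scatter evenly around it. This is a small sketch using a standard graph option. It assumes that the most recent estimation command was regress.

```stata
* Residual-versus-fitted plot with a reference line at zero
rvfplot, yline(0)
```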

Creating dummy variables by using the xi prefix

The xi prefix is used to dummy code categorical variables such as prog. The predictor prog has three levels and requires two dummy-coded variables. The test command is used to test the collective effect of the two dummy-coded variables; in other words, it tests the main effect of prog.

xi: regress write read i.prog
describe _I*
test _Iprog_2 _Iprog_3
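Assuming Stata 11 or later, the same model can be fit without xi by using factor-variable notation. The sketch below is an alternative to the xi approach, not a correction of it.

```stata
* i.prog is expanded into indicators on the fly; no _I* variables are created
regress write read i.prog

* Joint test of all prog indicators, i.e. the main effect of prog
testparm i.prog
```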
The xi prefix can also be used to create dummy variables for prog and for the interaction of prog and read. The first test command tests the overall interaction and the second test command tests the main effect of prog.
xi: regress write i.prog*read 
describe _I*
test _IproXread_2 _IproXread_3
test _Iprog_2 _Iprog_3
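The interaction model also has a factor-variable form, again assuming Stata 11 or later. Here ## requests both main effects and the interaction, and c. marks read as continuous.

```stata
* Main effects of prog and read plus the prog-by-read interaction
regress write i.prog##c.read

* Joint test of the interaction terms
testparm i.prog#c.read

* Joint test of the prog indicators
testparm i.prog
```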

2.5 Logistic regression

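In order to demonstrate the logistic regression commands, we will create a dichotomous variable called honcomp (honors composition) to use as our dependent variable. This is purely for illustrative purposes! Because Stata treats a missing value as larger than any number, the if qualifier keeps any observation with a missing write from being coded as 1.
gen honcomp = (write >= 60) if !missing(write)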
tab honcomp
The logistic command defaults to producing the output in odds ratios but can display the coefficients if the coef option is used. The exact same results can be obtained by using the logit command, which produces coefficients as the default but will display the odds ratio if the or option is used.
logit honcomp read female
logit, or
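An odds ratio is the exponentiated logit coefficient, which is why the two commands give the same results in different metrics. The sketch below, assuming honcomp was created as above, shows the relationship directly.

```stata
* Fit the model; coefficients are in log-odds units
logit honcomp read female

* Exponentiating a coefficient gives the corresponding odds ratio
display "odds ratio for read   = " exp(_b[read])
display "odds ratio for female = " exp(_b[female])

* The logistic command reports the same odds ratios by default
logistic honcomp read female
```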

2.6 Non-Parametric Tests

The signtest command performs the sign test, the nonparametric analog of the single-sample t-test. Here it tests whether the median of write equals 50.
signtest write = 50
The signrank command computes a Wilcoxon signed-rank test, the nonparametric analog of the paired t-test.
signrank write = read
The ranksum test is the nonparametric analog of the independent two-sample t-test and is known as the Mann-Whitney test or the Wilcoxon rank-sum test.
ranksum write, by(female)
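The rank-sum test reports a test statistic but no effect size. In versions of Stata where ranksum supports the porder option, it adds an estimate of the probability that a randomly chosen score from the first group is larger than one from the second. This is offered as a sketch of one way to describe the size of the difference.

```stata
* Rank-sum test plus the estimated probability that a random score
* from the first group exceeds a random score from the second
ranksum write, by(female) porder
```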
The kwallis command computes a Kruskal-Wallis test, the non-parametric analog of the one-way ANOVA.
kwallis write, by(prog)

3.0 For more information
