UCLA Academic Technology Services

Stata Class Notes 3.0
Analyzing Data


1.0 Stata commands in this unit

ttest       t-test
regress     Regression
predict     Predicts after model estimation
kdensity    Kernel density estimates and graphs
pnorm       Graphs a standardized normal probability plot
qnorm       Graphs a quantile plot
rvfplot     Graphs a residual versus fitted plot
rvpplot     Graphs a residual versus individual predictor plot
xi          Creates dummy variables during model estimation
test        Test linear hypotheses after model estimation
oneway      One-way analysis of variance
anova       Analysis of variance
logistic    Logistic regression
logit       Logistic regression
signtest    Tests the equality of matched pairs of data
signrank    Wilcoxon matched-pairs signed rank test
ranksum     Mann-Whitney two-sample test
median      Non-parametric K-sample test of equal medians
kwallis     Nonparametric analog to the one-way anova

2.0 Demonstration and Explanation

use hs1, clear

2.1 t-tests

This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.
ttest write = 50
This is the paired t-test, testing whether or not the mean of write equals the mean of read.
ttest write = read
This is the two-sample independent t-test with pooled (equal) variances.
ttest write, by(female)
This is the two-sample independent t-test with separate (unequal) variances.
ttest write, by(female) unequal
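As a rough check on what the one-sample test computes, the t statistic can be rebuilt by hand from the sample mean, standard deviation, and size. The sketch below is not part of the original notes; it assumes hs1.dta is in the working directory, as in the session above, and the scalar names t_manual and t_ttest are made up for illustration.

```stata
use hs1, clear
* Sample statistics for write
quietly summarize write
* t = (sample mean - hypothesized mean) / (sd / sqrt(n))
scalar t_manual = (r(mean) - 50) / (r(sd) / sqrt(r(N)))
* ttest leaves its t statistic in r(t)
quietly ttest write = 50
scalar t_ttest = r(t)
display "t by hand = " t_manual "   t from ttest = " t_ttest
```

Typing return list after any ttest command shows the other stored results, such as the p-value and the degrees of freedom.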

2.2 Analysis of Variance

Both of these commands perform a one-way analysis of variance (ANOVA).
oneway write prog
anova write prog 
In this example the anova command is used to perform a two-way analysis of variance (ANOVA).
anova write prog female prog*female
Here the anova command performs an analysis of covariance (ANCOVA).
anova write prog female prog*female read, cont(read)
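The anova commands above use the syntax of older Stata releases. Assuming Stata 11 or later, the same models can be written with factor-variable notation, where # denotes an interaction, ## denotes main effects plus their interaction, and the c. prefix marks a continuous covariate. This is a sketch of the newer notation, not part of the original notes, and it assumes hs1.dta is in the working directory.

```stata
use hs1, clear
* Two-way ANOVA: prog##female expands to prog, female, and prog#female
anova write prog##female
* ANCOVA: c.read marks read as continuous, replacing the cont(read) option
anova write prog##female c.read
```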

2.3 Regression

Plain vanilla OLS regression.
regress write read female
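In the example below, we run the regression with robust standard errors. This is useful when the error variance is not constant (heteroskedasticity). The robust option changes the standard errors, and therefore the t statistics and confidence intervals, but it does not affect the estimates of the regression coefficients.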
regress write read female, robust
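One way to confirm that only the standard errors change is to save the coefficient and standard error for read from each fit and compare them. This is a minimal sketch, assuming hs1.dta is in the working directory; the scalar names are made up for illustration.

```stata
use hs1, clear
* Ordinary standard errors
quietly regress write read female
scalar b_ols  = _b[read]
scalar se_ols = _se[read]
* Robust standard errors
quietly regress write read female, robust
scalar b_rob  = _b[read]
scalar se_rob = _se[read]
* The coefficients should agree; the standard errors generally will not
display "coefficient:    OLS = " b_ols  "   robust = " b_rob
display "standard error: OLS = " se_ols "   robust = " se_rob
```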
The predict command calculates predictions, residuals, influence statistics, and the like after an estimation command. The default shown here is to calculate the predicted scores.
predict p
When using the resid option the predict command calculates the residual.
predict r, resid
The list command displays math together with the two variables we have just generated. The in 1/20 qualifier restricts the listing to the first 20 observations.
list math p r in 1/20
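The residual is simply the observed outcome minus the predicted score, which can be checked by recomputing it by hand. The sketch below assumes hs1.dta is in the working directory; r_check is a made-up variable name. Small differences in the last decimal places can appear because predict stores its results as floats.

```stata
use hs1, clear
quietly regress write read female
predict p
predict r, resid
* Recompute the residual as observed minus predicted
generate double r_check = write - p
list write p r r_check in 1/5
* With a constant in the model, the residuals average to essentially zero
summarize r
```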
The kdensity command with the normal option displays a density graph of the residuals with a normal distribution superimposed on the graph. This is particularly useful for checking whether the residuals are approximately normally distributed, an assumption behind the usual t and F tests in regression.
kdensity r, normal
The pnorm command produces a normal probability plot, another way of checking whether the residuals from the regression are normally distributed.
pnorm r
The qnorm command produces a normal quantile plot. It is yet another method for checking whether the residuals are normally distributed. The qnorm plot is more sensitive to deviations from normality in the tails of the distribution, whereas the pnorm plot is more sensitive to deviations in the middle of the distribution.
qnorm r
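To get a feel for how the two plots react to non-normal data, it can help to draw them for simulated data whose shape is known. The sketch below is not part of the original notes. It assumes a Stata release that provides the rnormal() and rt() random-number functions, and the variable names z and heavy and the graph names are made up for illustration.

```stata
clear
set seed 12345
set obs 500
* z is standard normal; heavy is t with 3 degrees of freedom (heavier tails)
generate z = rnormal()
generate heavy = rt(3)
* Heavy tails should show up mainly at the two ends of the qnorm plot
qnorm heavy, name(q_heavy, replace)
pnorm heavy, name(p_heavy, replace)
* For comparison, the normal draw should stay close to the reference line
qnorm z, name(q_z, replace)
pnorm z, name(p_z, replace)
```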
rvfplot is a convenience command that generates a plot of the residual versus the fitted values; it is used after regress or anova.
rvfplot
rvpplot is another convenience command; it produces a plot of the residuals versus a specified predictor and is also used after regress or anova.
rvpplot read
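Both plots can also be assembled by hand from the variables saved with predict, which makes it clearer what the convenience commands draw. This is a sketch assuming hs1.dta is in the working directory; the yline(0) reference line and the graph names are additions for illustration.

```stata
use hs1, clear
quietly regress write read female
predict p
predict r, resid
* Built-in plot with a horizontal reference line at zero
rvfplot, yline(0) name(rvf_builtin, replace)
* The same picture built from the saved residuals and fitted values
scatter r p, yline(0) name(rvf_manual, replace)
* Residuals against the predictor read, built by hand
scatter r read, yline(0) name(rvp_manual, replace)
```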
Creating dummy variables by using the xi command

The xi prefix is used to dummy code categorical variables such as prog. The predictor prog has three levels and requires two dummy-coded variables. The test command is used to test the collective effect of the two dummy-coded variables; in other words, it tests the main effect of prog.

xi: regress write read i.prog
test _Iprog_2 _Iprog_3
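To see exactly what xi creates, it can be run on its own, without a model, and the new variables tabulated against prog. The sketch below assumes hs1.dta is in the working directory and that prog is coded 1, 2, 3, as the variable names _Iprog_2 and _Iprog_3 suggest.

```stata
use hs1, clear
* Create the dummy variables without fitting a model
xi i.prog
* Each dummy is 1 for its own level of prog and 0 otherwise;
* the lowest level of prog is the omitted reference category
tabulate prog _Iprog_2
tabulate prog _Iprog_3
```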
The xi prefix can also be used to create dummy variables for prog and for the interaction of prog and read. The first test command tests the overall interaction. The second test command tests the two prog dummy variables; with the interaction in the model, these represent differences among the programs when read equals 0.
xi: regress write i.prog*read 
test _IproXread_2 _IproXread_3
test _Iprog_2 _Iprog_3
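In Stata 11 or later, factor-variable notation makes the xi prefix unnecessary for models like these, and testparm can test all terms for a factor at once without typing the generated variable names. This is a sketch of that alternative, not part of the original notes, and it assumes hs1.dta is in the working directory.

```stata
use hs1, clear
* Dummy coding for prog without xi
regress write read i.prog
* Joint test of the prog indicators
testparm i.prog
* Main effects plus the prog-by-read interaction; c. marks read as continuous
regress write i.prog##c.read
* Joint test of the interaction terms
testparm i.prog#c.read
* Joint test of the prog indicators (differences among programs at read = 0)
testparm i.prog
```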

2.4 Logistic regression

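In order to demonstrate the logistic regression commands, we will create a dichotomous variable called honcomp (honors composition) to use as our dependent variable. This is purely for illustrative purposes! The if qualifier keeps honcomp missing wherever write is missing, because Stata treats missing values as larger than any number.
gen honcomp = (write >= 60) if !missing(write)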
tab honcomp
The logistic command defaults to producing the output in odds ratios but can display the coefficients if the coef option is used. The exact same results can be obtained by using the logit command, which produces coefficients as the default but will display the odds ratio if the or option is used.
logit honcomp read female
logit honcomp read female, or
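The logistic command described above is not shown in the example, so here is a brief sketch of it alongside logit. An odds ratio is the exponentiated coefficient, which the display line checks by hand. The sketch assumes hs1.dta is in the working directory; the scalar name or_read is made up for illustration.

```stata
use hs1, clear
generate honcomp = (write >= 60) if !missing(write)
* Coefficients from logit; exponentiating one gives its odds ratio
quietly logit honcomp read female
scalar or_read = exp(_b[read])
display "odds ratio for read = " or_read
* logistic reports odds ratios by default
logistic honcomp read female
* ... and coefficients when the coef option is given
logistic honcomp read female, coef
```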

2.5 Non-Parametric Tests

The signtest command performs a sign test, the nonparametric analog of the single-sample t-test. Here it tests whether the median of write is 50.
signtest write = 50
The signrank command computes a Wilcoxon signed-rank test, the nonparametric analog of the paired t-test.
signrank write = read
The ranksum test is the nonparametric analog of the independent two-sample t-test and is known as the Mann-Whitney or Wilcoxon rank-sum test.
ranksum write, by(female)
The kwallis command computes a Kruskal-Wallis test, the non-parametric analog of the one-way ANOVA.
kwallis write, by(prog)
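The median command appears in the command list in section 1.0 but is not demonstrated. A minimal sketch of how it might be used with the same data follows, assuming hs1.dta is in the working directory; it tests whether the groups defined by the by() variable were drawn from populations with the same median.

```stata
use hs1, clear
* Two groups: compare the median of write for the two levels of female
median write, by(female)
* More than two groups: compare the median of write across the levels of prog
median write, by(prog)
```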

These pages are Copyrighted (c) by UCLA Academic Technology Services