Statistical Computing Workshop
Problem Solving in Stata 12

Stata 12 was released July 25, 2011. This workshop shows some of the kinds of data analysis problems that you can solve using the new features of Stata 12. The examples are presented as a series of commands without output, because the output would be too long to include here.

Let's get started.

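1) Problem: Reading data from an Excel spreadsheet.

This example uses a local file named feb2008.xls. The firstrow option reads variable names from the first row of the cell range, which starts at cell C4.

. import excel using feb2008.xls, firstrow cellrange(c4) clear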
You can also export to Excel.
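As a minimal sketch of the reverse direction, the following writes the data in memory to a new workbook with variable names in the first row. The output filename feb2008_out.xls is an assumption for illustration.
. export excel using feb2008_out.xls, firstrow(variables) replace  // feb2008_out.xls is a hypothetical filename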

2) Problem: Need to allocate more memory to Stata.

In earlier versions, a user would get an "out of memory" or "no room to add more observations" error and would then have to allocate more memory to Stata.

This is no longer a problem. Stata 12 has added automatic memory management. You will never have to use the set memory command again.
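
If you want to see what automatic memory management is doing, the following commands report the current memory settings and usage. This is only a quick check; none of the settings needs to be changed for the examples in this workshop.
. query memory  // show the current memory settings
. memory        // report memory currently in use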

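3) Problem: Regression with missing data
Choices: complete case analysis, full information maximum likelihood (FIML), or multiple imputation.

Load the data, get descriptive statistics, and generate dummy variables.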
. use http://www.ats.ucla.edu/stat/data/hsbmar, clear
. summarize
. tab prog, gen(prog)
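Before choosing among these approaches, it can help to see how much is missing. A short sketch using the misstable command:
. misstable summarize  // count missing values for each variable
. misstable patterns   // show the patterns of missingness across variables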
Complete case analysis for comparison.
. regress write female read math science socst prog2 prog3
Handle missing data using full information maximum likelihood (FIML) with the sem command.
. sem (write <- female read math science socst prog2 prog3), method(mlmv)
This time the analysis uses multiple imputation with chained equations.
Note: Stata 11 already had multiple imputation with monotone and multivariate normal methods; chained equations are new in Stata 12.
. mi set mlong
. mi register imputed female read math science socst
. mi impute chained (regress) read math science socst ///
                    (logit)   female = write i.ses i.prog, add(5)
. mi estimate: regress write read math science socst i.female

4) Problem: Complex mediation analysis with categorical independent variable and categorical mediator

. use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

. tab ses, gen(ses)
. tab prog, gen(prog)

. sem (prog2 <- ses2 ses3)(prog3 <- ses2 ses3)(write <- prog2 prog3 ses2 ses3)

. estat teffects  // get direct, indirect, and total effects
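
estat teffects reports the total indirect effect of each variable. If you want one specific indirect path, a possible approach is to multiply the two path coefficients with nlcom. This sketch assumes the coefficient names follow the _b[depvar:indepvar] pattern; check the legend from sem, coeflegend before relying on them.
. sem, coeflegend  // list the coefficient names
. nlcom _b[prog2:ses2]*_b[write:prog2]  // indirect effect of ses2 on write through prog2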

5) Problem: Mediation model with bootstrap standard errors.

. sem (prog2 <- ses2 ses3)(prog3 <- ses2 ses3)(write <-  prog2 prog3 ses2 ses3), ///
      vce(bootstrap, reps(100))

. estat bootstrap, percentile

6) Problem: Path analysis.

First, a saturated model.

. sem (write <- ses female)(read <- ses female)(math <- write read ses female) ///
      (science <- math write read ses female), cov(ses*female e.write*e.read)
. sem, standardized
And next a reduced model.
. sem (write <- ses female)(read <- ses)(math <- write read female) ///
      (science <- math write read female), cov(ses*female e.write*e.read)

. sem, standardized

. estat gof, stats(all)

. estat teffects
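
To check whether any of the paths dropped from the saturated model should be restored, two further postestimation commands can be run after the reduced model. This is a sketch of one way to follow up, not part of the original analysis.
. estat mindices  // modification indices for omitted paths and covariances
. estat eqgof     // equation-level goodness of fit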

7) Problem: Structural equation model from a journal article

Stata 12 has added a summary statistics data (SSD) format that allows you to enter the sample size, means, and covariances instead of the complete raw data. This example is from Wheaton, B., Muthen, B., Alwin, D., & Summers, G., 1977.

. use http://www.stata-press.com/data/r12/sem_sm2, clear

. ssd describe
. notes

. sem ///
   (anomia67 pwless67 <- Alien67) /// measurement piece
   (anomia71 pwless71 <- Alien71) /// measurement piece
   (Alien67 <- SES)               /// structural piece
   (Alien71 <- Alien67 SES)       /// structural piece
   ( SES -> educ occstat66),      /// measurement piece
   cov(e.anomia67*e.anomia71) cov(e.pwless67*e.pwless71)

. estat framework  // display results in matrix (Bentler-Weeks) form
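
The dataset used above was already stored as summary statistics data. To build an SSD dataset yourself from published values, a sketch along these lines should work. The variable names, sample size, means, and covariances below are made-up numbers for illustration only, not values from any article.
. clear all
. ssd init y x1 x2                      // hypothetical variable names
. ssd set observations 200              // hypothetical sample size
. ssd set means 50 10 20                // hypothetical means
. ssd set covariances 16 \ 2 4 \ 3 1 9  // hypothetical lower triangle, rows separated by \
. sem (y <- x1 x2)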

8) Problem: Categorical by categorical interactions.

This is from a 3x3 completely randomized factorial design. We will use regression instead of anova.

. use http://www.ats.ucla.edu/stat/data/crf33, clear

. regress y a##b

. contrast a##b

. margins a#b  // get cell means

. marginsplot  // plot interaction

. contrast a@b  // get simple effects of a at each level of b

9) Problem: Understanding a 3-way interaction.

A 2x2x3 completely randomized factorial design.

. use http://www.ats.ucla.edu/stat/data/3way, clear

. anova y a##b##c

. margins a#b#c  // cell means

. marginsplot, by(a) x(c)  // plot cell means

. contrast b#c@a  // simple b#c interaction at each level of a

. contrast b@c@a  // ignore results for a=2

10) Problem: Graphing categorical by continuous interactions.

. use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

. regress write female##c.socst

. margins female, dydx(socst)  // compute slopes for males and females

. margins female, at(socst=(30(5)70)) noatlegend  // compute points for graphing

. marginsplot, recast(line) noci name(cc1, replace)  // graph lines for males and females

. marginsplot, recast(line) recastci(rarea) ciopts(color(gs10)) ///
               name(cc2, replace)  //  improved plot

. marginsplot, recast(line) noci name(cc3, replace) ///
               addplot(scatter write socst, msym(oh) jitter(3))  // add scatterplot

/* difference between males and females  */
/* regions of statistical significance   */

. margins, dydx(female) at(socst=(30(5)70)) noatlegend  // get points for graphing

. marginsplot, recast(line) recastci(rarea) ciopts(color(gs10)) ///
               yline(0) name(vv4, replace)

11) Problem: Graphing continuous by continuous interactions.

. use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

. regress read c.math##c.socst write

. margins, dydx(math) at(socst=(30(10)70)) vsquish  // simple slopes for math

. margins, at(math=(30(5)75) socst=(30(10)70)) asbalanced noatlegend  // compute points for graphing

. marginsplot, x(math) recast(line) noci name(cc5, replace) ///
    addplot(scatter read math, msym(oh) jitter(3))  // plot simple slopes with scatterplot

12) Problem: Compute all possible pairwise comparisons.

This is a 2x4 completely randomized factorial design analyzed using anova.

. use http://www.ats.ucla.edu/stat/data/crf24, clear

. anova y a##b

. pwcompare b, mcompare(tukey)

. pwcompare a#b, mcompare(tukey)

. pwcompare b, mcompare(dunnett)  // compare each level of b with the control (reference) level
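
Other multiple-comparison adjustments follow the same pattern. For instance, a Bonferroni adjustment with a table of tests and confidence intervals could be requested as shown here; the choice of Bonferroni is only an illustration.
. pwcompare b, mcompare(bonferroni) effects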

13) Problem: Graphing contour plots.

. use http://www.ats.ucla.edu/stat/data/sandstone, clear

. twoway contour depth northing easting

. twoway contourline depth northing easting
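
The number of contour levels can be adjusted when the default plot is too coarse or too busy. As a sketch, the value 10 below is an arbitrary choice for illustration.
. twoway contour depth northing easting, levels(10)  // request 10 contour levels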

14) Problem: Estimating truncated count models.

For example, modeling length of hospital stay where the minimum stay is one day. Stata 11 had zero-truncated count models, but the new tpoisson and tnbreg commands let you set the truncation point at any value, not just zero.

. use http://www.ats.ucla.edu/stat/data/medpar, clear

. tpoisson los died hmo type2 type3, ll(0) nolog vce(cluster provnum)

. tnbreg   los died hmo type2 type3, ll(0) nolog vce(cluster provnum)
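
Because the truncation point is no longer fixed at zero, the same model can be fit to stays above some other threshold. As a sketch, suppose only stays longer than two days had been recorded. The threshold of 2 is an assumption for illustration, and the if qualifier is needed because every outcome must exceed the truncation point.
. tpoisson los died hmo type2 type3 if los > 2, ll(2) nolog vce(cluster provnum)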

That's all for today.
