UCLA Academic Technology Services

Stata Code Fragment
Running a simulation

Below are two examples of running simulations using Stata. Both examples involve running a regression. The difference between them is the way the data for the regression are generated. In each example, the simulate command repeats the data generation and the regression 1000 times and records the coefficient estimates and their standard errors from each repetition.

In the first example, the two independent variables are from an existing dataset and the dependent variable is generated based on the two independent variables plus some random error. The dependent variable is then regressed on the two independent variables.

* Set up the steps you want to repeat for the simulation in a program
program define myprog1
	* drop all variables to create an empty dataset, do not use clear
	drop _all
	* get dataset
	use http://www.ats.ucla.edu/stat/stata/faq/hsb2
	* keep the independent variables (IVs)
	keep write math
	* gen dependent variable (DV) with set relationship to IVs + random error
	gen y = 7.541 + .3283*math + .5196*write + 7.281 * rnormal()
	* run the desired command
	reg y write math
end

* use the simulate command to rerun myprog1 1000 times
* collect the betas (_b) and standard errors (_se) from the regression each time
* You'll probably want to set reps(10) for testing, then set it higher for the simulation.
simulate _b _se, reps(1000): myprog1
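
Once simulate finishes, the data in memory are replaced by one observation per repetition, with a variable for each collected coefficient and standard error. The lines below are a minimal sketch of how those results might be inspected. It assumes myprog1 has already been defined as above; the seed value is arbitrary and was picked only so that the run can be reproduced.

```stata
* assumes myprog1 is already defined as above
* seed(12345) is an arbitrary value chosen for illustration
simulate _b _se, reps(1000) seed(12345): myprog1

* list the variables that simulate created
describe

* summarize every collected statistic across the repetitions;
* the means of the coefficient estimates can be compared with the values
* used to generate y (.5196 for write, .3283 for math, 7.541 for the constant)
summarize
```

In the summarize output, the standard deviation of a coefficient across repetitions can also be compared with the mean of the corresponding standard error; the two should be similar if the standard errors are well calibrated.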

The second example is similar to the first, except that the data are random draws from a multivariate normal distribution with a given correlation structure, generated with the command drawnorm. A covariance structure can be used instead by specifying the cov() option in place of corr(). If neither a correlation nor a covariance structure is specified, the variables are generated as uncorrelated. The code below also specifies means and standard deviations for the variables, but this is not strictly necessary; by default drawnorm uses means of 0 and standard deviations of 1.

* Set up the steps you want to repeat for the simulation in a program
program define myprog2
	* drop all variables to create an empty dataset, do not use clear
	drop _all
	* create a vector that contains the equivalent of a lower triangular correlation matrix
	matrix c = (1, 0.5968, 1, 0.6623, 0.6174, 1)
	* create a vector that contains the means of the variables
	matrix m = (52.23,52.775,52.645)
	* create a vector that contains the standard deviations
	matrix sd = (10.25,9.47,9.36) 
	* draw a sample of 1000 cases from a normal distribution with specified correlation structure
	* and specified means and standard deviations
	drawnorm x1 x2 y, n(1000) corr(c) cstorage(lower) means(m) sds(sd)
	* run the desired command
	reg y x1 x2
end

* use the simulate command to rerun myprog2 1000 times
* collect the betas (_b) and standard errors (_se) from the regression each time
* You'll probably want to set reps(10) for testing, then set it higher for the simulation.
simulate _b _se, reps(1000): myprog2
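
As a rough illustration of the cov() alternative mentioned above, the sketch below builds a covariance matrix from the same correlations and standard deviations and passes it to drawnorm. The program name myprog2cov and the matrix names C, D and V are hypothetical and are not part of the original example; the correlation values are the ones stored in c, written out as a full symmetric matrix.

```stata
* hypothetical variant of myprog2 that uses cov() instead of corr()
capture program drop myprog2cov
program define myprog2cov
	* drop all variables to create an empty dataset
	drop _all
	* full symmetric correlation matrix with the same values as c in myprog2
	matrix C = (1, .5968, .6623 \ .5968, 1, .6174 \ .6623, .6174, 1)
	* means and standard deviations, as in myprog2
	matrix m = (52.23,52.775,52.645)
	matrix sd = (10.25,9.47,9.36)
	* covariance = D*C*D, where D has the standard deviations on its diagonal
	matrix D = diag(sd)
	matrix V = D*C*D
	* average with the transpose so V is exactly symmetric despite rounding
	matrix V = (V + V')/2
	* the variances are carried by V, so sds() is not specified
	drawnorm x1 x2 y, n(1000) cov(V) means(m)
	* run the desired command
	reg y x1 x2
end

* a small number of repetitions for testing
simulate _b _se, reps(10): myprog2cov
```

Because V encodes the same correlations and standard deviations, this variant should describe the same distribution as myprog2; only the way the structure is handed to drawnorm differs.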

These pages are Copyrighted (c) by UCLA Academic Technology Services

