UCLA Academic Technology Services

Statistical Computing Seminars
Introduction to Stata Programming

Many researchers use Stata without ever writing a program, even though programming could make them more efficient in their data analysis projects. Stata programming is not difficult, since it mainly involves the Stata commands that you already use. The trick to Stata programming is to use the appropriate commands in the right sequence. Of course, this is the trick to any kind of programming.

There are two kinds of files that are used in Stata programming, do-files and ado-files. Do-files are run from the command line using the do command followed by the name of the file.
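As a minimal illustration of the do command, the lines below run a do-file from the current working directory. The file name myanalysis.do is hypothetical and is not one of the files used in this seminar.

```stata
* myanalysis.do is a hypothetical file name used only for illustration.
do myanalysis

* The .do extension is assumed when it is omitted, so this is equivalent:
do myanalysis.do
```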

Ado-files, on the other hand, work like ordinary Stata commands: you just type the file name on the command line. In fact, many of the built-in Stata commands are just ado-files. You can look at the source code for the ado commands using the viewsource command.
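For instance, the official ttest command is implemented as an ado-file, so its source can be inspected. The command name mycommand in the comment is a hypothetical placeholder for a user-written ado-file.

```stata
* A user-written ado-file, say mycommand.ado (hypothetical name), would be
* invoked like any other command:
*     mycommand read write

* Show where the ado-file behind an official command is stored.
which ttest

* Display the source code of that ado-file in the Viewer.
viewsource ttest.ado
```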

Do-files can be placed in the same folder as the data, but ado-files need to go where Stata can find them. The best place for user-written ado-files is in the /ado/personal/ directory. The location of this directory can vary from system to system.
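Because the location differs between installations, it is easiest to ask Stata where it looks for ado-files. The commands below only report directories; the paths they print depend on your system.

```stata
* List Stata's system directories, including PERSONAL.
sysdir

* Show the full search path that Stata uses to find ado-files.
adopath

* Report the location of the personal ado directory.
personal
```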

We will try to give users a feel for Stata programming by covering the following topics:

  1. Creating and using do-files for checking and cleaning data.
  2. Using do-files for analyzing data.
  3. Writing an ado program to create a statistical command.
  4. Creating an ado-file that uses the Stata matrix operations for performing an analysis.

Part 1: Creating and using do-files for checking and cleaning data

We will create a file, hsbcheck.do, that contains commands that display observations with incorrect or impossible values. The do-file is run with the name of the dataset, hsberr, typed after the name of the do-file. So how does the hsbcheck program "know" which file to use? This is done using a macro variable, in this case `1', which holds the first term typed after the name of the program; the do-file treats it as a file name. Macro variables have many uses, including as variable names or numeric values. We will see additional uses of macro variables in other programs.
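The sketch below shows what a checking do-file of this kind might look like. It is not the seminar's original hsbcheck.do: the variable names (id, schtyp, read, math) and the valid ranges are assumptions made for illustration.

```stata
* hsbcheck.do (illustrative sketch, not the original file)
* Usage:  do hsbcheck hsberr
* `1' holds the first term typed after the do-file name.
use `1', clear

* Assumed coding: schtyp should be 1 or 2.
list id schtyp if !inlist(schtyp, 1, 2)

* Assumed range: test scores lie between 0 and 100.
* Missing values compare as larger than any number, so exclude them explicitly.
list id read if (read < 0 | read > 100) & !missing(read)
list id math if (math < 0 | math > 100) & !missing(math)
```

Because the file name arrives through `1', the same do-file can be pointed at any dataset that has these variables.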

Now that we know what errors there are in the data, we can write a do-file that will fix the errors. When we know the correct value of an observation, we will replace the incorrect value with the correct one. When we do not know the correct value for an observation, we will replace the incorrect value with missing. The do-file hsbfix.do will read in hsberr, correct the errors and save the corrected file as hsbclean.

One important thing to note is that after we fix the incorrect values, we will save the data file with a new name. We will never change any of the values in the original data file, hsberr.
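A fixing do-file along these lines could be sketched as follows. The observation id, the corrected value, and the variable names are hypothetical; the real corrections depend on what the checking step reports.

```stata
* hsbfix.do (illustrative sketch, not the original file)
use hsberr, clear

* Correct value known: replace the bad value with the right one.
* The id (12) and the value (47) are hypothetical.
replace read = 47 if id == 12

* Correct value unknown: set impossible values to missing.
replace math = . if (math < 0 | math > 100) & !missing(math)

* Save under a new name so that hsberr itself is never modified.
save hsbclean, replace
```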

First, we will run hsbfix on the original file hsberr; then, as a check, we will run hsbcheck on the new file hsbclean.
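Assuming both do-files are in the current working directory along with hsberr, the two steps would be run like this:

```stata
* Fix the errors in hsberr and write hsbclean.
do hsbfix

* Rerun the checks on the cleaned file; ideally nothing is listed.
do hsbcheck hsbclean
```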

Part 2: Using do-files for analyzing data

Next, we will create a do-file that contains all of the commands that we need to run our data analysis. This do-file will be called hsbanalyze.do, and we will use it with our data file hsbclean. This may not seem all that useful; after all, you could just as easily type each of the commands into the command window. But what if your coauthor comes to you and says, "We need to redo the whole analysis using only schtyp equal to one"? With a do-file, you rerun the whole analysis on that subset instead of retyping each command.
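One way to set this up is to let the do-file take the dataset name as its first argument and an optional if condition as its second. The sketch below is an assumption about how such a file could be organized, not the original hsbanalyze.do; the analysis commands and variable names are illustrative.

```stata
* hsbanalyze.do (illustrative sketch, not the original file)
* Usage:  do hsbanalyze hsbclean
*         do hsbanalyze hsbclean "if schtyp==1"
* `1' is the dataset name; `2' is an optional if condition (empty when omitted).
use `1', clear

summarize read write math `2'
tabulate prog ses `2'
regress write read math `2'
```

The double quotes around the condition make Stata pass it as a single argument, so `2' expands to the whole if clause in each command.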

Part 3: Writing an ado program to create a statistical command

Now, let's try our hand at writing a statistical command. Ado programs are very much like do-file programs, with the advantage that you just have to type the name of the command. You will need to include two additional commands to create an ado program. You begin with program define and the name of the command, and you end with an end command. Also, you need to save the file with the .ado extension, using the same name for the file as the name of the new command.
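The smallest possible ado program shows the required structure. The command name hello is hypothetical; the file would have to be saved as hello.ado somewhere on the ado-path.

```stata
* hello.ado (hypothetical example; the file name must match the command name)
program define hello
    display "Hello from an ado-file"
end
```

Once the file is saved, typing hello on the command line runs the program.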

We will illustrate the ado program by writing a command that computes the median. Of course, Stata already has commands that compute medians, but we are doing this to illustrate the process of creating an ado program.

The basic logic of computing the median is to sort the variable of interest; then, if there is an odd number of values, take the middle one, and if there is an even number of values, take the average of the middle two. The first version of our program is saved in the file named median1.ado.
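The logic just described can be sketched as a short ado program. This is a reconstruction under assumptions, not the seminar's original median1.ado: it takes a single variable name and handles missing values by dropping them inside a preserve block.

```stata
* median1.ado (illustrative sketch, not the original file)
* Usage:  median1 varname
program define median1
    args v                          // variable name typed after the command
    preserve                        // data are restored when the program ends
    quietly drop if missing(`v')
    sort `v'
    local n = _N
    if mod(`n', 2) == 1 {
        * odd number of values: take the middle one
        local med = `v'[(`n' + 1) / 2]
    }
    else {
        * even number of values: average the middle two
        local med = (`v'[`n' / 2] + `v'[`n' / 2 + 1]) / 2
    }
    display "median of `v' = " `med'
end
```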

Let's try median1 on the hsbclean dataset. The program works just fine, but it could be improved. We will modify the program to improve the output format and to allow for multiple variables. We will call this new program median2, which will be saved in the file median2.ado.

Again, everything seems to be working fine, but it would be better if the program allowed the use of if or in to subset the data. For example, what if we wanted the medians just for males? The program median3 will allow the user to use if and in.

The rclass option in median2 and median3 allows you to temporarily store the results from a program. For our program, we keep the frequency and median for the last variable used in the command. The stored values can be viewed with the return list command.

Stata has included some tools to make creating and debugging programs a bit easier. One of them is the trace setting (set trace on), which shows you command by command what is happening inside your program. Running median3 with trace turned on displays each line of the program as it executes.
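The sketch below combines the three improvements: formatted output for several variables, if and in support, and stored results. It is an assumption about how such a program could be written, not the original median3.ado. It relies on the syntax and marksample commands to parse the variable list and the if/in qualifiers.

```stata
* median3.ado (illustrative sketch, not the original file)
* Usage:  median3 varlist [if] [in]
program define median3, rclass
    syntax varlist(numeric) [if] [in]
    marksample touse, novarlist      // 1 for observations selected by if/in
    preserve
    quietly keep if `touse'
    display as text %12s "variable" %8s "N" %10s "median"
    foreach v of local varlist {
        quietly count if !missing(`v')
        local n = r(N)
        sort `v'                     // missing values sort last
        if mod(`n', 2) == 1 {
            local med = `v'[(`n' + 1) / 2]
        }
        else {
            local med = (`v'[`n' / 2] + `v'[`n' / 2 + 1]) / 2
        }
        display as text %12s "`v'" as result %8.0f `n' %10.2f `med'
        * stored results are overwritten on each pass, so the
        * values for the last variable are the ones kept
        return scalar N = `n'
        return scalar median = `med'
    }
end
```

A session using it might look like the following, where the variable gender and its coding are assumptions about the dataset:

```stata
use hsbclean, clear
median3 read write math if gender == 1
return list                // shows r(N) and r(median) for math

set trace on               // echo each command as the program runs
median3 read
set trace off
```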

Part 4: Creating an ado-file that uses the Stata matrix operations for performing an analysis

Many statistical procedures are done more easily using matrix commands. Stata has two different ways that you can use matrix commands in your programs. First, Stata has a fairly complete set of matrix commands built right into Stata itself. Second, Stata has a complete matrix programming language called Mata. Mata is faster and more powerful than Stata's built-in matrix commands, but the built-in commands are easier to program for small to medium programming projects. We will illustrate Stata's built-in matrix commands by writing an OLS regression program.

We will begin with matreg1.ado, which makes use of the famous matrix equation for the regression coefficients, b = (X'X)^{-1}X'Y.
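A minimal sketch of the coefficient calculation with Stata's built-in matrix commands follows. It is not the original matreg1.ado and it stops at the coefficients, leaving out the standard errors, t-tests and p-values. It uses matrix accum to form X'X and matrix vecaccum to form Y'X, both of which add a constant column automatically.

```stata
* matreg1.ado (illustrative sketch, coefficients only; not the original file)
* Usage:  matreg1 depvar indepvars [if] [in]
program define matreg1
    syntax varlist(min=2 numeric) [if] [in]
    marksample touse                     // excludes observations with missing values
    gettoken depvar indepvars : varlist  // first variable is the response
    tempname xx yx b
    * X'X, with the constant as the last column
    quietly matrix accum `xx' = `indepvars' if `touse'
    * Y'X as a row vector, with the constant as the last column
    quietly matrix vecaccum `yx' = `depvar' `indepvars' if `touse'
    * Y'X (X'X)^{-1} is the transpose of b = (X'X)^{-1} X'Y,
    * because (X'X)^{-1} is symmetric
    matrix `b' = `yx' * syminv(`xx')
    display as text "OLS coefficients for `depvar'"
    matrix list `b', noheader format(%10.4f)
end
```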

In matreg1 we computed t-tests and p-values manually. We also managed all of the output display. We can let Stata do all of that for us by using the estimates post command (documented as ereturn post in current versions of Stata), which will greatly simplify the program. We will do this in matreg2.
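The sketch below shows how posting the results could work. It is an assumption about the structure of such a program, not the original matreg2.ado, and it uses the current command name ereturn post. The program computes the coefficient vector and its variance matrix, posts them, and lets ereturn display produce the usual coefficient table.

```stata
* matreg2.ado (illustrative sketch, not the original file)
* Usage:  matreg2 depvar indepvars [if] [in]
program define matreg2, eclass
    syntax varlist(min=2 numeric) [if] [in]
    marksample touse
    gettoken depvar indepvars : varlist
    tempname zz yy xy xx b s2 V
    * cross-products of (y, X, constant) in one matrix
    quietly matrix accum `zz' = `depvar' `indepvars' if `touse'
    local n = r(N)
    local k = colsof(`zz') - 1           // coefficients, including the constant
    local df = `n' - `k'
    matrix `yy' = `zz'[1, 1]             // y'y
    matrix `xy' = `zz'[2..., 1]          // X'y
    matrix `xx' = `zz'[2..., 2...]       // X'X
    matrix `b'  = (syminv(`xx') * `xy')' // coefficients as a row vector
    * residual variance: (y'y - b X'y) / (n - k)
    matrix `s2' = (`yy' - `b' * `xy') / `df'
    matrix `V'  = syminv(`xx') * `s2'[1, 1]
    ereturn post `b' `V', depname(`depvar') obs(`n') dof(`df') esample(`touse')
    ereturn display                      // table with std. errors, t and p-values
end
```

Because the results are posted as estimation results, the coefficients are also available afterwards in e(b) and e(V).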