
How can I access information stored after I run a command in Stata (returned results)?

In addition to the output shown in the Results window, many of Stata's commands
store information about the command and its results in memory. This allows the user,
as well as other Stata commands, to easily make use of this information. Stata calls
these returned results. Returned results can be very useful when you want to use
information produced by a Stata command to do something else in Stata. For example, if you
want to mean center a variable, you can use **summarize** to
calculate the mean, then use the value of the mean calculated by **summarize**
to center the variable. Using returned results eliminates
the need to retype or cut and paste the value of the mean.
Another example is generating predicted values of the outcome
variable when the predictor variables are set to specific values. Here again
you could retype the coefficients or cut and paste them, but returned results
make the task much easier.

The best way to get a sense of how returned results work is to jump right in
and start looking at and using them. The code below opens an example dataset and
uses **summarize** (abbreviated **sum**) to generate descriptive statistics for the variable **read**. This produces the
expected output, but more importantly for our purposes, Stata now has results from the
**summarize** command stored in memory. But how do you know what information has
been stored? A listing of the information saved by each command is included in the help file and/or printed manual, so you could look
there, but you can also just type **return list**, which
will list all the returned results in memory.

    use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
    sum read

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            read |       200       52.23    10.25294         28         76

    return list

    scalars:
                      r(N) =  200
                  r(sum_w) =  200
                   r(mean) =  52.23
                    r(Var) =  105.1227135678392
                     r(sd) =  10.25293682648241
                    r(min) =  28
                    r(max) =  76
                    r(sum) =  10446

Above is a list of the returned results. As you can see, each result is of the
form **r(...)**, where the ellipsis ("...") is a short label. We could
consult the help file for the **summarize** command to find out what each item on
the list is, but it is often easy to figure out what value is
assigned to what result. For example, **r(mean)**, not surprisingly, contains the mean of
**read** (you can check
this against the output). Others are not as obvious, for example
**r(sum_w)**; for these, you may need to consult the manual if you think you
might want to use them. Most of the time the process will be relatively easy
because you will already know what
result you want to access. You will be looking at the list to find out what name it is stored under,
rather than looking at the list and trying to figure out what each item is.

As you might imagine, different commands, and even the same command with different options,
store different results. Below we **summarize** the variable **read** again, but add the **detail** option.
Then we use **return list** to get the list of returned results. Just as the
**detail** option adds information to the output, it also results in
additional information stored in the returned results. The new list includes all of the information
returned by the **sum** command
above, plus skewness, kurtosis, and a number of percentiles, including the 1st
(**r(p25)**) and 3rd
(**r(p75)**) quartiles and the median (**r(p50)**).

    sum read, detail

                            reading score
    -------------------------------------------------------------
          Percentiles      Smallest
     1%         32.5             28
     5%           36             31
    10%           39             34       Obs                 200
    25%           44             34       Sum of Wgt.         200

    50%           50                      Mean              52.23
                            Largest       Std. Dev.      10.25294
    75%           60             73
    90%           67             73       Variance       105.1227
    95%           68             76       Skewness       .1948373
    99%         74.5             76       Kurtosis       2.363052

    return list

    scalars:
                      r(N) =  200
                  r(sum_w) =  200
                   r(mean) =  52.23
                    r(Var) =  105.1227135678392
                     r(sd) =  10.25293682648241
               r(skewness) =  .1948372909440272
               r(kurtosis) =  2.363051990033788
                    r(sum) =  10446
                    r(min) =  28
                    r(max) =  76
                     r(p1) =  32.5
                     r(p5) =  36
                    r(p10) =  39
                    r(p25) =  44
                    r(p50) =  50
                    r(p75) =  60
                    r(p90) =  67
                    r(p95) =  68
                    r(p99) =  74.5
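As a rough sketch of how the percentile results might be put to work, the commands below compute an interquartile range and flag observations above the third quartile. This is only an illustration: it assumes the **auto** dataset that ships with Stata (loaded with **sysuse**) instead of the example dataset used on this page, and the variable name **high_mpg** is made up for the example.

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear

* quietly suppresses the output; the returned results are still stored
quietly summarize mpg, detail

* r(p25), r(p50) and r(p75) are only available because detail was specified
display "median = " r(p50)
display "IQR    = " r(p75) - r(p25)

* high_mpg is a hypothetical name: 1 if mpg is above the 3rd quartile
generate byte high_mpg = mpg > r(p75) if !missing(mpg)
```

Because **display** and **generate** are not r-class commands, the results from **summarize** should still be in **r()** when the last line runs.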

Now that we have some sense of what results are returned by the **summarize**
command, we can make use of the returned results. Following through with one of the
examples mentioned above, we will mean center the variable **read**. Assuming
that the last command we ran was the **summarize** command above, the code
below generates a new variable, **c_read**, that contains the mean centered
values of **read**. Notice that instead of using the actual value of the
mean of **read** in this command, we used the name of the returned result
(i.e., **r(mean)**).
Stata knows when it sees **r(mean)** that we actually mean the value stored in
that returned result. On the next line we **summarize** the new variable
**c_read**. While the mean is not exactly equal to zero, it is within rounding error of
zero, so we know that we have properly mean centered the variable **read**.

    gen c_read = read - r(mean)
    sum c_read

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
          c_read |       200    2.18e-07    10.25294     -24.23      23.77
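One possible variation on the centering step is sketched below. It copies **r(mean)** into a scalar right away, so the value is not lost if another r-class command runs before it is used, and it stores the centered variable as a double, which should shrink the rounding error because **generate** creates float variables by default. The **auto** dataset and the names **mean_mpg** and **c_mpg** are assumptions made for illustration only.

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear
quietly summarize mpg

* Save the mean immediately; a later r-class command would replace r(mean)
scalar mean_mpg = r(mean)

* double keeps more digits than the default float storage type
generate double c_mpg = mpg - scalar(mean_mpg)

* The mean of the centered variable should now be very close to zero
summarize c_mpg
```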

As the code above suggests, we can use returned results in much the same way
we would use an actual
number. This is because Stata uses **r(...)** as a placeholder for a real
value. For another example of this, say that we want to calculate the variance of **read**
from its
standard deviation (ignoring the fact that **summarize** returns the variance in **r(Var)**).
We can do this on the fly using the **display** command as a calculator. The first **display** command below
does this. We can even
check the result by cutting and pasting the value of the standard deviation from
the output, which is done in the second **display** command below. The results are basically
the same. The very slight difference is rounding error, because the stored
estimate **r(sd)** contains more digits of accuracy than the value of the
standard deviation displayed in the output.

    display r(sd)^2
    105.12271

    display 10.25294^2
    105.12278

Now that you know a little about returned results and how they work, you are
ready for a little more information about them. Returned results come in two
main types, r-class and e-class (there are also s-class
and c-class results/variables, but we will not discuss them here). Commands that perform
estimation, for example regressions of all types, factor analysis, and anova, are
e-class commands. Other commands, for example **summarize**, **correlate**, and post-estimation
commands, are r-class commands. The distinction between r-class and e-class commands is important because
Stata stores results from e-class and r-class commands in
different "places." This has two ramifications for you as a user.
First, you need to know whether results are stored in **r()** or **e()** (as well as the
name of the result) in order to make use of them. If you're not sure which class a
command you've run is in, you can either look it up in the help file or "look"
in one place (using the appropriate command to list results); if the results are not
stored there, they are probably in the other. A potentially more important
ramification of the difference in how results from r-class and e-class commands
are returned is that returned results are held in memory only until another
command of the same class is run. That is, returned results from previous commands are
replaced by subsequent commands of the same class. In contrast, running a command of
another class will not affect the returned results. For example, if I run a
regression, and then a second regression, the results of the first regression
(stored in **e()**) are replaced by those for the second regression (also
stored in **e()**). However, if instead of a second regression I ran a post-estimation command, the results from the regression would remain in
**e()**
while the results from the post-estimation command would be placed in **r()**.
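The replacement behavior described above can be checked with a short experiment along the following lines. This is a sketch only, using the **auto** dataset shipped with Stata (an assumption; any dataset would do).

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear

* An e-class command: its results go into e()
quietly regress price mpg weight

* An r-class command: its results go into r(), and e() is left alone
quietly summarize mpg
display "e(N) after summarize:  " e(N)
display "r(mean), from mpg:     " r(mean)

* A second r-class command replaces what was in r()
quietly summarize weight
display "r(mean), now weight:   " r(mean)

* The regression results are still available in e()
display "e(cmd) is still:       `e(cmd)'"
```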

While there is a distinction between the two, the actual use of results from r-class
and e-class commands is very similar. For starters, the commands are parallel: to list
the r-class results stored in memory the command is **return list**, and to do the
same for e-class results the command is **ereturn list**. Further, except for
the difference in naming conventions (**r()** vs. **e()**), the results are accessed in the same way.
The example below demonstrates this. First we **regress** **write** on **female** and **read**, and then we use **ereturn list** to look at
the returned results.

    regress write female read

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  2,   197) =   77.21
           Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
        Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
    -------------+------------------------------           Adj R-squared =  0.4337
           Total |   17878.875   199   89.843593           Root MSE      =  7.1327

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
            read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
           _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
    ------------------------------------------------------------------------------

    ereturn list

    scalars:
                      e(N) =  200
                   e(df_m) =  2
                   e(df_r) =  197
                      e(F) =  77.21062421518363
                     e(r2) =  .4394192130387506
                   e(rmse) =  7.132734938503835
                    e(mss) =  7856.321182518186
                    e(rss) =  10022.5538174818
                   e(r2_a) =  .4337280375366059
                     e(ll) =  -675.2152914029985
                   e(ll_0) =  -733.0934827146213

    macros:
                e(cmdline) : "regress write female read"
                  e(title) : "Linear regression"
                    e(vce) : "ols"
                 e(depvar) : "write"
                    e(cmd) : "regress"
             e(properties) : "b V"
                e(predict) : "regres_p"
                  e(model) : "ols"
              e(estat_cmd) : "regress_estat"

    matrices:
                      e(b) :  1 x 3
                      e(V) :  3 x 3

    functions:
                 e(sample)

The list of returned results for **regress** includes several types of returned results,
listed under the headings
scalars, macros, matrices, and functions. We will discuss the types of returned results below, but for now
we will show how you can use the scalar returned results the same way that we
used the returned results from **summarize**. For example, one way to calculate the variance of the errors
after a regression is to divide the residual sum of squares by the total degrees
of freedom (i.e., n-1). The residual sum of squares is stored in **e(rss)** and the n
for the analysis is
stored in **e(N)**. Below we use the **display** command as a calculator, along with the
returned results, to calculate the variance of the errors. (Dividing by the residual
degrees of freedom, stored in **e(df_r)**, would instead reproduce the residual mean
square shown in the regression output.)

    display e(rss)/(e(N)-1)
    50.364592
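The same calculator approach extends to other scalar results. The sketch below, which assumes the **auto** dataset shipped with Stata, reconstructs two quantities that **regress** also returns directly, which makes it easy to confirm what each stored scalar contains.

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear
quietly regress price mpg weight

* Residual mean square: residual SS over residual degrees of freedom
display e(rss)/e(df_r)

* ... which is the square of the root MSE
display e(rmse)^2

* R-squared rebuilt from the model and residual sums of squares
display e(mss)/(e(mss) + e(rss))
display e(r2)
```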

As mentioned above, for both r-class and e-class commands, there are multiple types of returned results, including scalars, macros, matrices, and functions. In the lists of returned results, each type is listed under its own heading. The results listed under the heading "scalars" are just that: a single numeric value. Their usage is discussed above, so we won't say any more about them in this section.

Returned results listed under "macros" are generally strings
that give information about the command that was run. For example, in the
returned results for the regression shown above, **e(cmdline)**
contains the command the user issued (without any abbreviations). These are generally used in
programming Stata.
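As a small illustration of how macro results might be used, the sketch below displays two of them and uses a third to guard a block of code. It assumes the **auto** dataset shipped with Stata; the guard itself is a made-up example rather than anything Stata requires.

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear
quietly regress price mpg weight

* Macro results are strings: wrap them in `' to substitute their contents
display "`e(cmdline)'"
display "Outcome variable: `e(depvar)'"

* Hypothetical guard: only proceed if the last estimation was regress
if "`e(cmd)'" == "regress" {
    summarize `e(depvar)' if e(sample)
}
```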

Results listed under "matrices" are, as you would expect, matrices. While
the lists of results
returned by **return list** and **ereturn list** show you the values taken on
by most of the returned results, this is not practical with matrices.
Instead, the dimensions of the matrices are listed. To see the contents of matrices you must
display them using matrix commands. We do this below with the matrix of
coefficients (**e(b)**) using the command **matrix list e(b)**. (Note
that there is another way to access coefficients and their standard errors after
you fit a model; this is discussed below.) If we would like to perform matrix
operations on returned matrices, or wish to access individual elements of the
matrix, we can copy the matrix stored as a returned result to a normal Stata matrix.
This is done in the final line of syntax below.

    matrix list e(b)

    e(b)[1,3]
           female       read      _cons
    y1   5.486894  .56588693  20.228368

    matrix b = e(b)
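Once a returned matrix has been copied to a normal Stata matrix, individual elements can be addressed by row and column. The following sketch assumes the **auto** dataset shipped with Stata; the matrix names **b** and **V** and the local macro **j** are arbitrary choices for the example.

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear
quietly regress price mpg weight

* Copy the returned matrices into ordinary Stata matrices
matrix b = e(b)
matrix V = e(V)

* Elements are addressed as name[row, column]
display b[1,1]

* colnumb() finds a column by name, so its position need not be known
local j = colnumb(b, "weight")
display b[1, `j']

* Standard errors are the square roots of the diagonal of e(V)
display sqrt(V[`j', `j'])
display _se[weight]
```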

Finally, the results returned under the heading "functions" contain functions
that can be used in a manner similar to other Stata functions. The most common function
returned by Stata estimation commands is probably **e(sample)**. This function marks the
sample used in estimation of the last analysis. This is useful because datasets often
contain
missing values, so that not all cases in the dataset are used in a given
analysis. Assuming that the last
estimation command run was the regression of **write** on **female** and
**read** shown
above, the first line of code below uses **e(sample)** to find the mean of **read** among those cases used in the model. The second line of code uses **e(sample)** to
create a new variable called **flag**, which is equal to 1 for cases that were
used in the analysis, and zero otherwise. (Note that since the example dataset contains no
missing data, all of the cases are included in the analysis, and **flag** is
a constant equal to one.)

    sum read if e(sample)==1

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            read |       200       52.23    10.25294         28         76

    gen flag = e(sample)
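To see **e(sample)** do something more interesting, it helps to use data with missing values. The sketch below assumes the **auto** dataset shipped with Stata, in which **rep78** is missing for some cars; the variable name **used** is made up for the example.

```stata
* Assumption: the auto dataset shipped with Stata, where rep78 has missing values
sysuse auto, clear
quietly regress price mpg rep78

* Cases with a missing rep78 are dropped from the estimation sample
count if e(sample)
display e(N)

* used is a hypothetical name: 1 if the case was in the model, 0 otherwise
generate byte used = e(sample)
tabulate used

* Compare the outcome for included and excluded cases
summarize price if e(sample)
summarize price if !e(sample)
```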

As discussed above, after one fits a model, coefficients and their standard errors are stored
in **e()** in matrix form. These matrices allow the user access to the coefficients, but Stata
gives you an even easier way to access this information by storing it in the system variables
**_b** and **_se**. To access the value of a regression coefficient after a regression, all
one needs to do is type **_b[varname]**, where **varname** is the name of the predictor variable whose coefficient you
want to examine. To access the standard error, you can simply type **_se[varname]**.
To access the coefficient and standard error of the constant we use **_b[_cons]**
and **_se[_cons]**, respectively. Below we run the same regression model we
ran above, using **female** and **read** to predict **write**.
Once we have estimated the model, we use the **display** command to show
that the values in **_b** are equal to our regression coefficients. Finally,
we calculate the predicted value of **write**
for a female student (**female**=1) with a **read** score of 52.

    regress write female read

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  2,   197) =   77.21
           Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
        Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
    -------------+------------------------------           Adj R-squared =  0.4337
           Total |   17878.875   199   89.843593           Root MSE      =  7.1327

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
            read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
           _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
    ------------------------------------------------------------------------------

    display _b[_cons]
    20.228368

    display _b[female]
    5.486894

    display _b[read]
    .56588693

    display _b[_cons] + _b[female]*1 + _b[read]*52
    55.141383
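A hand-built prediction like the one above can be cross-checked with **lincom**, a post-estimation command that evaluates a linear combination of coefficients and stores its results in **r()**. The sketch below assumes the **auto** dataset shipped with Stata, and the predictor values (20 and 3000) are arbitrary.

```stata
* Assumption: the auto dataset shipped with Stata stands in for hsb2
sysuse auto, clear
quietly regress price mpg weight

* Predicted price for a car with mpg = 20 and weight = 3000, by hand
display _b[_cons] + _b[mpg]*20 + _b[weight]*3000

* The same linear combination via lincom, which also reports a standard error
lincom _cons + 20*mpg + 3000*weight

* lincom is r-class: its results land in r(), and e() is left untouched
display r(estimate)
display r(se)
display e(N)
```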
