### Stata Learning Module: An overview of Stata syntax

This module shows the general structure of Stata commands. We will demonstrate this using summarize as an example, although this general structure applies to most Stata commands.

Note: This code was tested in Stata 12.

Let's first use the auto data file.

use auto
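
If auto.dta is not in the current working directory, use auto will fail with a file-not-found error. One workaround, assuming a standard Stata installation, is sysuse, which loads the example data sets shipped with Stata. Note that the shipped copy may use longer variable names (for example, headroom rather than hdroom) than the output shown in this module.

```stata
* Load the auto data set that ships with Stata; clear drops any data in memory
sysuse auto, clear

* Check which variables are present before running the examples
describe
```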

As you have seen, we can type summarize and it will give us summary statistics for all of the variables in the data file.

summarize
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
make |       0
price |      74    6165.257   2949.496       3291      15906
mpg |      74     21.2973   5.785503         12         41
rep78 |      69    3.405797   .9899323          1          5
hdroom |      74    2.993243   .8459948        1.5          5
trunk |      74    13.75676   4.277404          5         23
weight |      74    3019.459   777.1936       1760       4840
length |      74    187.9324   22.26634        142        233
turn |      74    39.64865   4.399354         31         51
displ |      74    197.2973   91.83722         79        425
gratio |      74    3.014865   .4562871       2.19       3.89
foreign |      74    .2972973   .4601885          0          1   

It is also possible to obtain summary statistics for specific variables. For example, below we get summary statistics just for mpg and price.

summarize mpg price
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
mpg |      74     21.2973   5.785503         12         41
price |      74    6165.257   2949.496       3291      15906   

We could further tell Stata to limit the summary statistics to just foreign cars by adding an if qualifier.

summarize mpg price if (foreign == 1)
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
mpg |      22    24.77273   6.611187         14         41
price |      22    6384.682   2621.915       3748      12990   

The if qualifier can contain more than one condition. Here, we ask for summary statistics for the foreign cars which get less than 30 miles per gallon.

summarize mpg price if foreign == 1 & mpg < 30
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
mpg |      17    21.94118   3.896643         14         28
price |      17    6996.235   2674.552       3895      12990   
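
Conditions can also be combined with | (or) and grouped with parentheses. The following is a minimal sketch rather than part of the original module, and the cutoffs are arbitrary. One caution: Stata treats numeric missing values as larger than any number, so a condition such as rep78 > 3 also selects cars whose rep78 is missing unless they are excluded explicitly.

```stata
* Foreign cars, or any car with mpg of at least 30
summarize mpg price if foreign == 1 | mpg >= 30

* Parentheses make the grouping explicit:
* inexpensive cars that are either foreign or fuel efficient
summarize mpg price if price < 5000 & (foreign == 1 | mpg >= 30)

* Exclude missing rep78 so missing values are not treated as "greater than 3"
summarize price if rep78 > 3 & !missing(rep78)
```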

We can use the detail option to ask Stata to give us more detail in the summary statistics. Notice that the detail option goes after the comma. If the comma were omitted, Stata would read detail as a variable name and report an error.

summarize mpg price if foreign == 1 & mpg < 30, detail
                              mpg
-------------------------------------------------------------
      Percentiles      Smallest
 1%           14             14
 5%           14             17
10%           17             17       Obs                  17
25%           18             18       Sum of Wgt.          17

50%           23                      Mean           21.94118
                        Largest       Std. Dev.      3.896643
75%           25             25
90%           26             25       Variance       15.18382
95%           28             26       Skewness      -.4901235
99%           28             28       Kurtosis       2.201759

                             price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3895           3895
 5%         3895           4296
10%         4296           4499       Obs                  17
25%         5079           4697       Sum of Wgt.          17

50%         6229                      Mean           6996.235
                        Largest       Std. Dev.      2674.552
75%         8129           9690
90%        11995           9735       Variance        7153229
95%        12990          11995       Skewness       .9818272
99%        12990          12990       Kurtosis       2.930843
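
To see why the comma matters, the sketch below runs the command once without it. capture noisily is used here only so that a do-file keeps running after the error; without the comma, Stata should interpret detail as one more variable name in the list.

```stata
* Without the comma, detail is parsed as part of the variable list
capture noisily summarize mpg price detail
display "return code: " _rc

* With the comma, detail is read as an option
summarize mpg price, detail
```

A nonzero return code from the first command indicates that it failed.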

Although we built these parts up one at a time, each can be used on its own; they do not all have to appear together. Let's look at some other forms of the summarize command.

You can tell Stata which observation numbers you want using the in qualifier. Here we ask for summaries of observations 1 to 10. This is useful if you have a big data file and want to try out a command on a subset of observations.

summarize in 1/10
 Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
make |       0
price |      10      5517.4   2063.518       3799      10372
mpg |      10        19.5    3.27448         15         26
rep78 |       8       3.125   .3535534          3          4
hdroom |      10         3.3   .7527727          2        4.5
trunk |      10        14.7    3.88873         10         21
weight |      10        3271   558.3796       2230       4080
length |      10         194   19.32759        168        222
turn |      10        40.2   3.259175         34         43
displ |      10       223.9   71.77503        121        350
gratio |      10       2.907   .3225264       2.41       3.58
foreign |      10           0          0          0          0   
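
The range after in can be written in a few other ways. As a short sketch (the particular ranges are arbitrary), f and l stand for the first and last observations, and negative numbers count back from the end of the data.

```stata
* First ten observations (f is shorthand for the first observation)
summarize mpg price in f/10

* Last five observations (-5 counts from the end; l means the last observation)
summarize mpg price in -5/l

* A single observation
list make mpg price in 3
```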

Also, recall that you can ask Stata to perform summaries for foreign and domestic cars separately using by, as shown below.

sort foreign
by foreign: summarize
 -> foreign= 0
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
make |       0
price |      52    6072.423   3097.104       3291      15906
mpg |      52    19.82692   4.743297         12         34
rep78 |      48    3.020833    .837666          1          5
hdroom |      52    3.153846   .9157578        1.5          5
trunk |      52       14.75   4.306288          7         23
weight |      52    3317.115   695.3637       1800       4840
length |      52    196.1346   20.04605        147        233
turn |      52    41.44231   3.967582         31         51
displ |      52    233.7115   85.26299         86        425
gratio |      52    2.806538   .3359556       2.19       3.58
foreign |      52           0          0          0          0

-> foreign= 1
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
make |       0
price |      22    6384.682   2621.915       3748      12990
mpg |      22    24.77273   6.611187         14         41
rep78 |      21    4.285714   .7171372          3          5
hdroom |      22    2.613636   .4862837        1.5        3.5
trunk |      22    11.40909   3.216906          5         16
weight |      22    2315.909   433.0035       1760       3420
length |      22    168.5455   13.68255        142        193
turn |      22    35.40909   1.501082         32         38
displ |      22    111.2273   24.88054         79        163
gratio |      22    3.507273   .2969076       2.98       3.89
foreign |      22           1          0          1          1   
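
The separate sort step can be folded into the prefix. A minimal sketch of two equivalent ways to do this, shown here with a shorter variable list:

```stata
* Sort by foreign and summarize within each group in one step
bysort foreign: summarize mpg price

* Equivalent form using the sort option of the by prefix
by foreign, sort: summarize mpg price
```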

Let's review all those pieces.

A command can be preceded with a by prefix, as shown below.

by foreign: summarize

Many parts can come after a command. Each is presented separately below.
First, summarize followed by the names of variables.

summarize mpg price

summarize with in, specifying a range of observations to summarize.

summarize in 1/10

summarize with a simple if, specifying which observations to summarize.

summarize if foreign == 1

summarize with a complex if, specifying which observations to summarize.

summarize if foreign == 1 & mpg > 30

summarize followed by option(s).

summarize, detail

So, putting it all together, the general syntax of the summarize command can be described as:

[by varlist:] summarize [varlist] [in range] [if exp] [, options]
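
As a hedged illustration of how the pieces combine (the cutoffs and ranges are arbitrary), the commands below each use several parts of the template at once. The by prefix and the in qualifier are shown in separate commands because Stata generally does not accept in together with by.

```stata
* Variable list + if + in + option
summarize mpg price if mpg < 30 in 1/50, detail

* by prefix + variable list + if + option
bysort foreign: summarize mpg price if mpg < 30, detail
```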

Understanding the overall syntax of Stata commands helps you remember them and use them more effectively, and it also helps you understand the help files in Stata. Without this background, the notation for by, if and in in a help file can be confusing. Let's have a look at the help file for summarize; it makes more sense once you know what the by, if and in parts mean.

 help summarize
-------------------------------------------------------------------------------
help for summarize                                     (manual:  [R] summarize)
-------------------------------------------------------------------------------

Summary statistics
------------------

[by varlist:]  summarize [varlist] [weight] [if exp] [in range]
[, { detail | meanonly } format ] 
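
The help diagram also lists pieces not used earlier in this module, such as [weight] and the meanonly option; the braces with | indicate that detail and meanonly are alternatives. As a hedged sketch of how to read these parts (using the variable weight as the weighting variable is arbitrary and purely for demonstration):

```stata
* meanonly suppresses the table and stores the mean (and a few other results)
summarize mpg, meanonly
display "mean mpg = " r(mean)

* A weight goes in square brackets; here an analytic weight, chosen arbitrarily
summarize mpg [aweight = weight]
```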
