Stata Learning Module
An overview of Stata syntax

This module shows the general structure of Stata commands. We demonstrate it using summarize as an example, but the same structure applies to most Stata commands.

Note: This code was tested in Stata 12.

Let's first use the auto data file.

use auto 

As you have seen, we can type summarize and it will give us summary statistics for all of the variables in the data file.

summarize 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      74    6165.257   2949.496       3291      15906  
     mpg |      74     21.2973   5.785503         12         41  
   rep78 |      69    3.405797   .9899323          1          5  
  hdroom |      74    2.993243   .8459948        1.5          5  
   trunk |      74    13.75676   4.277404          5         23  
  weight |      74    3019.459   777.1936       1760       4840  
  length |      74    187.9324   22.26634        142        233  
    turn |      74    39.64865   4.399354         31         51  
   displ |      74    197.2973   91.83722         79        425  
  gratio |      74    3.014865   .4562871       2.19       3.89  
 foreign |      74    .2972973   .4601885          0          1   

It is also possible to obtain summary statistics for specific variables. For example, below we get summary statistics just for mpg and price.

summarize mpg price 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      74     21.2973   5.785503         12         41  
   price |      74    6165.257   2949.496       3291      15906   

We could further tell Stata to limit the summary statistics to just foreign cars by adding an if qualifier.

summarize mpg price if (foreign == 1) 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      22    24.77273   6.611187         14         41  
   price |      22    6384.682   2621.915       3748      12990   

The if qualifier can contain more than one condition. Here, we ask for summary statistics for the foreign cars which get less than 30 miles per gallon.

summarize mpg price if foreign == 1 & mpg < 30
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      17    21.94118   3.896643         14         28  
   price |      17    6996.235   2674.552       3895      12990   
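
As a further sketch of compound conditions (these commands are an addition, not part of the original module), conditions can also be joined with | for "or", and parentheses make the grouping explicit. One point worth knowing is that Stata treats missing numeric values as larger than any number, so a condition such as rep78 > 3 also selects the cars where rep78 is missing unless you exclude them.

```stata
* foreign cars OR any car that gets more than 30 mpg
summarize mpg price if foreign == 1 | mpg > 30

* parentheses make the grouping explicit
summarize mpg price if foreign == 1 & (mpg < 20 | mpg > 30)

* exclude missing values of rep78 explicitly
summarize price if rep78 > 3 & !missing(rep78)
```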

We can use the detail option to ask Stata to give us more detail in the summary statistics. Notice that the detail option goes after the comma. If the comma were omitted, Stata would give an error.

summarize mpg price if foreign == 1 & mpg < 30, detail
                              mpg
-------------------------------------------------------------
      Percentiles      Smallest
 1%           14             14
 5%           14             17
10%           17             17       Obs                  17
25%           18             18       Sum of Wgt.          17

50%           23                      Mean           21.94118
                        Largest       Std. Dev.      3.896643
75%           25             25
90%           26             25       Variance       15.18382
95%           28             26       Skewness      -.4901235
99%           28             28       Kurtosis       2.201759

                            price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3895           3895
 5%         3895           4296
10%         4296           4499       Obs                  17
25%         5079           4697       Sum of Wgt.          17

50%         6229                      Mean           6996.235
                        Largest       Std. Dev.      2674.552
75%         8129           9690
90%        11995           9735       Variance        7153229
95%        12990          11995       Skewness       .9818272
99%        12990          12990       Kurtosis       2.930843 
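
To illustrate why the comma matters, here is a small sketch (not from the original module). Without the comma, Stata reads detail as one more name in the variable list, and since the auto data has no variable called detail, the command stops with an error.

```stata
* correct: the option follows the comma
summarize mpg price, detail

* incorrect: detail is parsed as a variable name, which does not exist
* capture noisily shows the error but lets a do-file keep running
capture noisily summarize mpg price detail
```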

Note that even though we built these parts up one at a time, they don't all have to be used together. Let's look at some other forms of the summarize command.

You can tell Stata which observation numbers you want using the in qualifier. Here we ask for summaries of observations 1 to 10. This is useful if you have a big data file and want to try out a command on a subset of observations.

summarize in 1/10 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      10      5517.4   2063.518       3799      10372  
     mpg |      10        19.5    3.27448         15         26  
   rep78 |       8       3.125   .3535534          3          4  
  hdroom |      10         3.3   .7527727          2        4.5  
   trunk |      10        14.7    3.88873         10         21  
  weight |      10        3271   558.3796       2230       4080  
  length |      10         194   19.32759        168        222  
    turn |      10        40.2   3.259175         34         43  
   displ |      10       223.9   71.77503        121        350  
  gratio |      10       2.907   .3225264       2.41       3.58  
 foreign |      10           0          0          0          0   
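
The in range accepts a few other forms as well. The sketch below is an addition to the module: f and l stand for the first and last observation, and negative numbers count back from the end of the data.

```stata
* first 10 observations, written with f for "first"
summarize price mpg in f/10

* last 5 observations: -5 counts back from the end, l means "last"
summarize price mpg in -5/l

* a single observation
list make price mpg in 3
```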

Also, recall that you can ask Stata to perform summaries for foreign and domestic cars separately using the by prefix, after first sorting the data on foreign, as shown below.

sort foreign 
by foreign: summarize 
-> foreign= 0  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      52    6072.423   3097.104       3291      15906  
     mpg |      52    19.82692   4.743297         12         34  
   rep78 |      48    3.020833    .837666          1          5  
  hdroom |      52    3.153846   .9157578        1.5          5  
   trunk |      52       14.75   4.306288          7         23  
  weight |      52    3317.115   695.3637       1800       4840  
  length |      52    196.1346   20.04605        147        233  
    turn |      52    41.44231   3.967582         31         51  
   displ |      52    233.7115   85.26299         86        425  
  gratio |      52    2.806538   .3359556       2.19       3.58  
 foreign |      52           0          0          0          0  

-> foreign= 1  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      22    6384.682   2621.915       3748      12990  
     mpg |      22    24.77273   6.611187         14         41  
   rep78 |      21    4.285714   .7171372          3          5  
  hdroom |      22    2.613636   .4862837        1.5        3.5  
   trunk |      22    11.40909   3.216906          5         16  
  weight |      22    2315.909   433.0035       1760       3420  
  length |      22    168.5455   13.68255        142        193  
    turn |      22    35.40909   1.501082         32         38  
   displ |      22    111.2273   24.88054         79        163  
  gratio |      22    3.507273   .2969076       2.98       3.89  
 foreign |      22           1          0          1          1   
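
As a shorthand, the sort and by steps can be combined. This is a sketch added for illustration, limited to two variables to keep the output short.

```stata
* equivalent to: sort foreign, then by foreign: summarize mpg price
bysort foreign: summarize mpg price

* the same thing using the sort option of by
by foreign, sort: summarize mpg price
```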

Let's review all those pieces.

A command can be preceded with a by prefix, as shown below.

by foreign: summarize

Several parts can come after a command. Each is presented separately below.

summarize followed by the names of variables.

summarize mpg price

summarize with in specifying a range of records to be summarized.

summarize in 1/10

summarize with simple if specifying records to summarize.

summarize if foreign == 1

summarize with complex if specifying records to summarize.

summarize if foreign == 1 & mpg > 30

summarize followed by option(s).

summarize , detail 

So, putting it all together, the general syntax of the summarize command can be described as:

[by varlist:] summarize [varlist] [in range] [if exp] [, options]
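
To make the template concrete, the sketch below (an addition, not output from the original module) combines several of the pieces in single commands. Other commands, such as list and tabulate, accept the same pieces.

```stata
* variable list + if + in + option in one command
summarize mpg price if mpg < 30 in 1/50, detail

* by prefix + variable list + if + option
bysort foreign: summarize mpg price if mpg < 30, detail

* the same structure with other commands
list make mpg price if foreign == 1 in 1/74
tabulate rep78 if foreign == 1, missing
```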

Understanding the overall syntax of Stata commands helps you remember them and use them more effectively, and it also helps you understand the help files in Stata. Without this background, the extra pieces about by, if and in could be confusing. Let's have a look at the help file for summarize. It makes more sense once you know what the by, if and in parts mean.

help summarize 
-------------------------------------------------------------------------------
help for summarize                                     (manual:  [R] summarize)
-------------------------------------------------------------------------------

Summary statistics
------------------

    [by varlist:]  summarize [varlist] [weight] [if exp] [in range]
                             [, { detail | meanonly } format ] 
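
The help file lists the options meanonly and format in addition to detail. As a brief sketch (added here, not part of the original module): meanonly suppresses the output and computes only a few basic results, which summarize leaves behind in r(), while format displays the statistics using each variable's display format.

```stata
* compute the mean quietly, then display the stored result
summarize mpg, meanonly
display r(mean)

* show statistics using the display format of price
summarize price, format
```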
