Stata Learning Module
Descriptive information and statistics

This module shows common commands for showing descriptive information and descriptive statistics about data files.

Getting an overview of your file

 The sysuse command loads a specified Stata-format dataset that was shipped with Stata. Here we will use the auto data file.

sysuse auto 

The describe command shows you basic information about a Stata data file. As you can see, it tells us the number of observations in the file, the number of variables, the names of the variables, and more.

describe 
 Contains data from auto.dta
  obs:            74                          
 vars:            12                          17 Feb 1999 10:49
 size:         3,108 (99.6% of memory free)
-------------------------------------------------------------------------------
   1. make      str17  %17s                   
   2. price     int    %9.0g                  
   3. mpg       byte   %9.0g                  
   4. rep78     byte   %9.0g                  
   5. hdroom    float  %9.0g                  
   6. trunk     byte   %9.0g                  
   7. weight    int    %9.0g                  
   8. length    int    %9.0g                  
   9. turn      byte   %9.0g                  
  10. displ     int    %9.0g                  
  11. gratio    float  %9.0g                  
  12. foreign   byte   %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  

The codebook command is a great tool for getting a quick overview of the variables in the data file. It produces a kind of electronic codebook from the data file. Have a look at what it produces below.

codebook 
 make -------------------------------------------------------------- (unlabeled)
                  type:  string (str17)

         unique values:  74                   coded missing:  0 / 74

              examples:  "Cad. Deville"
                         "Dodge Magnum"
                         "Merc. XR-7"
                         "Pont. Catalina"

               warning:  variable has embedded blanks

price ------------------------------------------------------------- (unlabeled)
                  type:  numeric (int)

                 range:  [3291,15906]                 units:  1
         unique values:  74                   coded missing:  0 / 74

                  mean:   6165.26
              std. dev:    2949.5

           percentiles:        10%       25%       50%       75%       90%
                              3895      4195    5006.5      6342     11385

mpg --------------------------------------------------------------- (unlabeled)
                  type:  numeric (byte)

                 range:  [12,41]                      units:  1
         unique values:  21                   coded missing:  0 / 74

                  mean:   21.2973
              std. dev:    5.7855

           percentiles:        10%       25%       50%       75%       90%
                               14        18        20        25        29

rep78 ------------------------------------------------------------- (unlabeled)
                  type:  numeric (byte)

                 range:  [1,5]                        units:  1
         unique values:  5                    coded missing:  5 / 74

            tabulation:  Freq.  Value
                             2  1
                             8  2
                            30  3
                            18  4
                            11  5

hdroom ------------------------------------------------------------ (unlabeled)
                  type:  numeric (float)

                 range:  [1.5,5]                      units:  .1
         unique values:  8                    coded missing:  0 / 74

            tabulation:  Freq.  Value
                             4  1.5
                            13  2
                            14  2.5
                            13  3
                            15  3.5
                            10  4
                             4  4.5
                             1  5

trunk ------------------------------------------------------------- (unlabeled)
                  type:  numeric (byte)

                 range:  [5,23]                       units:  1
         unique values:  18                   coded missing:  0 / 74

                  mean:   13.7568
              std. dev:    4.2774

           percentiles:        10%       25%       50%       75%       90%
                                8        10        14        17        20

weight ------------------------------------------------------------ (unlabeled)
                  type:  numeric (int)

                 range:  [1760,4840]                  units:  10
         unique values:  64                   coded missing:  0 / 74

                  mean:   3019.46
              std. dev:   777.194

           percentiles:        10%       25%       50%       75%       90%
                              2020      2240      3190      3600      4060

length ------------------------------------------------------------ (unlabeled)
                  type:  numeric (int)

                 range:  [142,233]                    units:  1
         unique values:  47                   coded missing:  0 / 74

                  mean:   187.932
              std. dev:   22.2663

           percentiles:        10%       25%       50%       75%       90%
                               157       170     192.5       204       218

turn -------------------------------------------------------------- (unlabeled)
                  type:  numeric (byte)

                 range:  [31,51]                      units:  1
         unique values:  18                   coded missing:  0 / 74

                  mean:   39.6486
              std. dev:   4.39935

           percentiles:        10%       25%       50%       75%       90%
                               34        36        40        43        45

displ ------------------------------------------------------------- (unlabeled)
                  type:  numeric (int)

                 range:  [79,425]                     units:  1
         unique values:  31                   coded missing:  0 / 74

                  mean:   197.297
              std. dev:   91.8372

           percentiles:        10%       25%       50%       75%       90%
                               97       119       196       250       350

gratio ------------------------------------------------------------ (unlabeled)
                  type:  numeric (float)

                 range:  [2.19,3.89]                  units:  .01
         unique values:  36                   coded missing:  0 / 74

                  mean:   3.01486
              std. dev:   .456287

           percentiles:        10%       25%       50%       75%       90%
                              2.43      2.73     2.955      3.37      3.72

foreign ----------------------------------------------------------- (unlabeled)
                  type:  numeric (byte)

                 range:  [0,1]                        units:  1
         unique values:  2                    coded missing:  0 / 74

            tabulation:  Freq.  Value
                            52  0
                            22  1 

Another useful command for getting a quick overview of a data file is the inspect command. Here is what the inspect command produces for the auto data file.

inspect 
 make:                                          Number of Observations
-------                                                            Non-
                                               Total   Integers    Integers
|                               Negative           -         -          -
|                               Zero               -         -          -
|                               Positive           -         -          -
|                                              -----     -----      -----
|                               Total              -         -          -
|                               Missing           74
+----------------------                        -----
-8.99e+307                           74 
    (0 unique value)

price:                                         Number of Observations
--------                                                           Non-
                                               Total   Integers    Integers
|  #                            Negative           -         -          -
|  #                            Zero               -         -          -
|  #                            Positive          74        74          -
|  #                                           -----     -----      -----
|  #                            Total             74        74          -
|  #   #   .   .   .            Missing            -
+----------------------                        -----
3291              15906                           74
  (74 unique values)

mpg:                                           Number of Observations
------                                                             Non-
                                               Total   Integers    Integers
|      #                        Negative           -         -          -
|      #                        Zero               -         -          -
|      #                        Positive          74        74          -
|  #   #                                       -----     -----      -----
|  #   #   #                    Total             74        74          -
|  #   #   #   #   .            Missing            -
+----------------------                        -----
12                   41                           74
  (21 unique values)

rep78:                                         Number of Observations
--------                                                           Non-
                                               Total   Integers    Integers
|          #                    Negative           -         -          -
|          #                    Zero               -         -          -
|          #                    Positive          69        69          -
|          #   #                               -----     -----      -----
|          #   #   #            Total             69        69          -
|  .   #   #   #   #            Missing            5
+----------------------                        -----
1                     5                           74
   (5 unique values)

hdroom:                                        Number of Observations
---------                                                          Non-
                                               Total   Integers    Integers
|          #                    Negative           -         -          -
|          #                    Zero               -         -          -
|          #                    Positive          74        37         37
|  #   #   #                                   -----     -----      -----
|  #   #   #   #                Total             74        37         37
|  #   #   #   #   #            Missing            -
+----------------------                        -----
1.5                   5                           74
   (8 unique values)

trunk:                                         Number of Observations
--------                                                           Non-
                                               Total   Integers    Integers
|              #                Negative           -         -          -
|      #       #                Zero               -         -          -
|      #       #                Positive          74        74          -
|      #   #   #                               -----     -----      -----
|  #   #   #   #   #            Total             74        74          -
|  #   #   #   #   #            Missing            -
+----------------------                        -----
5                    23                           74
  (18 unique values)

weight:                                        Number of Observations
---------                                                          Non-
                                               Total   Integers    Integers
|  #       #                    Negative           -         -          -
|  #       #                    Zero               -         -          -
|  #   #   #   #                Positive          74        74          -
|  #   #   #   #                               -----     -----      -----
|  #   #   #   #                Total             74        74          -
|  #   #   #   #   #            Missing            -
+----------------------                        -----
1760               4840                           74
  (64 unique values)

length:                                        Number of Observations
---------                                                          Non-
                                               Total   Integers    Integers
|              #                Negative           -         -          -
|      #       #                Zero               -         -          -
|      #       #                Positive          74        74          -
|      #   #   #                               -----     -----      -----
|  #   #   #   #   #            Total             74        74          -
|  #   #   #   #   #            Missing            -
+----------------------                        -----
142                 233                           74
  (47 unique values)

turn:                                          Number of Observations
-------                                                            Non-
                                               Total   Integers    Integers
|      #                        Negative           -         -          -
|      #       #                Zero               -         -          -
|      #   #   #                Positive          74        74          -
|      #   #   #                               -----     -----      -----
|  #   #   #   #                Total             74        74          -
|  #   #   #   #   .            Missing            -
+----------------------                        -----
31                   51                           74
  (18 unique values)

displ:                                         Number of Observations
--------                                                           Non-
                                               Total   Integers    Integers
|  #                            Negative           -         -          -
|  #                            Zero               -         -          -
|  #                            Positive          74        74          -
|  #       #                                   -----     -----      -----
|  #   #   #   #                Total             74        74          -
|  #   #   #   #   .            Missing            -
+----------------------                        -----
79                  425                           74
  (31 unique values)

gratio:                                        Number of Observations
---------                                                          Non-
                                               Total   Integers    Integers
|          #                    Negative           -         -          -
|          #                    Zero               -         -          -
|          #                    Positive          74         -         74
|  #   #   #       #                           -----     -----      -----
|  #   #   #       #            Total             74         -         74
|  #   #   #   #   #            Missing            -
+----------------------                        -----
2.19               3.89                           74
  (36 unique values)

foreign:                                       Number of Observations
----------                                                         Non-
                                               Total   Integers    Integers
|  #                            Negative           -         -          -
|  #                            Zero              52        52          -
|  #                            Positive          22        22          -
|  #                                           -----     -----      -----
|  #   #                        Total             74        74          -
|  #   #                        Missing            -
+----------------------                        -----
0                     1                           74
   (2 unique values) 

The list command is useful for viewing all or a range of observations. Here we look at make, price, mpg, rep78 and foreign for the first 10 observations.

list make price mpg rep78 foreign in 1/10 
                   make      price        mpg      rep78    foreign 
  1.      Dodge Magnum       5886         16          2          0  
  2.        Datsun 510       5079         24          4          1  
  3.      Ford Mustang       4187         21          3          0  
  4.  Linc. Versailles      13466         14          3          0  
  5.     Plym. Sapporo       6486         26          .          0  
  6.       Plym. Arrow       4647         28          3          0  
  7.     Cad. Eldorado      14500         14          2          0  
  8.        AMC Spirit       3799         22          .          0  
  9.    Pont. Catalina       5798         18          4          0  
 10.        Chev. Nova       3955         19          3          0   

Creating tables

The tabulate command is useful for obtaining frequency tables. Below, we make a table for rep78 and a table for foreign. The command can also be shortened to tab.

tabulate rep78 
      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00 
tabulate foreign 
     foreign |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         52       70.27       70.27
          1 |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00 

The tab1 command can be used as a shortcut to request tables for a series of variables (instead of typing the tabulate command over and over again for each variable of interest).

tab1 rep78 foreign  
 -> tabulation of rep78  

      rep78 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

-> tabulation of foreign  

    foreign |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         52       70.27       70.27
          1 |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00 

We can use the plot option to make a plot to visually show the tabulated values.

tabulate rep78, plot 
 
      rep78 |      Freq.
------------+------------+-----------------------------------------------------
          1 |          2 |**
          2 |          8 |********
          3 |         30 |******************************
          4 |         18 |******************
          5 |         11 |***********
------------+------------+-----------------------------------------------------
      Total |         69  

We can also make crosstabs using tabulate. Let's look at the repair history broken down by foreign and domestic cars.

tabulate rep78 foreign 
           |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        48         21 |        69  

With the column option, we can request column percentages. Notice that about 86% of the foreign cars received a rating of 4 or 5. Only about 23% of domestic cars were rated that highly.

tabulate rep78 foreign, column 
           |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
           |      4.17       0.00 |      2.90 
-----------+----------------------+----------
         2 |         8          0 |         8 
           |     16.67       0.00 |     11.59 
-----------+----------------------+----------
         3 |        27          3 |        30 
           |     56.25      14.29 |     43.48 
-----------+----------------------+----------
         4 |         9          9 |        18 
           |     18.75      42.86 |     26.09 
-----------+----------------------+----------
         5 |         2          9 |        11 
           |      4.17      42.86 |     15.94 
-----------+----------------------+----------
     Total |        48         21 |        69 
           |    100.00     100.00 |    100.00 
 

We can use the nofreq option to suppress the frequencies, and just focus on the percentages.

tabulate rep78 foreign, column nofreq 
           |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         1 |      4.17       0.00 |      2.90 
         2 |     16.67       0.00 |     11.59 
         3 |     56.25      14.29 |     43.48 
         4 |     18.75      42.86 |     26.09 
         5 |      4.17      42.86 |     15.94 
-----------+----------------------+----------
     Total |    100.00     100.00 |    100.00 
 

Note that the order of the options does not matter. Just remember that the options must come after the comma.

tabulate rep78 foreign, nofreq column  
           |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         1 |      4.17       0.00 |      2.90 
         2 |     16.67       0.00 |     11.59 
         3 |     56.25      14.29 |     43.48 
         4 |     18.75      42.86 |     26.09 
         5 |      4.17      42.86 |     15.94 
-----------+----------------------+----------
     Total |    100.00     100.00 |    100.00 
 

Generating summary statistics with summarize

For summary statistics, we can use the summarize command. Let's generate some summary statistics on mpg.

summarize mpg 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      74     21.2973   5.785503         12         41   

We can use the detail option of the summarize command to get more detailed summary statistics.

summarize mpg, detail 
                              mpg
-------------------------------------------------------------
      Percentiles      Smallest
 1%           12             12
 5%           14             12
10%           14             14       Obs                  74
25%           18             14       Sum of Wgt.          74

50%           20                      Mean            21.2973
                        Largest       Std. Dev.      5.785503
75%           25             34
90%           29             35       Variance       33.47205
95%           34             35       Skewness       .9487176
99%           41             41       Kurtosis       3.975005 

To get these values separately for foreign and domestic, we could use the by foreign: prefix as shown below. Note that we first had to sort the data before using by foreign:.

sort foreign 
by foreign: summarize mpg 
 -> foreign= 0  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      52    19.82692   4.743297         12         34  

-> foreign= 1  
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     mpg |      22    24.77273   6.611187         14         41  
 

This is not the most efficient way to do this. Another way, which does not require the data to be sorted, is by using the summarize( ) option as part of the tabulate command.

tabulate foreign, summarize(mpg) 
            |           Summary of mpg
    foreign |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   19.826923   4.7432972          52
          1 |   24.772727   6.6111869          22
------------+------------------------------------
      Total |   21.297297   5.7855032          74 

Here is another example, showing the average price of cars for each level of repair history

tabulate rep78, summarize(price) 
            |          Summary of price
      rep78 |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |      4564.5   522.55191           2
          2 |    5967.625   3579.3568           8
          3 |   6429.2333   3525.1398          30
          4 |      6071.5   1709.6083          18
          5 |        5913   2615.7628          11
------------+------------------------------------
      Total |   6146.0435   2912.4403          69 

Summary

Provide information about the current data file, including the number of variables and observations and a listing of the variables in a data file.

       describe 

Produce codebook like information for the current data file.

      codebook 

Provide a quick overview of data file.

      inspect 

List out the variables make and mpg.

     list make mpg 

Make a table of mpg.

      tabulate mpg 

Make a two way table of rep78 by foreign.

      tabulate rep78 foreign 

Produce summary statistics of mpg and price.

      summarize mpg price 

Produce summary statistics for mpg separately for foreign and domestic cars.

      sort foreign
     by foreign: summarize(mpg) 

Produce summary statistics for mpg by foreign (prior sorting not required).

      tabulate foreign, summarize(mpg)  

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.