Stata Learning Module
Inputting your data into Stata

This module shows how to input your data into Stata. It covers typing data into the Stata data editor and reading comma delimited, tab delimited, space delimited, and fixed column files.

Note: all of the sample input files for this page were created by us and are not included with Stata.  You can create them yourself to try out this code by copying and pasting the data into a text file.

1. Typing data into the Stata editor

One of the easiest methods for getting data into Stata is using the Stata data editor, which resembles an Excel spreadsheet. It is useful when your data is on paper and needs to be typed in, or if your data is already typed into an Excel spreadsheet. To learn more about the Stata data editor, see the edit module.

2. Comma/tab separated file with variable names on line 1

Two common file formats for raw data are comma separated files and tab separated files. Such files are commonly made from spreadsheet programs like Excel. Consider the comma delimited file shown below.

type auto2.raw 
 make, mpg, weight, price
AMC Concord, 22, 2930,    4099
AMC Pacer,  17,  3350, 4749
AMC Spirit,  22,  2640, 3799
Buick Century,   20, 3250, 4816
Buick Electra,  15,4080, 7827 

This file has two characteristics:
- The first line has the names of the variables separated by commas,
- The following lines have the values for the variables, also separated by commas.

This kind of file can be read using the insheet command, as shown below.

insheet using auto2.raw 

 (4 vars, 5 obs) 

We can check to see if the data came in right using the list command.

list  

               make       mpg    weight     price 
  1.   AMC Concord        22      2930      4099  
  2.     AMC Pacer        17      3350      4749  
  3.    AMC Spirit        22      2640      3799  
  4. Buick Century        20      3250      4816  
  5. Buick Electra        15      4080      7827   
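Besides list, two other commands give a quick check on what was read. The lines below are a minimal sketch, assuming auto2.raw has just been read with insheet as shown above.

```stata
* Show variable names, storage types, and the number of observations
describe

* Summary statistics for the numeric variables; an Obs count of 0 for a
* variable would suggest it was read as a string rather than a number
summarize mpg weight price
```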

Since you will likely have more observations, you can use the in qualifier to list just a subset of them. Below, we list observations 1 through 3.

list in 1/3 
               make       mpg    weight     price 
  1.   AMC Concord        22      2930      4099  
  2.     AMC Pacer        17      3350      4749  
  3.    AMC Spirit        22      2640      3799   
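The in qualifier can be combined with a variable list, and the if qualifier can be used to select observations by value. The lines below are a small sketch of these variations, assuming the auto2.raw data is still in memory.

```stata
* List only selected variables for the first three observations
list make mpg in 1/3

* List the last two observations (negative numbers count from the end,
* and the lowercase letter l stands for the last observation)
list in -2/l

* List the observations that satisfy a condition
list make mpg if mpg > 20
```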

Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).
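For reference, the save step might look like the sketch below. The dataset name auto2 is our own choice for illustration; any valid file name would do.

```stata
* Save the data in memory as a Stata dataset (auto2.dta).
* The replace option overwrites auto2.dta if it already exists.
save auto2, replace

* In a later session, load the saved dataset back into memory
use auto2, clear
```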

The exact same insheet command can be used to read a tab delimited file. The insheet command figures out on its own whether you have a comma delimited or tab delimited file, and then reads it accordingly. (However, insheet cannot handle a file that uses a mixture of commas and tabs as delimiters.)
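If you would rather not rely on automatic detection, the delimiter can be stated explicitly. The lines below are a sketch under a few assumptions: auto2tab.raw is a hypothetical tab delimited copy of the file, and the last command requires Stata 13 or later, where import delimited was introduced as the newer replacement for insheet.

```stata
* State the delimiter explicitly instead of letting insheet detect it
insheet using auto2.raw, comma clear

* The same for a tab delimited file (auto2tab.raw is a hypothetical file name)
insheet using auto2tab.raw, tab clear

* In Stata 13 and later, import delimited does the same job
import delimited using auto2.raw, clear
```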

Before starting the next section, let's clear out the existing data in memory.

clear 

3. Comma/tab separated file (no variable names in file)

Consider a file that is identical to the one we examined in the previous section, except that it does not have the variable names on line 1.

type auto3.raw 
 AMC Concord, 22, 2930, 4099
AMC Pacer,  17,  3350, 4749
AMC Spirit,  22,  2640, 3799
Buick Century,   20, 3250, 4816
Buick Electra,  15,4080, 7827 

This file can be read using the insheet command as shown below.

insheet using auto3.raw 
 (4 vars, 5 obs) 

But where did Stata get the variable names? If Stata does not have names for the variables, it names them v1, v2, v3 etc., as you can see below.

list 

                v1        v2        v3        v4 
  1.   AMC Concord        22      2930      4099  
  2.     AMC Pacer        17      3350      4749  
  3.    AMC Spirit        22      2640      3799  
  4. Buick Century        20      3250      4816  
  5. Buick Electra        15      4080      7827   
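One way to deal with the default names is to rename the variables after the data has been read. The sketch below assumes the data from auto3.raw is still in memory with the names v1 through v4.

```stata
* Replace the default names v1-v4 with more meaningful names
rename v1 make
rename v2 mpg
rename v3 weight
rename v4 price
```

Another way is to supply the names when the file is read.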

Let's clear out the data in memory, and then try reading the data again.

clear 

Now, let's read the data again, this time telling Stata the names of the variables on the insheet command.

insheet make mpg weight price using auto3.raw 
 (4 vars, 5 obs) 

As the list command shows, Stata used the variable names supplied on the insheet command.

list 

              make       mpg    weight     price 
  1.   AMC Concord        22      2930      4099  
  2.     AMC Pacer        17      3350      4749  
  3.    AMC Spirit        22      2640      3799  
  4. Buick Century        20      3250      4816  
  5. Buick Electra        15      4080      7827   

The insheet command works equally well on files which use tabs as separators. Stata examines the file and determines whether commas or tabs are being used as separators and reads the file appropriately.

Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).

Let's clear out the data in memory before going to the next section.

clear 

4. Space separated file

Consider a file where the variables are separated by spaces like the one shown below.

type auto4.raw 
 "AMC Concord" 22  2930  4099
"AMC Pacer"  17   3350  4749
"AMC Spirit"  22   2640  3799
"Buick Century"   20  3250  4816
"Buick Electra"  15 4080  7827 

Note that the make of car is contained within quotation marks. This is necessary because the names contain spaces. Without the quotes, Stata would think AMC is the make and Concord is the mpg. If the makes did not have embedded spaces, the quotation marks would not be needed.

This file can be read with the infile command as shown below.

infile str13 make mpg weight price using auto4.raw 
 (5 observations read) 

You may be asking yourself, where did the str13 come from? Since make is a character variable, we need to tell Stata that it is a character variable, and how long it can be. The str13 tells Stata it is a string variable and that it could be up to 13 characters wide.
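If you are not sure how long the longest make is, one option is to declare a generous width and let Stata trim it afterwards. The lines below are a sketch of that idea; the width of 20 is an arbitrary choice for illustration.

```stata
* Allow up to 20 characters in case some makes are longer than 13
infile str20 make mpg weight price using auto4.raw, clear

* Check the storage type of each variable
describe

* Shrink each variable to the smallest storage type that holds its data
compress
```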

The list command confirms that the data was read correctly.

list 
               make        mpg     weight      price 
  1.   AMC Concord         22       2930       4099  
  2.     AMC Pacer         17       3350       4749  
  3.    AMC Spirit         22       2640       3799  
  4. Buick Century         20       3250       4816  
  5. Buick Electra         15       4080       7827   

Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).

Let's clear out the data in memory before moving on to the next section.

clear 

5. Fixed format file

Consider a file using fixed column data like the one shown below.

type auto5.raw 
AMC Concord   22 2930 4099
AMC Pacer     17 3350 4749
AMC Spirit    22 2640 3799
Buick Century 20 3250 4816
Buick Electra 15 4080 7827 

Note that the variables are clearly defined by the column(s) in which they are located. Also, note that the make of car is not contained within quotation marks. The quotation marks are not needed because the columns define where the make begins and ends, and the embedded spaces no longer create confusion.

This file can be read with the infix command as shown below.

infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto5.raw 
 (5 observations read) 

Here again we need to tell Stata that make is a string variable by preceding make with str. We did not need to indicate the length since Stata can infer that make can be up to 13 characters wide based on the column locations.
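In the same way that str precedes make, a numeric storage type can precede a numeric variable. The line below is a sketch using the same column layout as above; the choice of byte and int is our assumption based on the values in this small sample file, and larger values would need long or float.

```stata
* Same columns as before, with storage types given for the numeric variables
infix str make 1-13 byte mpg 15-16 int weight 18-21 int price 23-26 using auto5.raw, clear
```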

The list command confirms that the data was read correctly.

list 
               make        mpg     weight      price 
  1.   AMC Concord         22       2930       4099  
  2.     AMC Pacer         17       3350       4749  
  3.    AMC Spirit         22       2640       3799  
  4. Buick Century         20       3250       4816  
  5. Buick Electra         15       4080       7827   

Now that the file has been read into Stata, you can save it with the save command (we will skip doing that step).

Let's clear out the data in memory before moving on to the next section.

clear 

6. Other methods of getting data into Stata

This module does not cover all possible methods of getting raw data into Stata, but it does cover many common situations. See the Stata User's Guide for more comprehensive information on reading raw data into Stata.

Another method that should be mentioned is the use of data conversion programs. These programs can convert data from one file format into another file format. For example, they could directly create a Stata file from an Excel spreadsheet, a Lotus spreadsheet, an Access database, a dBase database, a SAS data file, an SPSS system file, etc. Two such examples are Stat/Transfer and DBMS/Copy. Both of these products are available on SSC PCs and DBMS/Copy is available on Nicco and Aristotle.
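For Excel files in particular, recent versions of Stata (12 and later) can read a workbook directly without a conversion program. The line below is a minimal sketch; auto.xlsx is a hypothetical file name, and the firstrow option assumes the first row of the sheet holds the variable names.

```stata
* Read the first worksheet of a workbook, taking variable names from row 1
* (auto.xlsx is a hypothetical file name)
import excel using auto.xlsx, firstrow clear
```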

Finally, if you are using Nicco, Aristotle or the RS/6000 Cluster, there is a command specifically for converting SAS data into Stata called sas2stata. If you have SAS data you want to convert to Stata, this may be a useful way to get your SAS data into Stata.

7. Summary

Bring up the Stata data editor for typing data in.

      . edit  

Read in the comma or tab delimited file called auto2.raw taking the variable names from the first line of data.

      . insheet using auto2.raw, clear  

Read in the comma or tab delimited file called auto3.raw naming the variables make, mpg, weight, and price.

      . insheet make mpg weight price  using auto3.raw, clear  

Read in the space separated file named auto4.raw. The variable make is surrounded by quotes because it has embedded blanks.

      . infile str13 make mpg weight price  using auto4.raw, clear  

Read in the fixed format file named auto5.raw.

      . infix str make 1-13 mpg 15-16 weight 18-21 price 23-26 using auto5.raw, clear  

Other methods
     DBMS/Copy, Stat/Transfer, sas2stata, and the Stata User's Guide.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.