Stata Learning Module
Using and saving files in Stata

Using and saving Stata data files

The use command reads a Stata data file from disk into memory so you can analyze or modify it. A data file must be read into memory before you can analyze it, much as a Word document must be opened in Word before you can work with it. Since Stata data files end with .dta, you need only say use auto and Stata knows to read in the file called auto.dta. Here we use the closely related sysuse command, which loads one of the example data files shipped with Stata, so the command below reads auto.dta into memory without our needing to know where the file is stored.

sysuse auto 
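
The sysuse command finds the example data files that come with Stata, while use looks in the current working directory unless you give a path. The short sketch below shows the difference; mydata.dta and the folder C:\data are hypothetical names used only for illustration.

* list the example data files available to sysuse
sysuse dir
* show the current working directory, where use looks by default
pwd
* read a file of your own by giving its full path (hypothetical path)
use "C:\data\mydata.dta"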

The describe command displays information about the data currently in memory.

describe 

 Contains data from auto.dta
  obs:            74                          
 vars:            12                          17 Feb 1999 10:49
 size:         3,108 (99.6% of memory free)
-------------------------------------------------------------------------------
   1. make      str17  %17s                   
   2. price     int    %9.0g                  
   3. mpg       byte   %9.0g                  
   4. rep78     byte   %9.0g                  
   5. hdroom    float  %9.0g                  
   6. trunk     byte   %9.0g                  
   7. weight    int    %9.0g                  
   8. length    int    %9.0g                  
   9. turn      byte   %9.0g                  
  10. displ     int    %9.0g                  
  11. gratio    float  %9.0g                  
  12. foreign   byte   %9.0g                  
-------------------------------------------------------------------------------
Sorted by:   

Now that the data is in memory, we can analyze it. For example, the summarize command gives summary statistics for the data currently in memory.

summarize 

 Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
    make |       0
   price |      74    6165.257   2949.496       3291      15906  
     mpg |      74     21.2973   5.785503         12         41  
   rep78 |      69    3.405797   .9899323          1          5  
  hdroom |      74    2.993243   .8459948        1.5          5  
   trunk |      74    13.75676   4.277404          5         23  
  weight |      74    3019.459   777.1936       1760       4840  
  length |      74    187.9324   22.26634        142        233  
    turn |      74    39.64865   4.399354         31         51  
   displ |      74    197.2973   91.83722         79        425  
  gratio |      74    3.014865   .4562871       2.19       3.89  
 foreign |      74    .2972973   .4601885          0          1   

Let's make a change to the data in memory. We will compute a variable called price2 which will be double the value of price.

generate price2 = 2*price 

If we use the describe command again, we see the variable we just created is part of the data in memory. We also see a note from Stata saying dataset has changed since last saved. Stata knows that the data in memory has changed and needs to be saved to keep the changes. It is like editing a Word document: if we quit Stata or shut the computer off before saving, the changes we made would be lost.

describe 
 Contains data from auto.dta
  obs:            74                          
 vars:            13                          17 Feb 1999 10:49
 size:         3,404 (99.6% of memory free)
-------------------------------------------------------------------------------
   1. make      str17  %17s                   
   2. price     int    %9.0g                  
   3. mpg       byte   %9.0g                  
   4. rep78     byte   %9.0g                  
   5. hdroom    float  %9.0g                  
   6. trunk     byte   %9.0g                  
   7. weight    int    %9.0g                  
   8. length    int    %9.0g                  
   9. turn      byte   %9.0g                  
  10. displ     int    %9.0g                  
  11. gratio    float  %9.0g                  
  12. foreign   byte   %9.0g                  
  13. price2    float  %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  
     Note:  dataset has changed since last saved 
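
Another way to check for unsaved changes, without reading through the describe output, is the c(changed) system value. Stata sets it to 1 when the data in memory has changed since it was last saved and to 0 otherwise. A small sketch:

* shows 1 if there are unsaved changes, 0 if not
display c(changed)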

The save command is used to save the data in memory permanently on disk. Let's save this data and call it auto2 (Stata will save it as auto2.dta).

save auto2 

 file auto2.dta saved 

Let's make another change to the dataset. We will compute a variable called price3 which will be three times the value of price.

generate price3 = 3*price 

Let's try to save this data again to auto2.

save auto2 
file auto2.dta already exists
r(602); 

Did you see how Stata said file auto2.dta already exists? Stata is worried that you will accidentally overwrite your data file. You need to use the replace option to tell Stata that you know that the file exists and you want to replace it.

save auto2, replace 

file auto2.dta saved 
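
In a do-file, you may want to check whether a file already exists before deciding how to save it. The sketch below is one way to do this with capture and confirm file; the messages it displays are illustrative only.

* _rc is 0 if auto2.dta exists, nonzero if it does not
capture confirm file "auto2.dta"
if _rc == 0 {
    display "auto2.dta exists; saving with replace"
    save auto2, replace
}
else {
    display "auto2.dta not found; saving a new file"
    save auto2
}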

Let's make another change to the data in memory by creating a variable called price4 that is four times the price.

generate price4 = price*4 

Suppose we want to use the original auto file and we don't care if we lose the changes we just made in memory (i.e., losing the variable price4). We can try to use the auto file.

sysuse auto 

no; data in memory would be lost
r(4); 

See how Stata refused to use the file, saying no; data in memory would be lost? Stata did not want you to lose the changes that you made to the data sitting in memory. If you really want to discard the changes in memory, then you need to use the clear option on the use (or sysuse) command, as shown below.

sysuse auto, clear 

Stata tries to protect you from losing your data by doing the following:
1. If you want to save a file over an existing file, you need to use the replace option, e.g., save auto, replace.
2. If you try to use a file and the file in memory has unsaved changes, you need to use the clear option to tell Stata that you want to discard the changes, e.g., use auto, clear.
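
These two options matter most in do-files, which are often run more than once. The sketch below repeats the steps from this section in a form that can be rerun without stopping on either of the errors shown above. Note that it discards whatever is in memory and overwrites auto2.dta, so use it only when that is what you want.

* load the example data, discarding whatever is in memory
sysuse auto, clear
* add the new variables
generate price2 = 2*price
generate price3 = 3*price
* save, overwriting auto2.dta if it already exists
save auto2, replace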

Before we move on to the next topic, let's clear out the data in memory.

clear 

Using files larger than 1 megabyte

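When you use a data file, Stata reads the entire file into memory. By default, Stata limits the size of data in memory to 1 megabyte (PC version 6.0 Intercooled). Stata 12 and later manage memory automatically, so this limit and the set memory command described in this section apply only to older versions. You can view the amount of memory that Stata has reserved for data with the memory command.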

memory 

   Total memory                            1,048,576 bytes    100.00%

  overhead (pointers)                             0            0.00%
  data                                            0            0.00%
                                       ------------
  data + overhead                                 0            0.00%

  programs, saved results, etc.               1,152            0.11%
                                       ------------
  Total                                       1,152            0.11%

  Free                                    1,047,424           99.89% 

If you try to use a file which exceeds the amount of memory Stata has allocated for data, it will give you an error message like this.
no room to add more observations
r(901);

You can increase the amount of memory that Stata has allocated to data using the set memory command. For example, if you had a data file which was 1.5 megabytes, you could set the memory to, say, 2 megabytes, as shown below.

set memory 2m 

 (2048k) 

Once you have increased the memory, you should be able to use the data file if you have allocated enough memory for it.
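
Another way to work with a file that is too large is to read only the variables or observations you need. The sketch below assumes the auto2.dta file saved earlier stands in for a large file; the variables and observation range are arbitrary choices for illustration.

* read only three variables from the file
use make price mpg using auto2, clear
* read only the first 20 observations
use in 1/20 using auto2, clear
* combine both: three variables, first 20 observations
use make price mpg in 1/20 using auto2, clear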

Summary

To use the auto file from disk and read it into memory

sysuse auto 

To save the file auto from memory to disk

save auto  

To save a file if the file auto already exists

save auto, replace 

To use the file auto and clear out the current data in memory

sysuse auto, clear  

To clear out the data in memory and discard any unsaved changes

clear  

To allocate 2 megabytes of memory for a data file

set memory 2m 

To view the allocation of memory to data and how much is used

memory
