Stata Class Notes
Entering Data


1.0 Stata commands in this unit

cd            Change directory
dir or ls     Show files in current directory
insheet       Read ASCII (text) data created by a spreadsheet
infile        Read unformatted ASCII (text) data
infix         Read ASCII (text) data in fixed format
input         Enter data from keyboard
describe      Describe contents of data in memory or on disk
compress      Compress data in memory
save          Store the dataset currently in memory on disk in Stata data format
use           Load a Stata-format dataset
count         Show the number of observations
list          List values of variables
clear         Clear the entire dataset and everything else
memory        Display a report on memory usage
set memory    Set the size of memory

2.0 Demonstration and explanation

We start by reading a spreadsheet-style data file into Stata. Such files are created by programs such as Excel; for example, Excel can save a worksheet as a comma-separated-values (.csv) file. Stata reads this type of data with the insheet command. First, change to the directory that contains the file hs0.csv. This data file has the variable names on its first line.

Here is a partial listing from the comma-separated file:

gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56
0,172,4,2,1,academic,47,52,57,53,61
0,113,4,2,1,academic,44,52,51,63,61
0,50,3,2,1,general,50,59,42,53,61
0,11,1,2,1,academic,34,46,45,39,36
0,84,4,2,1,general,63,57,54,,51
0,48,3,2,1,academic,57,55,52,50,51

And here are the Stata commands to read these data.

cd d:\stata_data
dir
insheet using hs0.csv, clear
describe
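
In Stata 13 and later, insheet has been superseded by import delimited, although insheet still runs. As a rough sketch, assuming a newer Stata and the same hs0.csv file in the current directory, the equivalent commands would be:

* Assumes Stata 13 or later; varnames(1) says the variable names are on line 1
import delimited using hs0.csv, varnames(1) clear
describe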

What if the data file does not have variable names on the first line? We have such a file, called hs0_noname.csv. In that case we list the variable names, in column order, after the insheet command. We also run count to check that the data were read in successfully.

insheet gender id race ses schtyp prgtype read write math science socst using hs0_noname.csv, clear
count
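
Beyond the observation count, a quick way to check the result is to look at a few rows and at the missing values. The sketch below assumes hs0_noname.csv has just been read in with the variable names given above, and that it holds the same rows as hs0.csv (an assumption), where one science field is empty and should be stored as missing (.).

* Show the first five observations
list in 1/5
* Count observations where science is missing
count if missing(science)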

To read a space-delimited file, we use the infile command. The first part of the file hs0.raw is shown below.

0 70 4 1 1 general 57 52 41 47 57
1 121 4 2 1 vocati 68 59 53 63 61
0 86 4 3 1 general 44 33 54 58 31
0 141 4 3 1 vocati 63 44 47 53 56
0 172 4 2 1 academic 47 52 57 53 61
0 113 4 2 1 academic 44 52 51 63 61
0 50 3 2 1 general 50 59 42 53 61
0 11 1 2 1 academic 34 46 45 39 36
0 84 4 2 1 general 63 57 54 . 51
0 48 3 2 1 academic 57 55 52 50 51
0 75 4 2 1 vocati 60 46 51 53 61
0 60 5 2 1 academic 57 65 51 63 61

Notice how we specify a character (string) variable below. The variable prgtype holds text, so we type str10 before its name to tell Stata that it is a string variable with a maximum length of 10 characters. We will use the hs0.raw data file.

infile gender id race ses schtyp str10 prgtype read write math science socst using hs0.raw, clear
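
If str10 is left out, infile treats prgtype as numeric and sets its values to missing, since words such as general are not numbers. A minimal check after the infile command above, assuming hs0.raw is in the current directory, might look like this:

* Confirm that prgtype is stored as a string
describe prgtype
* Tabulate the program types, including any missing values
tabulate prgtype, missing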

The other commonly used ASCII data format is fixed format. It always requires a codebook that specifies which column(s) correspond to each variable. Here is a small example of this type of data, followed by its codebook. Notice how we make use of the codebook in the infix command below. We will use the schdat.fix data file.

        195  094951
        26386161941
        38780081841
        479700  870
        56878163690
        66487182960
        786  069  0
        88194193921
        98979090781
       107868180801
variable name   column number
id              1-2
a1              3-4
t1              5-6
gender          7
a2              8-9
t2              10-11
tgender         12

clear
infix id 1-2 a1 3-4 t1 5-6 gender 7 a2 8-9 t2 10-11 tgender 12 using schdat.fix
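
Blank columns in a fixed-format file are read as missing values. The following sketch is meant to be run from a do-file, because it uses /// to continue the command across lines. It reads the same schdat.fix file and lists the result so that it can be compared against the codebook:

clear
* Same column ranges as the codebook; /// continues the command on the next line
infix id 1-2 a1 3-4 t1 5-6 gender 7 ///
      a2 8-9 t2 10-11 tgender 12    ///
      using schdat.fix
* Blank fields (t1 in the first record, read against the codebook) appear as .
list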

We can also use the Do-file editor to input data. The Do-file editor is used for writing a sequence of commands and running them all at once. You can copy and paste the following Stata syntax to the Do-file editor and run it.

clear
input id female race ses str3 schtype prog read write math science socst
147 1 1 3 pub 1 47 62 53 53 61
108 0 1 2 pub 2 34 33 41 36 36
 18 0 3 2 pub 3 50 33 49 44 36
153 0 1 2 pub 3 39 31 40 39 51
 50 0 2 2 pub 2 50 59 42 53 61
 51 1 2 1 pub 2 42 36 42 31 39
102 0 1 1 pub 1 52 41 51 53 56
 57 1 1 2 pub 1 71 65 72 66 56
160 1 1 2 pub 1 55 65 55 50 61
136 0 1 2 pub 1 65 59 70 63 51
end
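
To confirm that the data were entered as intended, we can count and list the observations. This sketch assumes the input block above has just been run, so ten observations should be in memory:

* Should report 10, one per data line between input and end
count
* Show selected variables for the first three observations
list id female schtype read in 1/3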

After running the above commands, we can issue the describe command to get a general idea of the data set. The compress command reduces the size of the data set by storing each variable in the smallest storage type that holds its values without loss of information. We can then save the data set to disk by issuing the save command.

describe
compress
save hsb10 
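
The save command refuses to overwrite an existing file. When a do-file is run more than once, the replace option is commonly added. The sketch below assumes that overwriting any existing hsb10.dta in the current directory is acceptable:

compress
* replace overwrites hsb10.dta if it already exists
save hsb10, replace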

To read in a Stata data file, we use the use command.

clear
use hsb10
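
The use command can also load only part of a file, and describe can report on a file without loading it. A brief sketch, assuming hsb10.dta was saved as above from the ten keyed-in observations:

* Describe the file on disk without reading it into memory
describe using hsb10
* Load only three variables
use id read write using hsb10, clear
* Load only the observations with a reading score of 50 or more
use if read >= 50 using hsb10, clear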

The use command can also be used to read a data file over the internet.

use http://www.ats.ucla.edu/stat/data/hs0, clear

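Sometimes a data file is too big to be read into the memory currently allocated to Stata, and the use command fails. We then have to increase the amount of memory allocated to Stata with set memory before reading the file again. (Stata 12 and later manage memory automatically, so set memory is only needed in older versions.)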

clear
use http://www.ats.ucla.edu/stat/data/large
memory
set memory 5m
use http://www.ats.ucla.edu/stat/data/large, clear

3.0 For more information
