Stata Class Notes
Managing Data


1.0 Stata commands in this unit

pwd         Show current directory (pwd = print working directory)
dir or ls   Show files in current directory
cd          Change directory
keep if     Keep observations if condition is met
keep        Keep variables or observations
drop        Drop variables or observations
append      Append a data file to current file
sort        Sort observations
merge       Merge a data file with current file

2.0 Demonstration and explanation

Example 2.1 - Subsetting data

Suppose we are undergraduates working on an honors thesis and want to analyze only a subset of the hs1 data file. We are studying "good readers," so we want to focus on the students who had a reading score of 60 or higher. The commands below open hs1, create a separate folder called honors, and save there a copy of the data containing only the students with reading scores of 60 or higher.
use hs1, clear
pwd
dir
ls
capture mkdir honors
cd honors
keep if read >= 60
describe
summarize read
save hsgoodread, replace
pwd
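A self-contained sketch of the same subsetting step, using a tiny made-up dataset built with input in place of hs1 (the ids and scores are hypothetical). It also illustrates a Stata detail worth checking in real data: a missing numeric value (.) is treated as larger than any number, so read >= 60 is true when read is missing, and keep if read >= 60 would retain those observations.

```stata
* Hypothetical toy data standing in for hs1
clear
input id read
1 45
2 60
3 72
4 .
5 58
end

* The missing read (id 4) counts as >= 60, so this counts 3 observations
count if read >= 60
assert r(N) == 3

* Exclude missing values explicitly when subsetting
keep if read >= 60 & !missing(read)
list
```

If the real hs1 file has no missing reading scores, the two versions of keep if give the same result; adding !missing(read) simply makes the intent explicit.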

Example 2.1, continued - Keeping variables

Further suppose that our data file had many variables, say 2000, but we care about only a handful of them: id, female, read, and write. We can subset the data in memory to keep just those variables, as shown below.
keep id female read write
save hskept, replace
describe
list in 1/20
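A minimal sketch of keep with a variable list, assuming a small hypothetical dataset rather than the real hsgoodread file. Note that keep changes only the data in memory; the file on disk is unaffected until you save.

```stata
* Hypothetical toy data with more variables than we need
clear
input id female ses prog read write math
1 0 1 2 60 55 52
2 1 2 1 72 68 70
end

* Keep only the variables of interest; all others are removed from memory
keep id female read write
describe, short

* ses should no longer exist, so confirm variable fails (nonzero _rc)
capture confirm variable ses
assert _rc != 0
```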

Example 2.1, continued - Dropping variables

Instead of keeping just a handful of variables, we might want to get rid of just a few. Below we reload hsgoodread and drop the variables ses and prog.
use hsgoodread, clear
drop ses prog
save hsdropped, replace
describe
list in 1/10
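keep and drop are complementary: dropping ses and prog gives the same result as keeping every other variable. The sketch below uses hypothetical toy data, and also shows drop removing observations when given an if condition, the other use listed in the command table.

```stata
* Hypothetical toy data
clear
input id female ses prog read
1 0 1 2 60
2 1 2 1 72
3 1 3 3 65
end

* Drop variables by name
drop ses prog

* With an if condition, drop removes observations instead
drop if female == 0
list
```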

Example 2.2 - Appending data

Now we have moved on to our master's thesis. We have a folder called masters, and we have been given one file with the data for the males (called hsmale) and another with the data for the females (called hsfemale). We need to combine these files before we can analyze them, as shown below. Appending adds observations (cases) rather than variables; this is sometimes called "stacking" datasets.
dir
use hsmale, clear
tabulate female
append using hsfemale
tabulate female
save hsmasters, replace
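A self-contained sketch of append, with two small hypothetical datasets standing in for hsmale and hsfemale (the female file is held in a temporary file). append matches variables by name and places the using file's observations after those already in memory.

```stata
* Hypothetical stand-in for hsfemale, saved to a temporary file
clear
input id female read
3 1 65
4 1 58
end
tempfile fem
save `fem'

* Hypothetical stand-in for hsmale, left in memory
clear
input id female read
1 0 60
2 0 72
end

* Stack the female observations underneath the male ones
append using `fem'
tabulate female
```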

Example 2.3 - Merging data

Now we are working on our dissertation and, as with our master's thesis, we have been given two files: one with the demographic information (called hsdem) and one with the test scores (called hstest). We wish to merge these files, matching records on the student id. First, we open each data file, sort it by id, and save it; merge has traditionally required both files to be sorted by the same key variable. Next, we use the merge command to combine the two datasets. merge creates a variable called _merge recording whether each observation came from the master file only (1), the using file only (2), or both files (3).
dir
use hsdem, clear
list
sort id
save hsdem, replace

use hstest, clear
list
sort id
save hstest, replace

use hsdem, clear
merge 1:1 id using hstest

list

tab _merge

save hsdiss, replace

cd ..
dir
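A self-contained sketch of a one-to-one merge on id, using made-up data in place of hsdem and hstest and the merge 1:1 syntax available in Stata 11 and later, which does not require the files to be pre-sorted. One id is deliberately present in only one file on each side so that _merge shows all three outcomes.

```stata
* Hypothetical stand-in for hstest (test scores), saved to a temporary file
clear
input id read write
1 60 55
2 72 68
4 50 49
end
tempfile test
save `test'

* Hypothetical stand-in for hsdem (demographics), left in memory as master
clear
input id female
1 0
2 1
3 1
end

* Match observations one-to-one on id
merge 1:1 id using `test'

* _merge: 1 = master only (id 3), 2 = using only (id 4), 3 = matched (ids 1, 2)
tabulate _merge
count if _merge == 3
assert r(N) == 2
```

Checking tab _merge after every merge is a cheap way to catch ids that failed to match.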

3.0 For more information


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.