| codebook | Show codebook information for file |
| label data | Apply a label to a data set |
| order | Order the variables in a data set |
| label variable | Apply a label to a variable |
| label define | Define a set of labels for the levels of a categorical variable |
| label values | Apply value labels to a variable |
| encode | Create numeric version of a string variable |
| list | Lists the observations |
| rename | Rename a variable |
| recode | Recode the values of a variable |
| notes | Apply notes to the data file |
| generate | Create a new variable |
| replace | Replace the contents of an existing variable |
| egen | Extended generate - has special functions that can be used when creating a new variable |
use http://www.ats.ucla.edu/stat/data/hs0, clear
Let's use the codebook command to see what our variables look like. Because we have not listed any variables after the command, Stata will show us the codebook for all of the variables.
codebook
First, let's order the variables in a way that makes sense. While there are several possible orderings that are logical, we will put the id variable first, followed by the demographic variables, such as gender, ses and prgtype. We will put the variables regarding the test scores at the end.
order id gender
Now let's include some variable and value labels so that we know a little more about the variables.
label variable schtyp "type of school"
label define scl 1 public 2 private
label values schtyp scl
codebook schtyp
list schtyp in 1/10
list schtyp in 1/10, nolabel
Now let's create a new numeric version of the string variable prgtype. We will call our new variable prog.
encode prgtype, gen(prog)
label variable prog "type of program"
codebook prog
list prog in 1/10
list prog in 1/10, nolabel
The variable gender may give us trouble in the future because it is difficult to know what the 1s and 2s mean. We will rename it to female and recode it so that 1 means female and 0 means male.
rename gender female
recode female (1=0)(2=1)
label define fm 1 female 0 male
label values female fm
codebook female
list female in 1/10
list female in 1/10, nolabel
Let's recode the value 5 in the variable race to be missing.
list race if race == 5
recode race 5 = .
list race if race == .
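The table at the top lists replace, which this walkthrough does not otherwise use. As a minimal sketch, the same recoding of race could be written with replace and an if qualifier. This assumes the hs0 data are still in memory; run it instead of (or after) the recode above, in which case it simply makes no further changes.

```stata
* Alternative to the recode above: set race to missing wherever it equals 5.
replace race = . if race == 5

* Count the observations where race is now missing.
count if missing(race)
```

The missing() function is a slightly safer test than race == . because it also catches extended missing values such as .a.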
Now let's create a variable that is a total of some of the test scores.
generate total = read + write + math + science
summarize total
Note that there are five missing values of total because there are five missing values of science.
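To see how the missing values carry through, the sketch below counts them and builds a second total with the egen function rowtotal(), which treats missing values as 0. It assumes total has been created as above; the name total_nm is hypothetical and used only for illustration.

```stata
* Count the observations where the sum is missing.
count if missing(total)

* rowtotal() treats a missing score as 0, so every observation gets a sum.
* total_nm is a hypothetical variable name.
egen total_nm = rowtotal(read write math science)

* Compare the two versions for the observations that were missing.
list science total total_nm if missing(total)
```

Which behavior is preferable depends on the analysis: a total built from three scores is not comparable to one built from four, so the missing value produced by generate is often the more honest result.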
Now let's see if we can assign some letter grades to these test scores.
recode total (0/140=0 F) (141/180=1 D) (181/210=2 C) (211/234=3 B) (235/300=4 A), gen(grade)
label variable grade "combined grades of read, write, math, science"
codebook grade
list read write math science total grade in 1/10
list read write math science total grade in 1/10, nolabel
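One way to check a recode like this is to compare the new variable against the cut points. The sketch below assumes total and grade exist as created above and uses only standard commands.

```stata
* Frequency of each letter grade, including observations with no grade.
tabulate grade, missing

* Smallest and largest total within each grade; each range should sit
* inside the cut points given to recode.
tabstat total, by(grade) statistics(min max n)
```

Observations with a missing total keep a missing grade, because recode leaves values that match no rule unchanged.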
Let's label the dataset itself so that we will remember what the data are. We can also add some notes to the data set.
label data "High School and Beyond"
notes female: the variable gender was renamed to female
notes race: values of race coded as 5 were recoded to be missing
notes
Stata has another way of generating new variables called egen, which stands for extended generate. The egen command is a useful tool for many specialized situations.
In our first example, we will use egen to create standard scores for the variable read.
egen zread = std(read)
summarize zread
list read zread in 1/10
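For comparison, a standard score is (value - mean) / standard deviation, so the same result can be sketched by hand with summarize and generate. This assumes zread has been created as above; zread_manual is a hypothetical name used only for illustration.

```stata
* summarize leaves the mean and standard deviation in r(mean) and r(sd).
summarize read

* Standardize by hand; zread_manual is a hypothetical variable name.
generate zread_manual = (read - r(mean)) / r(sd)

* The two versions should agree up to rounding.
list read zread zread_manual in 1/10
```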
Next we will create a variable that contains the mean of read for each level of ses.
egen readmean = mean(read), by(ses)
list read ses readmean in 1/10
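A quick way to confirm the group means is to tabulate them directly. The sketch below assumes readmean exists as created above and does not change the order of the observations.

```stata
* Mean of read within each level of ses, for comparison with readmean.
tabstat read, by(ses) statistics(mean n)

* readmean is constant within each level of ses, so min and max should match.
tabstat readmean, by(ses) statistics(min max)
```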
Now we will compute the average of several variables for each observation. Please note that there will be a mean for observation 9 even though it has a missing value for science.
egen row_mean = rowmean(read write math science)
list read write math science row_mean in 1/10
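Because rowmean() averages whatever scores are present, it can help to record how many scores went into each mean. A minimal sketch, assuming row_mean exists as created above, uses the egen function rownonmiss(); the name n_scores is hypothetical.

```stata
* Number of non-missing scores behind each row mean.
* n_scores is a hypothetical variable name.
egen n_scores = rownonmiss(read write math science)

* Show the observations whose mean is based on fewer than four scores.
list read write math science row_mean n_scores if n_scores < 4
```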
These are just a few of the many useful egen functions built into Stata.
Finally, we will save our data and continue on to the next unit.
save hs1