use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
summarize write
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |       200      52.775    9.478586         31         67
We can use egen with the cut() function to make a variable called
writecat that groups the variable write into the following 4 categories:
30 up to 40, 40 up to 50, 50 up to 60, and 60 up to 70. Each category takes
the value of its lower bound. The table command below is used to verify that
the data are grouped as we expected. We can see that when writecat is in the
lowest category (30), write ranges from 31 to 39, and so forth; that is, the
values when writecat is in category 30 correspond to write having values of
30 up to (but not including) 40.
egen writecat = cut(write), at(30,40,50,60,70)
table writecat, contents(min write max write)
----------------------------------
writecat | min(write) max(write)
----------+-----------------------
30 | 31 39
40 | 40 49
50 | 50 59
60 | 60 67
----------------------------------
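The grouping rule described above (each interval includes its lower bound and excludes its upper bound) can be checked by rebuilding the categories by hand. This is only a sketch: it assumes hsb2 is in memory and writecat was created as shown, and writecat_check is a hypothetical variable name used just for the comparison.

```stata
* Sketch: reproduce at(30,40,50,60,70) manually (writecat_check is a made-up name).
* For cutpoints spaced 10 apart, the lower bound of the interval is 10*floor(write/10).
generate writecat_check = 10*floor(write/10) if write >= 30 & write < 70

* assert stops with an error if any observation disagrees with cut()
assert writecat_check == writecat

* tabstat gives the same min/max check as the table command above
tabstat write, by(writecat) statistics(min max)
```

If the assert passes silently, the two variables agree for every observation.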
Here we use the same command, but our last category is 50 up to 60. As you
see, it generates 53 missing values because 53 observations have values that
are 60 or higher and thus fall outside of the range we specified. This shows
that if there are values outside of the range you provide, those will be
assigned a missing value.
egen writecat2 = cut(write), at(30,40,50,60)
(53 missing values generated)
If we use the icodes option, cut() will create integer codes 0, 1, 2, and so
forth. In the example below, you can see that it created the codes 0, 1, 2,
and 3.
egen writecat3 = cut(write), at(30,40,50,60,70) icodes
table writecat3, contents(min write max write)
----------------------------------
writecat3 | min(write) max(write)
----------+-----------------------
0 | 31 39
1 | 40 49
2 | 50 59
3 | 60 67
----------------------------------
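The two behaviors just described, missing values for out-of-range scores and integer codes from icodes, can be confirmed with a few commands. This is a sketch that assumes writecat, writecat2, and writecat3 exist as created above.

```stata
* Sketch: inspect the observations that at(30,40,50,60) could not classify.
* Going by the output above, count should report the 53 scores of 60 or higher.
count if missing(writecat2)
tabulate write if missing(writecat2)

* With icodes, the codes number the intervals from 0, so for these cutpoints
* code k corresponds to the lower bound 30 + 10*k.
assert writecat3 == (writecat - 30)/10
```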
If you use the label option (which
automatically implies icodes), it will create integer values like those
above, but it will also create value labels. As you see below, the
values of writecat4 are labeled 30-, 40-, 50-, and 60-.
egen writecat4 = cut(write), at(30,40,50,60,70) label
table writecat4, contents(min write max write)
----------------------------------
writecat4 | min(write) max(write)
----------+-----------------------
30- | 31 39
40- | 40 49
50- | 50 59
60- | 60 67
----------------------------------
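To see how the labels map onto the underlying integer codes, you can list the value label attached to the variable. The sketch below assumes writecat4 exists as created above. It looks up the label name rather than assuming one, and lbl is just an illustrative local macro name.

```stata
* Sketch: find out which value label is attached to writecat4, then list it.
local lbl : value label writecat4
display "value label attached to writecat4: `lbl'"
label list `lbl'
```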
We use the nolabel option to suppress
the display of the value labels and you can see that the variable really is
coded 0 1 2 and 3.
tabulate writecat4, nolabel
writecat4 | Freq. Percent Cum.
------------+-----------------------------------
0 | 21 10.50 10.50
1 | 51 25.50 36.00
2 | 75 37.50 73.50
3 | 53 26.50 100.00
------------+-----------------------------------
Total | 200 100.00
If you prefer, you can ask cut()
to choose the cutoffs to form groups with approximately the same number per
group. Below we request the creation of 4 (roughly) equally sized groups.
egen writecat5 = cut(write), group(4) label
table write writecat5
---------------------------------------
  writing |         writecat5
    score |    31-  45.5-    54-    60-
----------+----------------------------
       31 |      4
       33 |      4
       35 |      2
       36 |      2
       37 |      3
       38 |      1
       39 |      5
       40 |      3
       41 |     10
       42 |      2
       43 |      1
       44 |     12
       45 |      1
       46 |             9
       47 |             2
       49 |            11
       50 |             2
       52 |            15
       53 |             1
       54 |                   17
       55 |                    3
       57 |                   12
       59 |                   25
       60 |                           4
       61 |                           4
       62 |                          18
       63 |                           4
       65 |                          16
       67 |                           7
---------------------------------------
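A more compact way to judge how equal the groups turned out is to summarize each group instead of listing every score. This is a sketch that assumes writecat5 exists as created above. Because tied scores always stay in the same group, the group sizes are only approximately equal.

```stata
* Sketch: frequency of each group formed by group(4)
tabulate writecat5

* Number of observations and the range of write within each group
tabstat write, by(writecat5) statistics(n min max)
```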
For more information, see the help or reference manual about
egen.
These pages are Copyrighted (c) by UCLA Academic Technology Services