_n is Stata notation for the current observation number, and _N is Stata notation for the total number of observations. Let's see how _n and _N work.
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end
generate id = _n
generate nt = _N
list
     score  group  id  nt
  1.    72      1   1   7
  2.    84      2   2   7
  3.    76      1   3   7
  4.    89      3   4   7
  5.    82      2   5   7
  6.    90      1   6   7
  7.    85      1   7   7
As you can see, the variable id contains the observation number, running from 1 to 7, and nt contains the total number of observations, which is 7.
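Because _n is just a number, it can also be used inside a subscript. The following is a minimal sketch, reusing the data above, of how score[_n-1] and score[_N] refer to the previous and the last observation. The variable names prev, diff, and last are illustrative and not part of the original example.

```stata
* Run this block on its own: clear replaces the data in memory
clear
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

* score[_n-1] is the score of the previous observation;
* the first observation has no predecessor, so prev is missing (.) there
generate prev = score[_n-1]

* change in score from one observation to the next
generate diff = score - prev

* score[_N] always refers to the last observation in the data
generate last = score[_N]

list
```

If you ran this block, re-enter the original data and regenerate id and nt before continuing with the commands below.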
sort group score
by group: generate n1 = _n
by group: generate n2 = _N
list
     score  group  id  nt  n1  n2
  1.    72      1   1   7   1   4
  2.    76      1   3   7   2   4
  3.    85      1   7   7   3   4
  4.    90      1   6   7   4   4
  5.    82      2   5   7   1   2
  6.    84      2   2   7   2   2
  7.    89      3   4   7   1   1
Now n1 is the observation number within each group and n2 is the total number of observations for each group.
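The sort and by steps can also be combined with bysort. The sketch below assumes the same data as above. Variables inside the parentheses are used only for ordering within each group, not for defining the groups.

```stata
* Run this block on its own: clear replaces the data in memory
clear
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

* Groups are defined by group; within each group the
* observations are ordered by score before _n and _N are evaluated
bysort group (score): generate n1 = _n
bysort group (score): generate n2 = _N

* sepby() draws a separator line between groups in the listing
list, sepby(group)
```

This should produce the same n1 and n2 as the two-step sort and by approach, without a separate sort command.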
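Because the data are sorted by score within each group, the first observation in each group (n1==1) holds that group's lowest score. To list the lowest score for each group use the following: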
list if n1==1
     score  group  id  nt  n1  n2
  1.    72      1   1   7   1   4
  5.    82      2   5   7   1   2
  7.    89      3   4   7   1   1
Likewise, the last observation in each group (n1==n2) holds the highest score. To list the highest score for each group use the following:
list if n1==n2
     score  group  id  nt  n1  n2
  4.    90      1   6   7   4   4
  6.    84      2   2   7   2   2
  7.    89      3   4   7   1   1
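Under by, subscripts are counted within each group, so the lowest and highest scores can also be stored on every observation of the group. This is a hedged sketch using the same data. The variable names low, high, and spread are illustrative.

```stata
* Run this block on its own: clear replaces the data in memory
clear
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

* Within each group (ordered by score), score[1] is the first
* and therefore lowest score, and score[_N] is the last and highest
bysort group (score): generate low  = score[1]
bysort group (score): generate high = score[_N]

* difference between the highest and lowest score in each group
generate spread = high - low

list, sepby(group)
```

Unlike list if n1==1, this keeps all observations and attaches the group minimum and maximum to each of them.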
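_n can also be used to find duplicate id numbers. After sorting by id, we list each observation whose id equals the id of the next observation.
input id score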
117 72
204 84
311 76
289 89
141 82
277 90
465 85
289 88
182 84
end
sort id
list if id == id[_n + 1]
        id  score
  6.   289     88
list in 6/7
        id  score
  6.   289     88
  7.   289     89
As it turns out, observations 6 and 7 have the same id number but different score values.
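The comparison with id[_n + 1] lists only the first observation of each duplicated pair. As an alternative sketch, counting observations per id with _N flags every copy at once. The variable name ndup is illustrative. The built-in duplicates command is shown as a cross-check.

```stata
* Run this block on its own: clear replaces the data in memory
clear
input id score
117 72
204 84
311 76
289 89
141 82
277 90
465 85
289 88
182 84
end

* ndup is the number of observations sharing the same id
bysort id: generate ndup = _N

* every observation whose id occurs more than once
list if ndup > 1

* built-in cross-check: lists observations that are duplicates on id
duplicates list id
```

Here both observations with id 289 are listed, so a separate list in 6/7 step is not needed.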
_N can also be used to find observations that are duplicated on every variable.
input id score x1 x2 y1 y2 z1 z2
117 72 3 16 42 7 59 61
204 84 6 12 44 9 51 66
141 82 2 17 41 5 56 61
311 76 9 14 46 1 58 62
289 89 4 13 48 3 55 68
141 82 2 17 41 5 56 61
277 90 3 12 44 6 52 65
465 85 5 19 43 2 54 64
289 88 7 18 45 4 58 69
182 84 1 11 47 7 52 61
141 90 4 13 43 4 51 65
end
sort id score x1 x2 y1 y2 z1 z2
by id score x1 x2 y1 y2 z1 z2: generate n = _N
list if n>1
Observation 2

      id    141     score    82     x1     2
      x2     17        y1    41     y2     5
      z1     56        z2    61      n     2

Observation 3

      id    141     score    82     x1     2
      x2     17        y1    41     y2     5
      z1     56        z2    61      n     2

In this example we sort the observations by all of the variables. Then we use all of the variables in the by statement and set n equal to the total number of observations that are identical. Finally, we list the observations for which n is greater than 1, thereby identifying the duplicate observations.
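Stata's built-in duplicates command offers a similar check without listing every variable by hand. The following is a minimal sketch on the same data. The variable name dup is illustrative. Note that dup counts the extra copies, so it is 1 where the n created above would be 2.

```stata
* Run this block on its own: clear replaces the data in memory
clear
input id score x1 x2 y1 y2 z1 z2
117 72 3 16 42 7 59 61
204 84 6 12 44 9 51 66
141 82 2 17 41 5 56 61
311 76 9 14 46 1 58 62
289 89 4 13 48 3 55 68
141 82 2 17 41 5 56 61
277 90 3 12 44 6 52 65
465 85 5 19 43 2 54 64
289 88 7 18 45 4 58 69
182 84 1 11 47 7 52 61
141 90 4 13 43 4 51 65
end

* summary of how many observations are duplicated on all variables
duplicates report

* dup = number of additional identical copies (0 for unique observations)
duplicates tag, generate(dup)
list if dup > 0

* keep only the first copy of each set of identical observations
duplicates drop
```

Only use duplicates drop once you have confirmed that the flagged observations really are unwanted copies, since it removes them from the data in memory.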
These pages are Copyrighted (c) by UCLA Academic Technology Services