UCLA Academic Technology Services

Stata Class Notes
Counting from _n to _N


Introduction

Stata has two built-in variables called _n and _N. _n is Stata notation for the current observation number. _n is 1 in the first observation, 2 in the second, 3 in the third, and so on.

_N is Stata notation for the total number of observations. Let's see how _n and _N work.


input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

generate id = _n
generate nt = _N
list

         score      group         id         nt 
  1.        72          1          1          7  
  2.        84          2          2          7  
  3.        76          1          3          7  
  4.        89          3          4          7  
  5.        82          2          5          7  
  6.        90          1          6          7  
  7.        85          1          7          7 
  

As you can see, the variable id contains the observation number, running from 1 to 7, and nt contains the total number of observations, which is 7.
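_n and _N can also be used directly in expressions and if qualifiers, without first storing them in variables. The following is a minimal sketch that reuses the scores above; the variable name pct is ours, chosen only for illustration.

```stata
clear
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

* position of each observation as a percentage of the way through the data
generate pct = 100 * _n / _N

* select the first and the last observation with _n and _N directly
list if _n == 1
list if _n == _N
```

The two list commands should show the observations with scores 72 and 85, the first and last rows as entered.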

Counting with by

Using _n and _N in conjunction with the by command can produce some very useful results. Of course, to use the by command we must first sort our data on the by variable.

sort group score
by group: generate n1 = _n
by group: generate n2 = _N
list

         score      group         id         nt         n1         n2 
  1.        72          1          1          7          1          4  
  2.        76          1          3          7          2          4  
  3.        85          1          7          7          3          4  
  4.        90          1          6          7          4          4  
  5.        82          2          5          7          1          2  
  6.        84          2          2          7          2          2  
  7.        89          3          4          7          1          1 
 

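Now n1 is the observation number within each group and n2 is the total number of observations for each group. Because we sorted on score within group, n1 is 1 for the lowest score in each group and n1 equals n2 for the highest score.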
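The sort and the by step can probably be combined using bysort. The sketch below assumes the same score and group data; variables listed in parentheses after bysort are used for sorting only and do not define the groups.

```stata
clear
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

* sort by group, then by score within group, and number the
* observations within each group, all in one step
bysort group (score): generate n1 = _n

* the data are now sorted on group and score, so by group: works as before
by group: generate n2 = _N
list
```

This should give the same n1 and n2 values as the separate sort and by commands.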

To list the lowest score for each group use the following:


list if n1==1

         score      group         id         nt         n1         n2 
  1.        72          1          1          7          1          4  
  5.        82          2          5          7          1          2  
  7.        89          3          4          7          1          1 

To list the highest score for each group use the following:


list if n1==n2

         score      group         id         nt         n1         n2 
  4.        90          1          6          7          4          4  
  6.        84          2          2          7          2          2  
  7.        89          3          4          7          1          1 
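Under by, explicit subscripts are also interpreted within each group, so the lowest and highest scores can be stored alongside every observation rather than only listed. This is a sketch using the same data; the variable names low, high and range are ours.

```stata
clear
input score group
72 1
84 2
76 1
89 3
82 2
90 1
85 1
end

* with the data sorted on score within group, score[1] is the first
* (lowest) and score[_N] the last (highest) observation of each group
bysort group (score): generate low = score[1]
by group: generate high = score[_N]

* spread of scores within each group
generate range = high - low
list
```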

Another use of _n

Let's use _n to find out if there are duplicate id numbers in the following data:

input id score
117 72 
204 84 
311 76 
289 89 
141 82 
277 90 
465 85 
289 88
182 84
end

sort id
list if id == id[_n + 1]

            id      score 
  6.       289         88
  

list in 6/7

            id      score 
  6.       289         88  
  7.       289         89 

As it turns out, observations 6 and 7 have the same id number but different score values.
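The comparison with id[_n + 1] lists only the first observation of each duplicated pair. A possible variation, sketched here on the same data, also compares with the previous observation so that both members appear in a single listing.

```stata
clear
input id score
117 72
204 84
311 76
289 89
141 82
277 90
465 85
289 88
182 84
end

sort id

* id[0] and id[_N + 1] evaluate to missing, so the first and last
* observations need no special handling as long as id is never missing
list if id == id[_n - 1] | id == id[_n + 1]
```

This should list the two observations with id 289.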

Finding Duplicates

Now let's use _N to find duplicate observations.

input id score x1 x2 y1 y2 z1 z2
117 72 3 16 42 7 59 61
204 84 6 12 44 9 51 66
141 82 2 17 41 5 56 61
311 76 9 14 46 1 58 62
289 89 4 13 48 3 55 68
141 82 2 17 41 5 56 61
277 90 3 12 44 6 52 65
465 85 5 19 43 2 54 64
289 88 7 18 45 4 58 69
182 84 1 11 47 7 52 61
141 90 4 13 43 4 51 65
end

sort id score x1 x2 y1 y2 z1 z2
by id score x1 x2 y1 y2 z1 z2: generate n = _N
list if n>1


Observation 2

          id          141       score           82          x1            2
          x2           17          y1           41          y2            5
          z1           56          z2           61           n            2


Observation 3

          id          141       score           82          x1            2
          x2           17          y1           41          y2            5
          z1           56          z2           61           n            2

In this example we sort the observations by all of the variables. Then we use all of the variables in the by statement and set n equal to the total number of observations that are identical. Finally, we list the observations for which n is greater than 1, thereby identifying the duplicate observations.
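Once the duplicates are identified, the same idea with _n in place of _N can be used to keep one copy of each. This is only a sketch of one way to do it, using the data above; the variable name copy is ours.

```stata
clear
input id score x1 x2 y1 y2 z1 z2
117 72 3 16 42 7 59 61
204 84 6 12 44 9 51 66
141 82 2 17 41 5 56 61
311 76 9 14 46 1 58 62
289 89 4 13 48 3 55 68
141 82 2 17 41 5 56 61
277 90 3 12 44 6 52 65
465 85 5 19 43 2 54 64
289 88 7 18 45 4 58 69
182 84 1 11 47 7 52 61
141 90 4 13 43 4 51 65
end

* number the copies within each set of identical observations
bysort id score x1 x2 y1 y2 z1 z2: generate copy = _n

* keep the first copy of each set and drop the rest
drop if copy > 1
drop copy
list
```

Only the second copy of the fully identical observation is dropped. The other observations with id 141 or 289 differ on at least one variable and are kept.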


These pages are Copyrighted (c) by UCLA Academic Technology Services

