Stata Learning Module
Working across variables using foreach

1. Introduction

This module illustrates (1) how to create and recode variables manually and (2) how to use foreach to ease the process of creating and recoding variables.

Consider the sample program below, which reads in income data for twelve months.

input famid inc1-inc12 
1 3281 3413 3114 2500 2700 3500 3114 3319 3514 1282 2434 2818
2 4042 3084 3108 3150 3800 3100 1531 2914 3819 4124 4274 4471
3 6015 6123 6113 6100 6100 6200 6186 6132 3123 4231 6039 6215
end
 
list 
The output is shown below
list famid inc1-inc12, clean

famid inc1    inc2    inc3    inc4    inc5    inc6    inc7    inc8    inc9   inc10    inc11    inc12  
1     3281    3413    3114    2500    2700    3500    3114    3319    3514    1282     2434     2818  
2     4042    3084    3108    3150    3800    3100    1531    2914    3819    4124     4274     4471  
3     6015    6123    6113    6100    6100    6200    6186    6132    3123    4231     6039     6215  

2. Computing variables (manually)

Say that we wanted to compute the amount of tax (10%) paid for each month, the simplest way to do this is to compute 12 variables (taxinc1-taxinc12) by multiplying each of the (inc1-inc12) by .10 as illustrated below.  As you see, this requires entering a command computing the tax for each month of data (for months 1 to 12) via the generate command.

generate taxinc1 = inc1 * .10 
generate taxinc2 = inc2 * .10 
generate taxinc3 = inc3 * .10 
generate taxinc4 = inc4 * .10 
generate taxinc5 = inc5 * .10  
generate taxinc6 = inc6 * .10 
generate taxinc7 = inc7 * .10 
generate taxinc8 = inc8 * .10 
generate taxinc9 = inc9 * .10 
generate taxinc10= inc10 * .10 
generate taxinc11= inc11 * .10 
generate taxinc12= inc12 * .10 

The output is shown below.

     +----------------------------------------------------------------------------------------------+
  1. | famid | inc1 | inc2 | inc3 | inc4 | inc5 | inc6 | inc7 | inc8 | inc9 | inc10 | inc11 | inc12 |
     |     1 | 3281 | 3413 | 3114 | 2500 | 2700 | 3500 | 3114 | 3319 | 3514 |  1282 |  2434 |  2818 |
     |----------------------------------------------------------------------------------------------|
     | taxinc1 | taxinc2 | taxinc3 | taxinc4 | taxinc5  | taxinc6  | taxinc7  | taxinc8  | taxinc9  |
     |   328.1 |   341.3 |   311.4 |     250 |     270  |     350  |   311.4  |   331.9  |   351.4  |
     |----------------------------------------------------------------------------------------------|
     |           taxinc10           |           taxinc11            |           taxinc12            |
     |              128.2           |              243.4            |              281.8            |
     +----------------------------------------------------------------------------------------------+

     +----------------------------------------------------------------------------------------------+
  2. | famid | inc1 | inc2 | inc3 | inc4 | inc5 | inc6 | inc7 | inc8 | inc9 | inc10 | inc11 | inc12 |
     |     2 | 4042 | 3084 | 3108 | 3150 | 3800 | 3100 | 1531 | 2914 | 3819 |  4124 |  4274 |  4471 |
     |----------------------------------------------------------------------------------------------|
     | taxinc1 | taxinc2 | taxinc3 | taxinc4 | taxinc5  | taxinc6  | taxinc7  | taxinc8  | taxinc9  |
     |   404.2 |   308.4 |   310.8 |     315 |     380  |     310  |   153.1  |   291.4  |   381.9  |
     |----------------------------------------------------------------------------------------------|
     |           taxinc10           |           taxinc11            |           taxinc12            |
     |              412.4           |              427.4            |              447.1            |
     +----------------------------------------------------------------------------------------------+

     +----------------------------------------------------------------------------------------------+
  3. | famid | inc1 | inc2 | inc3 | inc4 | inc5 | inc6 | inc7 | inc8 | inc9 | inc10 | inc11 | inc12 |
     |     3 | 6015 | 6123 | 6113 | 6100 | 6100 | 6200 | 6186 | 6132 | 3123 |  4231 |  6039 |  6215 |
     |----------------------------------------------------------------------------------------------|
     | taxinc1 | taxinc2 | taxinc3 | taxinc4 | taxinc5  | taxinc6  | taxinc7  | taxinc8  | taxinc9  |
     |   601.5 |   612.3 |   611.3 |     610 |     610  |     620  |   618.6  |   613.2  |   312.3  |
     |----------------------------------------------------------------------------------------------|
     |           taxinc10           |           taxinc11            |           taxinc12            |
     |              423.1           |              603.9            |              621.5            |
     +----------------------------------------------------------------------------------------------+ 

3. Computing variables (using the foreach command)

Another way to compute 12 variables representing the amount of tax paid (10%) for each month is to use the foreach command.  In the example below we use the foreach command to cycle through the variables inc1 to inc12 and compute the taxable income as taxinc1 - taxinc12.

foreach var of varlist inc1-inc12 {
  generate tax`var' = `var' * .10
}

The initial foreach statement tells Stata that we want to cycle through the variables inc1 to inc12 using the statements that are surrounded by the curly braces.  The first time we cycle through the statements, the value of var will be inc1  and the second time the value of var will  be inc2 and so on until the final iteration where the value of var will be inc12.  Each statement within the loop (in this case, just the one generate statement) is evaluated and executed. When we are inside the foreach loop, we can access the value of var by surrounding it with the funny quotation marks like this `var' . The ` is the quote right below the ~ on your keyborad and the ' is the quote below the " on your keyboard. The first time through the loop, `var' is replaced with inc1, so the statement

generate tax`var' = `var' * .10 

becomes

generate taxinc1 = inc1 * .10 

This is repeated for inc2 and then inc3 and so on until inc12. So, this foreach loop is the equivalent of executing the 12 generate statements manually, but much easier and less error prone.

4. Collapsing across variables (manually)

Often one needs to sum across variables (also known as collapsing across variables).  For example, let's say the quarterly income for each observation is desired.  In order to get this information, four quarterly variables incqtr1-incqtr4 need to be computed. Again, this can be achieved manually or by using the foreach command. Below is an example of how to compute 4 quarterly income variables incqtr1-incqtr4 by simply adding together the months that comprise a quarter.

generate incqtr1 = inc1 + inc2 + inc3 
generate incqtr2 = inc4 + inc5 + inc6 
generate incqtr3 = inc7 + inc8 + inc9 
generate incqtr4 = inc10+ inc11+ inc12 

list incqtr1 - incqtr4

The output is shown below.

     +---------------------------------------+
     | incqtr1   incqtr2   incqtr3   incqtr4 |
     |---------------------------------------|
  1. |    9808      8700      9947      6534 |
  2. |   10234     10050      8264     12869 |
  3. |   18251     18400     15441     16485 |
     +---------------------------------------+

5. Collapsing across variables (using the foreach command)

This same result as above can be achieved using the foreach command. The example below illustrates how to compute the quarterly income variables incqtr1-incqtr4 using the foreach command. 

foreach qtr of numlist 1/4 {
  local m3 = `qtr'*3
  local m2 = (`qtr'*3)-1
  local m1 = (`qtr'*3)-2
  generate incqtr`qtr' = inc`m1' + inc`m2' + inc`m3'
}
list incqtr1 - incqtr4

The output is shown below.

     +---------------------------------------+
     | incqtr1   incqtr2   incqtr3   incqtr4 |
     |---------------------------------------|
  1. |    9808      8700      9947      6534 |
  2. |   10234     10050      8264     12869 |
  3. |   18251     18400     15441     16485 |
     +---------------------------------------+

In this example, instead of cycling across variables, the foreach command is cycling across numbers, 1, 2, 3 then 4 which we refer to as qtr which represent the 4 quarters of variables that we wish to create.  The trick is the relationship between the quarter and the month numbers that compose the quarter and to create a kind of formula that relates the quarters to the months.  For example, quarter 1 of data corresponds to months 3, 2 and 1, so we can say that when the quarter (qtr) is 1 we want the months represented by qtr*3, (qtr*3)-1 and (qtr*3)-2, yielding 3, 2, and 1.  This is what the statements below from the foreach loop are doing.  They are relating the quarter to the months.

  local m3 = `qtr'*3
  local m2 = (`qtr'*3)-1
  local m1 = (`qtr'*3)-2

So, when qtr is 1, the value for m3 is 1*3, the value for m2 is (1*3)-1 and the value for m1 is (1*3)-2.  Then, imagine all of those values being substituted into the following statement from the foreach loop.

  generate incqtr`qtr' = inc`m1' + inc`m2' + inc`m3'

This then becomes

  generate incqtr1 = inc3 + inc2 + inc1

and for the next quarter (when qtr becomes 2) the statement would become

  generate incqtr2 = inc6 + inc5 + inc4

In this example, with only 4 quarters of data, it would probably be easier to simply write out the 4 generate statements manually, however if you had 40 quarters of data, then the foreach loop can save you considerable time, effort and mistakes.

6. Identifying patterns across variables (using the foreach command)

The foreach command can also be used to identify patterns across variables of a dataset.  Let's say, for example, that one needs to know which months had income that was less than the income of the previous month. To obtain this information, dummy indicators can be created to indicate in which months this occurred. Note that only 11 dummy indicators are needed for a 12 month period because the interest is in the change from one month to the next.  When a month has income that is less than the income of the previous month, the dummy indicators lowinc2-lowinc12 get assigned a "1".  When this is not the case, they are assigned a "0".   This program is illustrated below (note for simplicity we assume no missing data on income).

foreach curmon of numlist 2/12 {
  local lastmon = `curmon' - 1
  generate lowinc`curmon' = 1 if ( inc`curmon' <  inc`lastmon' )
  replace  lowinc`curmon' = 0 if ( inc`curmon' >= inc`lastmon' )
}

We can list out the original values of inc and lowinc and verify that this worked properly

list famid inc1-inc12, clean noobs

famid inc1    inc2    inc3    inc4    inc5    inc6    inc7    inc8    inc9   inc10    inc11    inc12  
1     3281    3413    3114    2500    2700    3500    3114    3319    3514    1282     2434     2818  
2     4042    3084    3108    3150    3800    3100    1531    2914    3819    4124     4274     4471  
3     6015    6123    6113    6100    6100    6200    6186    6132    3123    4231     6039     6215  
list famid lowinc2-lowinc12, clean noobs

famid lowinc2 lowinc3 lowinc4 lowinc5 lowinc6 lowinc7 lowinc8 lowinc9 lowinc10 lowinc11 lowinc12 
    1       0       1       1       0       0       1       0       0        1        0        0 
    2       1       0       0       0       1       1       0       0        0        0        0 
    3       0       1       1       0       0       1       1       1        0        0        0 

This time we used the foreach loop to compare the current month, represented by curmon, and the prior month, computed as `curmon'-1 creating lastmon.  So, for the first pass through the foreach loop the value for curmon is 2 and the value for lastmon is 1, so the generate and replace statements become

generate lowinc2 = 1 if ( inc2 <  inc1 )
replace  lowinc2 = 0 if ( inc2 >= inc1 )

The process is repeated until curmon is 12, and then the generate and replace statements become

generate lowinc12 = 1 if ( inc12 <  inc11 )
replace  lowinc12 = 0 if ( inc12 >= inc11 )

If you were using foreach to span a large range of values (say 1/1000) then it is more effcient to use forvalues since it is designed to quickly increment through a sequential list, for example

forvalues curmon = 2/12 {
  local lastmon = `curmon' - 1
  generate lowinc`curmon' = 1 if ( inc`curmon' <  inc`lastmon' )
  replace  lowinc`curmon' = 0 if ( inc`curmon' >= inc`lastmon' )
}

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.