Stata FAQ: How can I "fill down"/expand observations with respect to a time variable?

A time series data set may have gaps, and sometimes we want to fill them in so that the time variable runs in consecutive order. This involves two steps. First, we expand the data set so that every time point is present. Expanding the data inevitably creates missing values for the other variables, so the second step is to replace those missing values sensibly. The examples shown here use Stata's command tsfill for the first step and a user-written command, "carryforward" by David Kantor, for the second. You can download "carryforward" by typing "findit carryforward" in Stata and following the appropriate link (see How can I use the findit command to search for programs and get additional help? for more information about using findit). The command carryforward carries values forward from one observation to the next, filling in each missing value with the most recent nonmissing value.
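
If following the findit links is inconvenient, the command can usually be installed directly from the SSC archive instead. This is a minimal sketch, assuming an internet connection and that the package is still distributed on SSC under the name carryforward:

```stata
* Install the user-written carryforward command from the SSC archive
* (assumption: the package name on SSC is still "carryforward")
ssc install carryforward

* Confirm that Stata can now locate the command
which carryforward
```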

Example 1, a simple fill with carryforward

In this example, the starting and ending time points can differ across individuals. The gaps are filled in separately for each individual, between that individual's first and last observed time points.

clear
input id  time  y
1      1   1.2
1      3   2.4
1      4   3.4
1      7   3.2
1      9   2.4
2      3   1.8
2      4   5.6
2      6   4.3
3      2   2.3
3      4   4.5
3      7   6.7
end

tsset id time
panel variable:  id (unbalanced)
time variable:  time, 1 to 9, but with gaps
delta:  1 unit

tsfill
list, clean noobs

id   time     y
1      1   1.2
1      2     .
1      3   2.4
1      4   3.4
1      5     .
1      6     .
1      7   3.2
1      8     .
1      9   2.4
2      3   1.8
2      4   5.6
2      5     .
2      6   4.3
3      2   2.3
3      3     .
3      4   4.5
3      5     .
3      6     .
3      7   6.7

bysort id: carryforward y, gen(yn)

list, clean noobs

id   time     y    yn
1      1   1.2   1.2
1      2     .   1.2
1      3   2.4   2.4
1      4   3.4   3.4
1      5     .   3.4
1      6     .   3.4
1      7   3.2   3.2
1      8     .   3.2
1      9   2.4   2.4
2      3   1.8   1.8
2      4   5.6   5.6
2      5     .   5.6
2      6   4.3   4.3
3      2   2.3   2.3
3      3     .   2.3
3      4   4.5   4.5
3      5     .   4.5
3      6     .   4.5
3      7   6.7   6.7  
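
The same forward fill can also be sketched with built-in commands only, which may help where installing user-written commands is not an option. The sketch below assumes the data listed above are still in memory; yn2 is a hypothetical variable name chosen for illustration.

```stata
* Start from a copy of y so the original variable is left untouched
generate yn2 = y

* Within each id, in time order, fill a missing value with the value of
* the previous observation. replace works through the observations in
* order, so a run of consecutive gaps is filled one after another.
bysort id (time): replace yn2 = yn2[_n-1] if missing(yn2)

* Check that the result agrees with the carryforward result
assert yn2 == yn
```

Putting time in parentheses after id makes the within-individual order explicit, rather than relying on the sort order left behind by tsfill.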

Example 2, a balanced panel with the full option

Sometimes we want a completely balanced data set, in which every individual has the same starting point and the same end point. To get this, we use the option "full" with tsfill.

clear
input id  time  y
1      1   1.2
1      3   2.4
1      4   3.4
1      7   3.2
1      9   2.4
2      3   1.8
2      4   5.6
2      6   4.3
3      2   2.3
3      4   4.5
3      7   6.7
end

tsset id time
panel variable:  id (unbalanced)
time variable:  time, 1 to 9, but with gaps
delta:  1 unit

tsfill, full

list, clean noobs

id   time     y
1      1   1.2
1      2     .
1      3   2.4
1      4   3.4
1      5     .
1      6     .
1      7   3.2
1      8     .
1      9   2.4
2      1     .
2      2     .
2      3   1.8
2      4   5.6
2      5     .
2      6   4.3
2      7     .
2      8     .
2      9     .
3      1     .
3      2   2.3
3      3     .
3      4   4.5
3      5     .
3      6     .
3      7   6.7
3      8     .
3      9     .

bysort id: carryforward y, gen(yn)

list, clean noobs

id   time     y    yn
1      1   1.2   1.2
1      2     .   1.2
1      3   2.4   2.4
1      4   3.4   3.4
1      5     .   3.4
1      6     .   3.4
1      7   3.2   3.2
1      8     .   3.2
1      9   2.4   2.4
2      1     .     .
2      2     .     .
2      3   1.8   1.8
2      4   5.6   5.6
2      5     .   5.6
2      6   4.3   4.3
2      7     .   4.3
2      8     .   4.3
2      9     .   4.3
3      1     .     .
3      2   2.3   2.3
3      3     .   2.3
3      4   4.5   4.5
3      5     .   4.5
3      6     .   4.5
3      7   6.7   6.7
3      8     .   6.7
3      9     .   6.7  
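
After the fill, rows that were in the original data and rows that were added by tsfill are hard to tell apart in yn. One possible way to keep track of them is to create a marker variable before expanding the data. This is a sketch under that assumption; observed is a hypothetical variable name.

```stata
clear
input id  time  y
1      1   1.2
1      3   2.4
1      4   3.4
1      7   3.2
1      9   2.4
2      3   1.8
2      4   5.6
2      6   4.3
3      2   2.3
3      4   4.5
3      7   6.7
end

* Mark every row that exists before the fill
generate byte observed = 1

tsset id time
tsfill, full

* Rows added by tsfill have missing values in every variable other than
* id and time, so the marker is missing exactly on the added rows
replace observed = 0 if missing(observed)

* Count original and added rows for each individual
tabulate id observed
```

With the data above, 11 of the 27 rows are original and 16 are added by tsfill. The marker makes it possible to restrict later analyses to the original rows, for example with "if observed == 1".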

Example 3, filling backward as well

In the previous example, not all of the missing values are replaced, because "carryforward" only carries values forward, not backward. We can replace the remaining missing values by running "carryforward" once more on the data sorted in descending time order. Although this is possible, it does not mean that it is a good idea. This example is merely for the purpose of illustration.

clear
input id  time  y
1      1   1.2
1      3   2.4
1      4   3.4
1      7   3.2
1      9   2.4
2      3   1.8
2      4   5.6
2      6   4.3
3      2   2.3
3      4   4.5
3      7   6.7
end

tsset id time
tsfill, full
bysort id: carryforward y, gen(yn)

gsort id - time
bysort id: carryforward yn, gen(yfinal)
list, clean noobs

id   time     y    yn   yfinal
1      9   2.4   2.4      2.4
1      8     .   3.2      3.2
1      7   3.2   3.2      3.2
1      6     .   3.4      3.4
1      5     .   3.4      3.4
1      4   3.4   3.4      3.4
1      3   2.4   2.4      2.4
1      2     .   1.2      1.2
1      1   1.2   1.2      1.2
2      9     .   4.3      4.3
2      8     .   4.3      4.3
2      7     .   4.3      4.3
2      6   4.3   4.3      4.3
2      5     .   5.6      5.6
2      4   5.6   5.6      5.6
2      3   1.8   1.8      1.8
2      2     .     .      1.8
2      1     .     .      1.8
3      9     .   6.7      6.7
3      8     .   6.7      6.7
3      7   6.7   6.7      6.7
3      6     .   4.5      4.5
3      5     .   4.5      4.5
3      4   4.5   4.5      4.5
3      3     .   2.3      2.3
3      2   2.3   2.3      2.3
3      1     .     .      2.3  
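
Note that the data are left sorted in descending time order. A possible alternative, sketched here with built-in commands only, fills backward and then restores ascending order. It assumes the data listed above are still in memory; yback and negtime are hypothetical variable names used only for illustration.

```stata
* Start from the forward-filled variable and build a reversed time index
generate yback = yn
generate negtime = -time

* Within each id, walk from the last time point to the first and copy
* the value from the later period into any gap that is still missing
bysort id (negtime): replace yback = yback[_n-1] if missing(yback)

* Restore ascending time order and drop the helper variable
sort id time
drop negtime

* Check that the result agrees with the two-pass carryforward result
assert yback == yfinal
list, clean noobs
```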
