UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

Stata FAQ
How can I "fill down"/expand observations with respect to dates and then update my dates?

The question is for each level of a particular status within a unique case I have a date that I need to expand the observations between each adjacent level of status to reflect the difference in time that elapsed and then update the time variable. To tackle this problem, we'd want expand the observations based on the difference in dates with respect to adjacent status levels within each case. For this example we'll create a fake data set from a car dealership where for each customer (cust) there is a unique case identifier (case). For each case we know what the status of the car is (status)--dropped off, fixed or picked up--and what day the respective task was completed on. First we will enter the data.
input str5 cust case str11 day str20 status 
"brad"  1   "23 Oct 2003"  "dropped off" 
"brad"  1   "25 Oct 2003"  "fixed" 
"brad"  1   "27 Oct 2003"  "picked up" 
"nikki" 2   "23 Oct 2003"  "dropped off" 
"nikki" 2   "23 Oct 2003"  "fixed"
"nikki" 2   "28 Oct 2003"  "picked up" 
"chloe" 3   "30 Oct 2003"  "dropped off" 
"chloe" 3   "05 Nov 2003"  "fixed" 
"chloe" 3   "06 Nov 2003"  "picked up" 
"brad"  4   "23 Nov 2003"  "dropped off" 
"brad"  4   "24 Nov 2003"  "fixed" 
"brad"  4   "29 Nov 2003"  "picked up" 
end 

Next we want to change the string variable day into a numeric time variable. The first line of syntax below will be different depending on which version of Stata you are using. In Stata 9 and earlier the date function is as you see it below date(day,"dmy"). However, in Stata 10, the date function requires that the "dmy" be upper case. So if you are using Stata 10, the correct form is date(day, "DMY") .

gen date = date(day, "dmy")
format date %d
drop day
Before we start, we need to make two assumption about the layout of the data. The first is that the levels of status have the same sequential ordering ("fixed" follows "dropped off," and "picked up" follows "fixed" ) and the second is that there needs to be a unique identifier for each case, we don't care if we have repeat customers, we just need something to differentiate between them.

What first needs to be done is to first figure out how many days are between each level of status. By each unique case, we need to take the difference in dates for each adjacent pair of the level of status. We start off by generating an elapsed time variable, etime where we want the difference in days between the first two adjacent levels (date[2]-date[1]), in addition we only want to update our new variable if this is referring to the first adjacent pair (if status == "dropped off" ). We do the same process for the next adjacent pair, but changing the levels of the date and what level of status we are fixing. We added a constant of one to the differences of adjacent dates, this may not be necessarily done in all situations, we just did it in ours. It was nice in this example because our level of status can overlap.
sort case
by case: gen     etime = date[2]-date[1] + 1 if status == "dropped off"
by case: replace etime = date[3]-date[2] + 1 if status == "fixed"
list, sep(12)

     +------------------------------------------------+
     |  cust   case        status        date   etime |
     |------------------------------------------------|
  1. |  brad      1   dropped off   23oct2003       3 |
  2. |  brad      1         fixed   25oct2003       3 |
  3. |  brad      1     picked up   27oct2003       . |
  4. | nikki      2   dropped off   23oct2003       1 |
  5. | nikki      2         fixed   23oct2003       6 |
  6. | nikki      2     picked up   28oct2003       . |
  7. | chloe      3   dropped off   30oct2003       7 |
  8. | chloe      3         fixed   05nov2003       2 |
  9. | chloe      3     picked up   06nov2003       . |
 10. |  brad      4   dropped off   23nov2003       2 |
 11. |  brad      4         fixed   24nov2003       6 |
 12. |  brad      4     picked up   29nov2003       . |
     +------------------------------------------------+
Now that we know how many days are between the adjacent levels of status for each case, we can use the expand option to expand our elapsed time variable, etime. We sorted the data by case and status, and listed our first twelve observations just to check that our program has worked to this point.
expand etime 
sort case status
list in 1/12, sep(12)

     +------------------------------------------------+
     |  cust   case        status        date   etime |
     |------------------------------------------------|
  1. |  brad      1   dropped off   23oct2003       3 |
  2. |  brad      1   dropped off   23oct2003       3 |
  3. |  brad      1   dropped off   23oct2003       3 |
  4. |  brad      1         fixed   25oct2003       3 |
  5. |  brad      1         fixed   25oct2003       3 |
  6. |  brad      1         fixed   25oct2003       3 |
  7. |  brad      1     picked up   27oct2003       . |
  8. | nikki      2   dropped off   23oct2003       1 |
  9. | nikki      2         fixed   23oct2003       6 |
 10. | nikki      2         fixed   23oct2003       6 |
 11. | nikki      2         fixed   23oct2003       6 |
 12. | nikki      2         fixed   23oct2003       6 |
     +------------------------------------------------+
Our next and final step is to update the dates to reflect the elapsed time. To do this, we need a counter of days elapsed not only for each case, but also for the level of status, this is done by specifying case and status within the by statement. Once we have our new day counter, t, we can replace our date variable by adding date and t together.
sort case status
by case status: gen t = _n-1
replace date = date + t
list 

     +----------------------------------------------------+
     |  cust   case        status        date   etime   t |
     |----------------------------------------------------|
  1. |  brad      1   dropped off   23oct2003       3   0 |
  2. |  brad      1   dropped off   24oct2003       3   1 |
  3. |  brad      1   dropped off   25oct2003       3   2 |
  4. |  brad      1         fixed   25oct2003       3   0 |
  5. |  brad      1         fixed   26oct2003       3   1 |
     |----------------------------------------------------|
  6. |  brad      1         fixed   27oct2003       3   2 |
  7. |  brad      1     picked up   27oct2003       .   0 |
  8. | nikki      2   dropped off   23oct2003       1   0 |
  9. | nikki      2         fixed   23oct2003       6   0 |
 10. | nikki      2         fixed   24oct2003       6   1 |
     |----------------------------------------------------|
 11. | nikki      2         fixed   25oct2003       6   2 |
 12. | nikki      2         fixed   26oct2003       6   3 |
 13. | nikki      2         fixed   27oct2003       6   4 |
 14. | nikki      2         fixed   28oct2003       6   5 |
 15. | nikki      2     picked up   28oct2003       .   0 |
     |----------------------------------------------------|
 16. | chloe      3   dropped off   30oct2003       7   0 |
 17. | chloe      3   dropped off   31oct2003       7   1 |
 18. | chloe      3   dropped off   01nov2003       7   2 |
 19. | chloe      3   dropped off   02nov2003       7   3 |
 20. | chloe      3   dropped off   03nov2003       7   4 |
     |----------------------------------------------------|
 21. | chloe      3   dropped off   04nov2003       7   5 |
 22. | chloe      3   dropped off   05nov2003       7   6 |
 23. | chloe      3         fixed   05nov2003       2   0 |
 24. | chloe      3         fixed   06nov2003       2   1 |
 25. | chloe      3     picked up   06nov2003       .   0 |
     |----------------------------------------------------|
 26. |  brad      4   dropped off   23nov2003       2   0 |
 27. |  brad      4   dropped off   24nov2003       2   1 |
 28. |  brad      4         fixed   24nov2003       6   0 |
 29. |  brad      4         fixed   25nov2003       6   1 |
 30. |  brad      4         fixed   26nov2003       6   2 |
     |----------------------------------------------------|
 31. |  brad      4         fixed   27nov2003       6   3 |
 32. |  brad      4         fixed   28nov2003       6   4 |
 33. |  brad      4         fixed   29nov2003       6   5 |
 34. |  brad      4     picked up   29nov2003       .   0 |
     +----------------------------------------------------+

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California