
NOTE: This page has been delinked.  It is no longer being maintained, and information on this page may be out of date.

Using Dates in Stata in the Year 2000 and Beyond
(Handling the Y2K Problem in Stata)

Stata can handle dates from January 1, 100 to December 31, 9999 (although one should be cautious in dealing with dates before October 15, 1582, when the Gregorian calendar was put into effect).  However, as these examples will illustrate, your data and Stata programs may need some mending to be prepared for the year 2000 and beyond.  This page contains example programs focusing on two main problems: data files which use 2 digits to specify the year (e.g., 12/25/98), and dates displayed using only 2 digits for the year (e.g., December 25, 98).


The Example Data File

Imagine that it is the summer of 2001 and you would like to create a very simple Stata data file containing the names and birthdays of your friends.  The names and birthdates of your friends are...

Noel was born December 25, 1903 (Christmas Day)
Hank was born February 29, 1956 (A leap year)
Mary was born December 31, 1999 (New Year's Eve, before the year 2000)
Eric was born  January  1, 2000 (New Year's Day, year 2000)
Jane was born      July 4, 2001 (Born on the 4th of July)

Ok, so it is a short list, and a couple of your friends (Mary, Eric and Jane) are a little bit young, but this list will help demonstrate problems which can arise when dealing with dates in the Year 2000 and beyond.  We start with an example where everything is fine, where the data uses 4 digits to indicate the year of birth, and the date of birth is displayed using 4 digits for the year.  If your data files and programs are like this example, your data and Stata programs may be fully ready for the Year 2000.


Example 1. Everything is Fine.

Data Dictionary, friendsa.dct

infile dictionary {
  str4  name
  str10 bday
}
Noel 12/25/1903
Hank 02/29/1956
Mary 12/31/1999
Eric 01/01/2000
Jane 07/04/2001

Stata Program

infile using friendsa.dct
gen bdate = date(bday,"mdy")
format bdate %dM-D-CY
list

Output from Stata Program

          name        bday              bdate
  1.      Noel  12/25/1903   December-25-1903
  2.      Hank  02/29/1956   February-29-1956
  3.      Mary  12/31/1999   December-31-1999
  4.      Eric  01/01/2000    January-01-2000
  5.      Jane  07/04/2001       July-04-2001


In Example 1 above, Stata is used to read the names and birthdates of your friends, and then the list command is used to display their names and birthdates.  This example has two nice features.

  1. The birthdays are stored in the data using 4 digit years (e.g., 12/25/1903, NOT 12/25/03).
  2. LIST uses 4 digits to display the year of birth.  The %dM-D-CY  format tells Stata to display the date including the century digits for the year (e.g., 1903 instead of just 03).  It is the C in this format which tells Stata to include the Century for the Year portion of the date.

By using 4 digit years to store the birthdays, and by displaying the birthdates using 4 digit years, this program is ready for the Year 2000.  In fact, this program is ready for the year 2100, 2200, all the way up to the year 9999. However, if a different Stata format is used for displaying the birth dates, you may not be sure when some of your friends were born, as shown in Example 2 below.


Example 2.  Displaying Dates Using 2 Digit Years.

Data Dictionary, friendsa.dct

infile dictionary {
  str4  name
  str10 bday
}
Noel 12/25/1903
Hank 02/29/1956
Mary 12/31/1999
Eric 01/01/2000
Jane 07/04/2001

Stata Program

infile using friendsa.dct
gen bdate = date(bday,"mdy")
format bdate %dM-D-Y
list

Output from Stata Program

          name        bday            bdate
  1.      Noel  12/25/1903   December-25-03
  2.      Hank  02/29/1956   February-29-56
  3.      Mary  12/31/1999   December-31-99
  4.      Eric  01/01/2000    January-01-00
  5.      Jane  07/04/2001       July-04-01


Example 2 is just like Example 1, except that the %dM-D-Y format is used to display the birthdays.  As you can see, this format only displays the last 2 digits of the year of birth, leaving you to wonder in what century some of your friends were born.  When dealing with dates which are in the year 2000 and beyond, it is important to choose a display format which will display dates using 4 digits for the year (e.g., %dM-D-CY).  However, there is a greater problem if the data only includes 2 digits for the year of birth, as shown in Example 3 below.
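
The point that a display format changes only how a date is shown, and not the value Stata stores, can be checked directly.  The following is a minimal sketch, not part of the original examples.  It assumes a current Stata release, where the documented spellings are the %td format codes (Month, DD, YY, CCYY) and the uppercase "MDY" mask; the examples on this page use the older %d and lowercase forms.  The two rows are taken from the friends list above.

```stata
* Sketch only: written with current Stata syntax, not the older forms used above.
clear
input str4 name str10 bday
"Mary" "12/31/1999"
"Eric" "01/01/2000"
end

gen bdate = date(bday, "MDY")

* Two-digit year display: the century is hidden, not lost.
format bdate %tdMonth-DD-YY
list name bdate

* The stored value still carries the full year.
assert year(bdate) == 2000 if name == "Eric"

* Switching to a four-digit year format shows the century again.
format bdate %tdMonth-DD-CCYY
list name bdate
```

Because only the format changes, a program that displays 2 digit years can be mended by changing its format statement alone; no data needs to be re-read.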


Example 3.  Inputting Dates Using 2 Digit Years.

Data Dictionary, friendsb.dct

infile dictionary {
  str4 name
  str8 bday
}
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01

Stata Program

infile using friendsb.dct
gen bdate = date(bday,"md19y")
format bdate %dM-D-CY
list

Output from Stata Program

          name       bday              bdate
  1.      Noel   12/25/03   December-25-1903
  2.      Hank   02/29/56   February-29-1956
  3.      Mary   12/31/99   December-31-1999
  4.      Eric   01/01/00    January-01-1900
  5.      Jane   07/04/01       July-04-1901


Example 3 demonstrates a problem with inputting dates using only a 2 digit year.  For example, Eric was born on Jan 1, 2000, but his birthday is input as 01/01/00.  With a 2 digit year, the date(bday,"md19y") expression ASSUMES that the century portion is 19.  (Note that the date(bday,"md20y") expression would assume the century portion is 20.)  As you can see in the output, Eric is incorrectly assigned a birthday of Jan 1, 1900, and Jane is incorrectly assigned a birthday of July 4, 1901.  In this simple example we could enter the data all over again using 4 digits for the year of birth.  However, you may have data files with thousands or millions of records using dates with 2 digit years.  Example 4, shown below, illustrates a possible solution to this problem by telling Stata when to treat a birthday as coming from the 1900s and when to treat a birthday as coming from the 2000s.


Example 4.  Inputting Dates Using 2 Digit Years With a replace ... if Statement.

Data Dictionary, friendsb.dct

infile dictionary {
  str4 name
  str8 bday
}
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01

Stata Program

infile using friendsb.dct
gen bdate = date(bday,"md19y")

gen bdate_y = year(bdate)
replace bdate = date(bday,"md20y") if bdate_y <= 1902
format bdate %dM-D-CY

list name bdate

Output from Stata Program

          name              bdate
  1.      Noel   December-25-1903
  2.      Hank   February-29-1956
  3.      Mary   December-31-1999
  4.      Eric    January-01-2000
  5.      Jane       July-04-2001


Example 4 demonstrates using a replace ... if statement to deal with dates which have 2 digit years.  The replace ... if statement instructs Stata to replace the birthdate with one where the century portion of the date is 20 IF the birth year was first read as 1902 or earlier (otherwise, no change is made to the birthdate).  This strategy attempts to draw a line at a certain year (in this case 1902).  Years after that line are treated as being from the 1900s (e.g., 1903 to 1999 is treated as 1903-1999), but years at or before it (1900-1902) are treated as coming from the 2000s (2000-2002).  As you can see in this output, this seems to have mended our problem with the birthdays using 2 digit years.  Eric and Jane are now properly understood to have birthdays in the years 2000 and 2001 respectively.  However, Example 5 below shows a major weakness in this strategy, when a 2 digit year could mean 19xx or 20xx.
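
The same cutoff rule can be written in one step if your version of Stata accepts an optional third argument to date().  Current releases document this argument as topyear: a 2 digit year is resolved to the latest year that does not exceed topyear.  The sketch below is an illustration under that assumption (and again uses the uppercase "MDY" mask and %td format of current Stata); a topyear of 2002 is chosen because it corresponds to the 1902 cutoff in Example 4.

```stata
* Sketch only: assumes date() accepts a third "topyear" argument
* and the uppercase "MDY" mask (current Stata syntax).
clear
input str4 name str8 bday
"Noel" "12/25/03"
"Hank" "02/29/56"
"Mary" "12/31/99"
"Eric" "01/01/00"
"Jane" "07/04/01"
end

* topyear = 2002: years 00-02 become 2000-2002, years 03-99 become 1903-1999.
gen bdate = date(bday, "MDY", 2002)
format bdate %tdMonth-DD-CCYY
list name bdate

* These checks match the output of Example 4.
assert year(bdate) == 1903 if name == "Noel"
assert year(bdate) == 2000 if name == "Eric"
assert year(bdate) == 2001 if name == "Jane"
```

This form avoids the temporary bdate_y variable, but it draws exactly the same line as the replace ... if statement and so shares its limits.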


Example 5.  Problems Inputting Dates Using 2 Digit Years With a replace ... if Statement.

Data Dictionary, friendsc.dct

infile dictionary {
  str4 name
  str8 bday
}
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
Will 10/31/03

Stata Program

infile using friendsc.dct

gen bdate = date(bday,"md19y")
gen bdate_y = year(bdate)
replace bdate = date(bday,"md20y") if bdate_y <= 1902
format bdate %dM-D-CY

list name bdate

Output from Stata Program

          name              bdate
  1.      Noel   December-25-1903
  2.      Hank   February-29-1956
  3.      Mary   December-31-1999
  4.      Eric    January-01-2000
  5.      Jane       July-04-2001
  6.      Will    October-31-1903


Example 5 demonstrates the major weakness of using this replace ... if strategy for solving the problems with 2 digit years.  It is now Winter 2003 and you have a new friend, Will, born on October 31, 2003 (Halloween 2003).  As you can see, Noel, born in 1903, and Will, born in 2003, both have a 2 digit birth year of 03.  The IF condition cannot differentiate between Noel and Will, and in this case both are treated as being born in the 1900s.
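
Moving the cutoff does not help, because any rule based on the 2 digit year alone must send Noel and Will to the same century.  The sketch below illustrates this with the optional topyear argument of date(), assuming a current Stata release that supports it along with the uppercase "MDY" mask; the two cutoff values are chosen only for illustration.

```stata
* Sketch only: assumes current Stata syntax (topyear argument, uppercase mask).

* Will's birthday string, read with two different cutoffs.
* With a cutoff of 2002, the year 03 is read as 1903.
display %td date("10/31/03", "MDY", 2002)
* With a cutoff of 2003, the year 03 is read as 2003.
display %td date("10/31/03", "MDY", 2003)

* Noel's string has the same 2 digit year, so it flips along with Will's.
display %td date("12/25/03", "MDY", 2002)
display %td date("12/25/03", "MDY", 2003)
```

Whichever cutoff is chosen, one of the two friends is assigned to the wrong century.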

Using this replace ... if strategy is only useful when you can clearly specify a cutoff year which divides years which should be treated as 19xx from years which should be treated as 20xx.  However, when this line becomes blurred, this solution fails.  You can permanently solve your problem by revising your data file to use 4 digit years (e.g., as shown in Example 1), but this could be very costly and time consuming, requiring you to entirely restructure your data files and shift column locations for all other variables.  Example 6 shows a compromise solution by using a new variable to indicate the century portion of the date.


Example 6.  Inputting Dates Using 2 Digit Years Using an Additional Century Variable.

Data Dictionary, friendsd.dct

infile dictionary {
  str4  name
  str10 bday
  int   bday_yy
}
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00 20
Jane 07/04/01 20
Will 10/31/03 20

Stata Program

infile using friendsd.dct

gen     bdate   = date(bday,"md19y")
gen     bdate_y = year(bdate)
replace bdate   = date(bday,"md20y") if (bday_yy == 20)
format bdate %dM-D-CY

list name bdate

Output from Stata Program

          name              bdate
  1.      Noel   December-25-1903
  2.      Hank   February-29-1956
  3.      Mary   December-31-1999
  4.      Eric    January-01-2000
  5.      Jane       July-04-2001
  6.      Will    October-31-2003


Example 6 solves the problem of the dates with 2 digit years by creating a separate variable indicating the century portion of the date.  As you can see in the output, everyone is correctly assigned the proper birthdate because the data EXPLICITLY indicates which birthdates should have a 20 prefixed to the year (using the bday_yy variable).  Here are some important points about this program.

  1. This strategy does not require that you change your existing data (while dates remain in the 1900s).  You only need to include the bday_yy variable for the records with dates (birthdays) in the 2000s.
  2. This strategy does not require that you change the column locations for your existing variables as long as you place the century variable(s) to the right of all existing data.  Note that bday_yy comes to the right of all other variables.
  3. The replace ... if statement checks to see if the century (i.e., bday_yy) is 20.  For those records where bday_yy is 20, the value of bdate is replaced with a date which assumes the century of birth is 20.
  4. Note that this example contained only one date variable.  Your data file may contain more than one date variable, so you may need to have a century variable for each of the dates in your data file.  You would then need a separate replace ... if statement corresponding to each of the date variables.
  5. Note that the data is read using a data dictionary.  This is chosen because a data dictionary treats each line of data as a record in a Stata data file, so when bday_yy is missing, it is assigned as a missing value.  Other strategies for reading data may skip to the next record when bday_yy is missing.
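
As a variation on the program in Example 6, the date can be assembled from its parts so that the century variable is used directly rather than tested against 20.  This is a minimal sketch, not the method used above.  It assumes that bday is always in mm/dd/yy form, that a missing bday_yy means the 1900s (as in Example 6), and that you are running a current Stata release; the variable century is a hypothetical helper introduced only for this illustration, and the data are entered inline with input rather than read from friendsd.dct.

```stata
* Sketch only: assumes bday is always "mm/dd/yy" and that a missing
* bday_yy means the 1900s, as in Example 6.
clear
input str4 name str8 bday int bday_yy
"Noel" "12/25/03" .
"Mary" "12/31/99" .
"Eric" "01/01/00" 20
"Will" "10/31/03" 20
end

* Century to use: the stored value, or 19 when none was recorded.
gen century = cond(missing(bday_yy), 19, bday_yy)

* Assemble the four-digit year and build the date from month, day and year.
gen bdate = mdy(real(substr(bday, 1, 2)), ///
                real(substr(bday, 4, 2)), ///
                century*100 + real(substr(bday, 7, 2)))
format bdate %tdMonth-DD-CCYY
list name bdate

assert year(bdate) == 1903 if name == "Noel"
assert year(bdate) == 2003 if name == "Will"
```

Written this way, a record with a century value other than 20 would be handled without adding another replace ... if statement.  The /// line continuations require running the code from a do-file.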

Conclusion

These examples illustrate some of the problems which can arise when using Stata to process dates for the Year 2000 and beyond.  For more information, please see the links on our Statistical Computing and the Year 2000 page.  For assistance solving Year 2000 problems in Statistical Computing, feel free to use the Statistical Consulting Services provided by UCLA Academic Technology Services.


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.