NOTE: This page has been delinked.  It is no longer being maintained, and information on this page may be out of date.

Statistical Computing and the Year 2000 (the y2k problem)

Using Dates in SAS in the Year 2000 and Beyond
(Handling the Y2K Problem in SAS)

The SAS system can handle all Gregorian calendar dates between the years 1582 and 20,000.  However, as these examples will illustrate, your data and SAS programs may need some mending to be prepared for years 2000 and beyond.  This page contains example programs focusing on 2 main problems: data files which use 2 digits to specify the year (e.g., 12/25/98), and programs which display dates using only 2 digits for the year (e.g., 25DEC98).


The Example Data File

Imagine that it is the summer of 2001 and you would like to create a very simple SAS data file containing the names and birthdays of your friends.  The names and birthdates of your friends are...

Noel was born December 25, 1903 (Christmas Day)
Hank was born February 29, 1956 (A leap year)
Mary was born December 31, 1999 (New Year's Eve, before the year 2000)
Eric was born  January  1, 2000 (New Year's Day, year 2000)
Jane was born      July 4, 2001 (Born on the 4th of July)

Ok, so it is a short list, and a few of your friends (Mary, Eric and Jane) are a little bit young, but this list will help demonstrate problems which can arise when dealing with dates in the Year 2000 and beyond.  We start with an example where everything is fine: the data uses 4 digits to indicate the year of birth, and the date of birth is displayed using 4 digits for the year.  If your data files and programs are like this example, your data and SAS programs may be fully ready for the Year 2000.


Example 1.  Everything is Fine.

DATA friends;
  INPUT @1 name $ 4.
        @6 bday mmddyy10.;
CARDS;
Noel 12/25/1903
Hank 02/29/1956
Mary 12/31/1999
Eric 01/01/2000
Jane 07/04/2001
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;

Output From Example 1.

OBS    NAME         BDAY

 1     Noel    25DEC1903
 2     Hank    29FEB1956
 3     Mary    31DEC1999
 4     Eric    01JAN2000
 5     Jane    04JUL2001


In Example 1 above, a small SAS data file called friends is created containing the names and birthdates of your friends, and then PROC PRINT is used to display their names and birthdates.  This example has two nice features.

  1. The birthdays are indicated using 4 digit years (e.g. 12/25/1903, NOT 12/25/03)
  2. The PROC PRINT uses the DATE9. print format to make sure that the birthdays are displayed with 4 digit years.

By using 4 digit years to store the birthdays, and by displaying the birthdates using 4 digit years, this program is ready for the Year 2000.  In fact, this program is ready for the year 2100, 2200, all the way up to the year 9999. However, if a different SAS FORMAT is used for displaying the data, you may not be sure when some of your friends were born, as shown in Example 2 below.


Example 2.  Displaying Dates Using 2 Digit Years.

DATA friends;
  INPUT @1 name $ 4.
        @6 bday mmddyy10.;
CARDS;
Noel 12/25/1903
Hank 02/29/1956
Mary 12/31/1999
Eric 01/01/2000
Jane 07/04/2001
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date.;
RUN;

Output From Example 2.

OBS    NAME         BDAY

 1     Noel    25DEC03
 2     Hank    29FEB56
 3     Mary    31DEC99
 4     Eric    01JAN00
 5     Jane    04JUL01


Example 2 demonstrates the problem of using a print format (DATE.) which is not appropriate for 4 digit years.  Example 2 is just like Example 1, except that the DATE. format is used to display the birthdays.  As you can see in the output, the DATE. format displays 2 digit years for the birthdays, whereas Example 1 used the DATE9. format, which displayed all 4 digits of the year.  This shows that even when you properly read dates into SAS using 4 digit years, you also need a format wide enough to display 4 digit years, such as DATE9. (as shown in Example 1).  This problem is trivial compared to the problems which arise if only 2 digits are used to specify the year of birth, as shown in Example 3 below.
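
DATE9. is not the only format that shows the full year.  As a minimal sketch, assuming the friends data file from Example 1 has already been created, the step below writes each birthday to the SAS log using several standard formats.  The 7 and 8 column formats can only show a 2 digit year, while the wider ones show all 4 digits.

```sas
* Sketch: compare 2 digit and 4 digit year formats on the same dates. ;
* Assumes the friends data file from Example 1 already exists. ;
DATA _NULL_;
  SET friends;
  PUT name $4.
      +1 bday date7.      /* 2 digit year, e.g. 25DEC03    */
      +1 bday date9.      /* 4 digit year, e.g. 25DEC1903  */
      +1 bday mmddyy8.    /* 2 digit year, e.g. 12/25/03   */
      +1 bday mmddyy10.   /* 4 digit year, e.g. 12/25/1903 */
      +1 bday yymmdd10.;  /* 4 digit year, e.g. 1903-12-25 */
RUN;
```

Which 4 digit format to use is a matter of taste; the point is only that the format width must leave room for the whole year.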


Example 3.  Inputting Dates Using 2 Digit Years.

DATA friends;
  INPUT @1 name $ 4.
        @6 bday mmddyy8.;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;

Output From Example 3.

OBS    NAME         BDAY

 1     Noel    25DEC1903
 2     Hank    29FEB1956
 3     Mary    31DEC1999
 4     Eric    01JAN1900
 5     Jane    04JUL1901


Example 3 demonstrates the problem of inputting dates using only a 2 digit year.  For example, Eric was born on Jan 1, 2000 but his birthday is input as 01/01/00.  With a 2 digit year like this, it is ASSUMED that the century portion is 19 (see Footnote 1).  As you can see in the output, Eric is incorrectly assigned a birthday of Jan 1, 1900, and Jane is incorrectly assigned a birthday of July 4, 1901.  In this small example we could enter the data all over again using 4 digit years; however, you may have data files with hundreds, thousands, or millions of records using dates with 2 digit years.  Example 4, shown below, illustrates a possible solution to this problem by telling SAS when to treat a birthday as coming from the 1900s and when to treat a birthday as coming from the 2000s.


Example 4.  Inputting Dates Using 2 Digit Years Using the YEARCUTOFF= Option.

OPTIONS YEARCUTOFF=1903;

DATA friends;
  INPUT @1 name $ 4.
        @6 bday mmddyy8.;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;

Output From Example 4.

OBS    NAME         BDAY

 1     Noel    25DEC1903
 2     Hank    29FEB1956
 3     Mary    31DEC1999
 4     Eric    01JAN2000
 5     Jane    04JUL2001


Example 4 demonstrates use of the YEARCUTOFF= option to solve the problem posed in Example 3: how to deal with dates which have 2 digit years.  The OPTIONS YEARCUTOFF=1903; statement instructs SAS to prefix years 03-99 with a 19 (treating them as 1903-1999), but to prefix years 00-02 with a 20 (i.e. 2000-2002).  In other words, the option sets the first year of a 100 year window (here 1903-2002) into which all 2 digit years are placed.  As you can see, this seems to have mended our problem with the birthdays using 2 digit years.  Eric and Jane are now properly understood to have a birthday in the years 2000 and 2001 respectively.  However, Example 5 below shows a major weakness in this strategy, when a 2 digit year could mean 19xx or 20xx.
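
To see exactly where the window set by YEARCUTOFF=1903 begins and ends, the boundary years can be checked directly.  The following is only a sketch: the variable names win_start and win_end are made up for illustration, and the INPUT function is used so that no data file is needed.

```sas
* Sketch: find the edges of the 100 year window set by YEARCUTOFF=1903. ;
OPTIONS YEARCUTOFF=1903;

DATA _NULL_;
  win_start = INPUT('01/01/03', mmddyy8.);  /* year 03 is the first year in the window */
  win_end   = INPUT('12/31/02', mmddyy8.);  /* year 02 is the last year in the window  */
  PUT win_start= date9. win_end= date9.;
RUN;
```

With this setting the log should show win_start=01JAN1903 and win_end=31DEC2002, matching the 03-99 and 00-02 ranges described above.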


Example 5.  Problems Inputting Dates Using 2 Digit Years Using the YEARCUTOFF= Option.

OPTIONS YEARCUTOFF=1903;

DATA friends;
  INPUT @1 name $ 4.
        @6 bday mmddyy8.;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
Will 10/31/03
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;

Output From Example 5.

OBS    NAME         BDAY

 1     Noel    25DEC1903
 2     Hank    29FEB1956
 3     Mary    31DEC1999
 4     Eric    01JAN2000
 5     Jane    04JUL2001
 6     Will    31OCT1903


Example 5 demonstrates the major weakness of using the YEARCUTOFF= option to try to solve problems with 2 digit years.  It is now Winter 2003 and you have a new friend, Will, born on October 31, 2003 (Halloween 2003).  As you can see, the program interprets 10/31/03 as 10/31/1903 because of the YEARCUTOFF=1903; option.  If you slide the cutoff any higher (e.g., OPTIONS YEARCUTOFF=1904;) then Noel will be treated as being born in 2003 instead of 1903.  As you can see, the YEARCUTOFF= option only solves this problem in a limited way.
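
The effect of sliding the cutoff can be checked without rebuilding the data file.  As a rough sketch (the variable names noel and will are chosen here only for illustration), the step below reads both birthdays under YEARCUTOFF=1904:

```sas
* Sketch: with the cutoff moved to 1904, year 03 now falls in the 2000s. ;
OPTIONS YEARCUTOFF=1904;

DATA _NULL_;
  noel = INPUT('12/25/03', mmddyy8.);  /* intended year is 1903 */
  will = INPUT('10/31/03', mmddyy8.);  /* intended year is 2003 */
  PUT noel= date9. will= date9.;
RUN;
```

The log should show noel=25DEC2003 and will=31OCT2003: Will is now right, but Noel is wrong.  No single cutoff can place both 03 birthdays in their proper centuries.  Note that an OPTIONS statement stays in effect for the rest of the SAS session, so reset YEARCUTOFF= afterwards if later steps depend on a different value.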

The YEARCUTOFF= option is only useful if you can clearly specify a cutoff year which divides years which should be treated as 19xx from years which should be treated as 20xx.  However, when this line becomes blurred, this solution fails. You can permanently solve your problem by revising your data file to use 4 digit years (e.g., as shown in Example 1), but this could be very costly and time consuming, requiring you to entirely restructure your data files and shift column locations for all other variables.  Example 6 shows a compromise solution by using a new variable to indicate the century portion of the date.



Example 6.  Inputting Dates Using 2 Digit Years Using an Additional Century Variable.

OPTIONS YEARCUTOFF=1900;

DATA friends;
  INPUT @1 name $ 4.
        @6 bday mmddyy8.
        @15 bday_yy 2.;

IF (bday_yy EQ 20) Then bday = MDY( MONTH(bday), DAY(bday), YEAR(bday) + 100);

CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00 20
Jane 07/04/01 20
Will 10/31/03 20
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;

Output From Example 6.

OBS    NAME         BDAY

 1     Noel    25DEC1903
 2     Hank    29FEB1956
 3     Mary    31DEC1999
 4     Eric    01JAN2000
 5     Jane    04JUL2001
 6     Will    31OCT2003


Example 6 solves the problem of the dates with 2 digit years by creating a separate variable indicating the century portion of the date.  As you can see in the output, everyone is correctly assigned the proper birthdate because the data EXPLICITLY indicates which birthdates should have a 20 prefixed to the year (using the bday_yy variable).  Here are some important points about this program.

  1. This strategy does not require that you change your existing records for dates in the 1900s.  You only need to include the bday_yy variable for the records with dates (birthdays) in the 2000s.
  2. This strategy does not require that you change the column locations for your existing variables as long as you place the century variable(s) to the right of all existing data.  Note that bday_yy comes to the right of all other variables.
  3. If your INPUT statement uses list input (e.g., INPUT name bday bday_yy; ) you probably should use the MISSOVER option on your INFILE statement.  This ensures that bday_yy is assigned a value of missing for those records where there is no data for bday_yy.  For more information, see the SAS Language manual about the INFILE statement, the MISSOVER option on the INFILE statement, and the INPUT statement.
  4. The IF statement checks whether the century (i.e., bday_yy) is 20.  If it is, the MONTH, DAY and YEAR functions extract the month, day and year from bday, and 100 is added to the year to move it from 19xx to 20xx.  Finally, the MDY function combines the month, day, and modified year to create the birthdate with the year 20xx.
  5. Note that the OPTIONS YEARCUTOFF=1900; statement was used.  Even though this option is redundant in Version 6 of SAS, it will become necessary in Version 7 of SAS, when the default YEARCUTOFF= will become 1920.  Consider what would happen to this program if the YEARCUTOFF= option were omitted and the program were run under Version 7.  Will's birthdate of 10/31/03 would be read by SAS as 10/31/2003 (because 03 falls within the 00-19 range that a cutoff of 1920 places in the 2000s).  The IF statement would then see that the century (bday_yy) is 20 and advance the year by 100 again, yielding a birthdate in the year 2103.  Put most simply, be absolutely sure to include the OPTIONS YEARCUTOFF=1900; statement if you plan to use this strategy, so your program will continue to work as expected in future versions of SAS.
  6. Note that this example contained only one date variable.  Your data file may contain more than one date variable, so you may need to have a century variable for each of the dates in your data file.  You would then need a separate IF statement corresponding to each of the date variables.
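
One way to sidestep the YEARCUTOFF= dependence described in point 5 is to read the month, day, and 2 digit year as separate numbers and build the date with MDY, so that no date informat (and therefore no YEARCUTOFF= setting) is involved.  The following is a minimal sketch and not part of the original examples: the variable names mm, dd, yy and cc are made up for illustration, and it assumes that a blank century means 19.  A side benefit is that a date such as 02/29/00 with century 20 is built directly as February 29, 2000, rather than first being read as February 29, 1900, which is not a valid date because 1900 was not a leap year.

```sas
* Sketch: build the date from its pieces plus an explicit century. ;
DATA friends;
  INPUT @1  name $4.
        @6  mm 2.      /* month                        */
        @9  dd 2.      /* day                          */
        @12 yy 2.      /* 2 digit year                 */
        @15 cc 2.;     /* century, blank for the 1900s */
  IF cc = . THEN cc = 19;           /* assumption: no century means 19xx */
  bday = MDY(mm, dd, cc*100 + yy);  /* combine into a 4 digit year date  */
  DROP mm dd yy cc;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00 20
Jane 07/04/01 20
Will 10/31/03 20
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;
```

This reads the same data lines as Example 6 and should print the same six birthdates.  The trade-off is that the INPUT statement must know the exact column of each piece of the date.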

Conclusion

These examples illustrate some of the problems which will arise when using SAS to process dates for the Year 2000 and beyond.  For more information, please see the links on our Statistical Computing and the Year 2000 page.  For assistance solving Year 2000 problems in Statistical Computing, feel free to use the Statistical Consulting Services provided by the UCLA Academic Technology Services.


Footnotes

1. Please note that in Version 6.xx of SAS, the default value of the YEARCUTOFF= option is 1900, meaning that all 2 digit years are prefixed with a century of 19 (e.g. 1/1/84 is converted into the date Jan 1, 1984).  This default value for the YEARCUTOFF= option will change to YEARCUTOFF=1920 in SAS Version 7.  This means that a date with the year portion from 00 to 19 will be treated as having 20 for the century portion (e.g. 1/1/15 will be converted into the date Jan 1, 2015).  Any programs which rely on 2 digit years from 00-99 being treated as 1900-1999 will encounter problems with SAS Version 7.  To avoid this problem, put OPTIONS YEARCUTOFF=1900; at the top of your program.  This explicitly states that you want YEARCUTOFF=1900, and your program should work the same in Version 6 and in Version 7.
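
If you are unsure which cutoff your SAS session is using, you can ask SAS to report it before setting it yourself.  This is a small sketch; the exact wording of the log output may differ between SAS versions.

```sas
* Sketch: display the YEARCUTOFF= value currently in effect. ;
PROC OPTIONS OPTION=YEARCUTOFF;
RUN;

* Then set it explicitly so the program does not depend on the default. ;
OPTIONS YEARCUTOFF=1900;
```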


These pages are Copyrighted (c) by UCLA Academic Technology Services

