The SAS system can handle all Gregorian calendar dates between the years 1582 and 20,000. However, as these examples will illustrate, your data and SAS programs may need some mending to be prepared for the year 2000 and beyond. This page contains example programs focusing on two main problems: data files which use 2 digits to specify the year (e.g., 12/25/98), and date displays which use only 2 digits for the year (e.g., 25DEC98).
Imagine that it is the summer of 2001 and you would like to create a very simple SAS data file containing the names and birthdays of your friends. The names and birthdates of your friends are...
Noel was born December 25, 1903 (Christmas Day)
Hank was born February 29, 1956 (A leap year)
Mary was born December 31, 1999 (New Year's Eve, before the year 2000)
Eric was born January 1, 2000 (New Year's Day, year 2000)
Jane was born July 4, 2001 (Born on the 4th of July)
Ok, so it is a short list, and a few of your friends (Mary, Eric and Jane) are a little bit young, but this list will help demonstrate problems which can arise when dealing with dates in the Year 2000 and beyond. We start with an example where everything is fine: the data uses 4 digits to indicate the year of birth, and the date of birth is displayed using 4 digits for the year. If your data files and programs are like this example, your data and SAS programs may be fully ready for the Year 2000.
DATA friends;
INPUT @1 name $ 4.
@6 bday mmddyy10.;
CARDS;
Noel 12/25/1903
Hank 02/29/1956
Mary 12/31/1999
Eric 01/01/2000
Jane 07/04/2001
;
RUN;
PROC PRINT DATA=friends;
VAR name bday;
FORMAT bday date9.;
RUN;
Output From Example 1.
OBS NAME BDAY
1 Noel 25DEC1903
2 Hank 29FEB1956
3 Mary 31DEC1999
4 Eric 01JAN2000
5 Jane 04JUL2001
In Example 1 above, a small SAS data file called friends is created containing the names and birthdates of your friends, and then PROC PRINT is used to display their names and birthdates. This example has two nice features.
By using 4 digit years to store the birthdays, and by displaying the birthdates using 4 digit years, this program is ready for the Year 2000. In fact, this program is ready for the year 2100, 2200, all the way up to the year 9999. However, if a different SAS FORMAT is used for displaying the data, you may not be sure when some of your friends were born, as shown in Example 2 below.
DATA friends;
INPUT @1 name $ 4.
@6 bday mmddyy10.;
CARDS;
Noel 12/25/1903
Hank 02/29/1956
Mary 12/31/1999
Eric 01/01/2000
Jane 07/04/2001
;
RUN;
PROC PRINT DATA=friends;
VAR name bday;
FORMAT bday date.;
RUN;
Output From Example 2.
OBS NAME BDAY
1 Noel 25DEC03
2 Hank 29FEB56
3 Mary 31DEC99
4 Eric 01JAN00
5 Jane 04JUL01
Example 2 demonstrates the problem of using a print format (DATE.) which is not appropriate for 4 digit years. Example 2 is just like Example 1, except that the DATE. format is used to display the birthdays. As you can see in the output, the DATE. format displays 2 digit years for the birthdays, whereas Example 1 used the DATE9. format, which displayed all 4 digits of the birth years. This shows that even when you properly read dates into SAS using 4 digit years, you also need to use a format such as DATE9. to display those dates using 4 digit years (as shown in Example 1). This problem is trivial compared to the problems which arise if only 2 digits are used to specify the year of birth, as shown in Example 3 below.
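As a brief aside before Example 3, the following minimal sketch writes one date to the SAS log with several standard formats, so the effect of the format width can be seen side by side. The variable name bday and the date literal are chosen only for illustration.

```sas
DATA _NULL_;
  bday = '01JAN2000'D;   /* date literal written with a 4 digit year */
  PUT bday DATE7.;       /* 2 digit year:  01JAN00    */
  PUT bday DATE9.;       /* 4 digit year:  01JAN2000  */
  PUT bday MMDDYY8.;     /* 2 digit year:  01/01/00   */
  PUT bday MMDDYY10.;    /* 4 digit year:  01/01/2000 */
  PUT bday YYMMDD10.;    /* 4 digit year:  2000-01-01 */
RUN;
```

The stored value is the same in every case; only the displayed text changes, which is why a display problem like the one in Example 2 can be fixed without touching the data.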
DATA friends;
INPUT @1 name $ 4.
@6 bday mmddyy8.;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
;
RUN;
PROC PRINT DATA=friends;
VAR name bday;
FORMAT bday date9.;
RUN;
Output From Example 3.
OBS NAME BDAY
1 Noel 25DEC1903
2 Hank 29FEB1956
3 Mary 31DEC1999
4 Eric 01JAN1900
5 Jane 04JUL1901
Example 3 demonstrates the problem of inputting dates using only a 2 digit year. For example, Eric was born on Jan 1, 2000 but his birthday is input as 01/01/00. With a 2 digit year like this, it is ASSUMED that the century portion is 19 (see Footnote 1). As you can see in the output, Eric is incorrectly assigned a birthday of Jan 1, 1900. In this small example we could enter the data all over again using 4 digit years; however, you may have data files with hundreds, thousands, or millions of records using dates with 2 digit years. Example 4, shown below, illustrates a possible solution to this problem by telling SAS when to treat a birthday as coming from the 1900s and when to treat a birthday as coming from the 2000s.
OPTIONS YEARCUTOFF=1903;
DATA friends;
INPUT @1 name $ 4.
@6 bday mmddyy8.;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
;
RUN;
PROC PRINT DATA=friends;
VAR name bday;
FORMAT bday date9.;
RUN;
Output From Example 4.
OBS NAME BDAY
1 Noel 25DEC1903
2 Hank 29FEB1956
3 Mary 31DEC1999
4 Eric 01JAN2000
5 Jane 04JUL2001
Example 4 demonstrates use of the YEARCUTOFF= option to solve the problem posed in Example 3, how to deal with dates which have 2 digit years. The OPTIONS YEARCUTOFF=1903; statement instructs SAS to prefix years 03-99 with a 19 (treating them as 1903-1999), but to prefix years 00-02 with a 20 (i.e. 2000-2002). As you can see, this seems to have mended our problem with the birthdays using 2 digit years. Eric and Jane are now properly understood to have a birthday in the years 2000 and 2001 respectively. However, Example 5 below shows a major weakness in this strategy, when a 2 digit year could mean 19xx or 20xx.
OPTIONS YEARCUTOFF=1903;
DATA friends;
INPUT @1 name $ 4.
@6 bday mmddyy8.;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00
Jane 07/04/01
Will 10/31/03
;
RUN;
PROC PRINT DATA=friends;
VAR name bday;
FORMAT bday date9.;
RUN;
Output From Example 5.
OBS NAME BDAY
1 Noel 25DEC1903
2 Hank 29FEB1956
3 Mary 31DEC1999
4 Eric 01JAN2000
5 Jane 04JUL2001
6 Will 31OCT1903
Example 5 demonstrates the major weakness of using the YEARCUTOFF= option to try to solve problems with 2 digit years. It is now Winter 2003 and you have a new friend, Will, born on October 31, 2003 (Halloween 2003). As you can see, the program interprets 10/31/03 as 10/31/1903 because of the YEARCUTOFF=1903; option. If you slide the cutoff any higher (e.g., OPTIONS YEARCUTOFF=1904;) then Noel will be treated as being born in 2003 instead of 1903. As you can see, the YEARCUTOFF= option only solves this problem in a limited way.
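The trade-off described above can be checked directly. This is a small sketch, not part of the original examples: it reads the two conflicting 2 digit birthdays with the INPUT function under YEARCUTOFF=1904 and writes the results to the SAS log. The variable names noel and will are chosen only for illustration.

```sas
OPTIONS YEARCUTOFF=1904;   /* 2 digit years now map to 1904-2003 */

DATA _NULL_;
  noel = INPUT('12/25/03', MMDDYY8.);   /* really born in 1903 */
  will = INPUT('10/31/03', MMDDYY8.);   /* really born in 2003 */
  PUT noel= DATE9. will= DATE9.;        /* noel=25DEC2003 will=31OCT2003 */
RUN;
```

Because both values carry the same 2 digit year, any single cutoff places them in the same century. Here Will comes out right and Noel comes out wrong, the reverse of Example 5.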
The YEARCUTOFF= option is only useful if you can clearly specify a cutoff year which divides years which should be treated as 19xx from years which should be treated as 20xx. However, when this line becomes blurred, this solution fails. You can permanently solve your problem by revising your data file to use 4 digit years (e.g., as shown in Example 1), but this could be very costly and time consuming, requiring you to entirely restructure your data files and shift column locations for all other variables. Example 6 shows a compromise solution by using a new variable to indicate the century portion of the date.
OPTIONS YEARCUTOFF=1900;
DATA friends;
INPUT @1 name $ 4.
@6 bday mmddyy8.
@15 bday_yy 2.;
IF (bday_yy EQ 20) THEN bday = MDY( MONTH(bday), DAY(bday), YEAR(bday) + 100);
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00 20
Jane 07/04/01 20
Will 10/31/03 20
;
RUN;
PROC PRINT DATA=friends;
VAR name bday;
FORMAT bday date9.;
RUN;
Output From Example 6.
OBS NAME BDAY
1 Noel 25DEC1903
2 Hank 29FEB1956
3 Mary 31DEC1999
4 Eric 01JAN2000
5 Jane 04JUL2001
6 Will 31OCT2003
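Example 6 solves the problem of the dates with 2 digit years by creating a separate variable indicating the century portion of the date. As you can see in the output, everyone is assigned the proper birthdate because the data EXPLICITLY indicates which birthdates should have a 20 prefixed to the year (using the bday_yy variable). Two points about this program are worth noting. First, the OPTIONS YEARCUTOFF=1900; statement makes SAS read every 2 digit year as 19xx. Second, the IF statement then adds 100 years to any birthday whose bday_yy value is 20, and leaves all other birthdays in the 1900s.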
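One limitation of the approach in Example 6 is that the date is first read as a 19xx date and only then shifted. A birthday of 02/29/00 with a century of 20 would be rejected at the first step, because 1900 was not a leap year. The following is a hedged sketch of one alternative, not the method used on this page: it reads the month, day, and 2 digit year as separate numbers and builds the date with MDY, so the result does not depend on YEARCUTOFF= at all. The variable names mm, dd, yy and century are hypothetical, and the sketch assumes that a blank century field means 19.

```sas
DATA friends;
  INPUT @1  name    $4.
        @6  mm      2.     /* month, columns 6-7          */
        @9  dd      2.     /* day, columns 9-10           */
        @12 yy      2.     /* 2 digit year, columns 12-13 */
        @15 century 2.;    /* optional century, 15-16     */
  IF century = . THEN century = 19;        /* assumption: blank means 19xx */
  bday = MDY(mm, dd, century*100 + yy);    /* build the full 4 digit year  */
  DROP mm dd yy century;
CARDS;
Noel 12/25/03
Hank 02/29/56
Mary 12/31/99
Eric 01/01/00 20
Jane 07/04/01 20
Will 10/31/03 20
;
RUN;

PROC PRINT DATA=friends;
  VAR name bday;
  FORMAT bday date9.;
RUN;
```

With the six friends above, this should give the same birthdates as Example 6. The difference only shows up for dates such as February 29, 2000, which exist in one century but not the other.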
These examples illustrate some of the problems which will arise when using SAS to process dates for the Year 2000 and beyond. For more information, please see the links on our Statistical Computing and the Year 2000 page. For assistance solving Year 2000 problems in Statistical Computing, feel free to use the Statistical Consulting Services provided by the UCLA Academic Technology Services.
1. Please note that in version 6.xx of SAS, the default value of the YEARCUTOFF= option is 1900, meaning that all 2 digit years are prefixed with a century of 19 (e.g. 1/1/84 is converted into the date Jan 1, 1984). This default value for the YEARCUTOFF= option will change to YEARCUTOFF=1920 in SAS Version 7. This means that a date with the year portion between 00 and 19 will be treated as having 20 for the century portion (e.g. 1/1/15 will be converted into the date Jan 1, 2015). Any programs which rely on 2 digit years from 00-99 being treated as 1900-1999 will encounter problems with SAS Version 7. To avoid this problem, put OPTIONS YEARCUTOFF=1900; at the top of your program. This explicitly states that you want YEARCUTOFF=1900, so your program should work the same in Version 6 and in Version 7.
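The effect of the two default values described in this footnote can be sketched as follows. This is an illustration only: it sets each value explicitly rather than relying on the installed default, and the variable name d is chosen for the example. PROC OPTIONS is used to show the value currently in effect.

```sas
PROC OPTIONS OPTION=YEARCUTOFF;   /* writes the current setting to the log */
RUN;

OPTIONS YEARCUTOFF=1900;          /* 2 digit years map to 1900-1999 */
DATA _NULL_;
  d = INPUT('01/01/15', MMDDYY8.);
  PUT 'YEARCUTOFF=1900: ' d DATE9.;   /* 01JAN1915 */
RUN;

OPTIONS YEARCUTOFF=1920;          /* 2 digit years map to 1920-2019 */
DATA _NULL_;
  d = INPUT('01/01/15', MMDDYY8.);
  PUT 'YEARCUTOFF=1920: ' d DATE9.;   /* 01JAN2015 */
RUN;
```

In both cases the option defines a 100 year window that starts at the cutoff year, and every 2 digit year is placed in the one century that falls inside that window.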
These pages are Copyrighted (c) by UCLA Academic Technology Services