### Stata Learning Module: Using dates in Stata

This module will show how to use date variables, date functions, and date display formats in Stata.

#### Converting dates from raw data using the "date()" function

The trick to reading dates into Stata is to treat them first as character strings and then convert them into a Stata date variable. Suppose your raw data file contains the following dates.

type dates1.raw
John  1 Jan 1960
Mary 11 Jul 1955
Kate 12 Nov 1962
Mark  8 Jun 1959 

You can read these data by typing:

infix str name 1-4 str bday 6-17 using dates1.raw
 (4 observations read)

Using the list command, you can see that the date information has been read correctly into bday.

list
          name          bday
1.      John    1 Jan 1960
2.      Mary   11 Jul 1955
3.      Kate   12 Nov 1962
4.      Mark    8 Jun 1959   

Because bday is a string variable, you cannot do any date computations with it until you make a date variable from it. You can generate a date version of bday using the date() function. The example below creates a date variable called birthday from the string variable bday. The syntax differs slightly by Stata version, and the difference is in how the pattern is specified. In Stata 9 the pattern is lower case (e.g., "dmy"). In Stata 10 it is upper case for day, month, and year (e.g., "DMY") but lower case for hours, minutes, or seconds (e.g., "DMYhms"). Our data are in the order day, month, year, so we use "DMY" (or "dmy" in Stata 9) within the date() function. (Unless otherwise noted, all other Stata commands on this page are the same for versions 9 and 10.)

In Stata version 9:

generate birthday=date(bday,"dmy")

In Stata version 10:

generate birthday=date(bday,"DMY")

Let's have a look at both bday and birthday.

list
          name          bday   birthday
1.      John    1 Jan 1960          0
2.      Mary   11 Jul 1955      -1635
3.      Kate   12 Nov 1962       1046
4.      Mark    8 Jun 1959       -207   

The values for birthday may seem confusing. The value of birthday for John is 0 and the value of birthday for Mark is -207. Dates are stored as the number of days from Jan 1, 1960, which is convenient for the computer when storing dates and performing date computations, but is difficult for you and me to read.
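
The elapsed-day values can be checked directly at the command line. The following is a minimal sketch, assuming Stata 10 or later (for the upper-case "DMY" pattern). It uses the td() function, which is not otherwise covered in this module, to type a date literal.

```stata
* Elapsed-day values are counted from 01jan1960, which is day 0
display date("1 Jan 1960", "DMY")
display date("8 Jun 1959", "DMY")

* td() turns a typed date literal into the same elapsed-day value
display td(08jun1959)

* Because dates are plain numbers, subtraction gives the days between two dates
display td(12nov1962) - td(11jul1955)
```

The first three commands should reproduce the values 0, -207, and -207 from the listing above, and the last gives the number of days between Mary's and Kate's birthdays.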

We can tell Stata that birthday should be displayed using the %d format to make it easier for humans to read.

format birthday %d
list
          name          bday   birthday
1.      John    1 Jan 1960  01jan1960
2.      Mary   11 Jul 1955  11jul1955
3.      Kate   12 Nov 1962  12nov1962
4.      Mark    8 Jun 1959  08jun1959   
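
Display formats change only how a date is shown, not the stored number. The sketch below assumes Stata 10 or later, where %td is the documented name of the daily-date format (with %d kept as a synonym), and it re-enters the four birthdays with input so that it runs on its own. The custom format at the end is just one illustration of the detailed %td codes.

```stata
* Rebuild the four birthdays so the example stands alone
clear
input str4 name str12 bday
"John" "1 Jan 1960"
"Mary" "11 Jul 1955"
"Kate" "12 Nov 1962"
"Mark" "8 Jun 1959"
end
generate birthday = date(bday, "DMY")

* Default daily-date display, e.g. 08jun1959
format birthday %td
list

* Custom display: two-digit month, two-digit day, four-digit year
format birthday %tdNN/DD/CCYY
list

* The stored values are unchanged by either format
summarize birthday
```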

The date() function is very flexible and can handle dates written in almost any manner. For example, consider the file dates2.raw.

type dates2.raw
John Jan 1 1960
Mary 07/11/1955
Kate 11.12.1962
Mark Jun/8 1959 

These dates are messy, but they are consistent. Even though the formats look different, each is a month, day, and year separated by a delimiter (e.g., space, slash, dot, or dash). We can try the syntax from above to read in our new dates. Note that, as discussed above, in Stata version 10 the order of the date is declared in upper case (i.e., "MDY"), while in version 9 it is declared in lower case (i.e., "mdy").

clear
infix str name 1-4 str bday 6-17 using dates2.raw

generate birthday=date(bday,"MDY")

format birthday %d
list
          name          bday   birthday
1.      John    Jan 1 1960  01jan1960
2.      Mary    07/11/1955  11jul1955
3.      Kate    11.12.1962  12nov1962
4.      Mark    Jun/8 1959  08jun1959   

Stata was able to read those dates without a problem. Let's try an even tougher set of dates. For example, consider the dates in dates3.raw.

type dates3.raw
4-12-1990
4.12.1990
Apr 12, 1990
Apr12,1990
April 12, 1990
4/12.1990
Apr121990 

Let's try reading these dates and see how Stata handles them. Again, remember that for Stata version 10 dates are declared "MDY" while for version 9 they are declared "mdy".

clear
infix str bday 1-20 using dates3.raw
 (7 observations read)
generate birthday=date(bday,"MDY")
 (1 missing value generated)
format birthday %d
list
                     bday   birthday
1.            4-12-1990  12apr1990
2.            4.12.1990  12apr1990
3.         Apr 12, 1990  12apr1990
4.           Apr12,1990  12apr1990
5.       April 12, 1990  12apr1990
6.            4/12.1990  12apr1990
7.            Apr121990          .   

As you can see, Stata was able to handle almost all of those irregular date formats. It handled Apr12,1990 even though there was no delimiter between the month and day, because the month was alphabetic and the day was numeric. The only date that did not work was Apr121990, because there was no delimiter between the day and year. The date() function can handle just about any date as long as delimiters separate the month, day, and year. In certain cases Stata can also read all-numeric dates entered without delimiters; see help dates for more information.
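
With a larger file it is easy to overlook the "missing value generated" note, so it can help to list the strings that failed to convert. The following is a minimal sketch, assuming Stata 10 or later and using three of the dates from dates3.raw entered with input so that it runs on its own.

```stata
clear
input str20 bday
"4-12-1990"
"Apr 12, 1990"
"Apr121990"
end
generate birthday = date(bday, "MDY")
format birthday %td

* Show the strings that were present but could not be converted
list bday if missing(birthday) & !missing(bday)

* Count them as well
count if missing(birthday) & !missing(bday)
```

Any string listed here needs to be cleaned up, or converted with a different pattern, before date computations can be trusted.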

#### Converting dates from raw data using the mdy() function

In some cases, you may have the month, day, and year stored as numeric variables in a dataset. For example, you may have the following data for birth dates from dates4.raw.

type dates4.raw
 7 11 1948
 1  1 1960
10 15 1970
12 10 1971 

You can read in this data using the following syntax to create a separate variable for month, day and year.

clear
infix month 1-2 day 4-5 year 7-10 using dates4.raw
 (4 observations read)
list
         month        day       year
1.         7         11       1948
2.         1          1       1960
3.        10         15       1970
4.        12         10       1971   

A Stata date variable can be created using the mdy() function as shown below.

generate birthday=mdy(month,day,year)

Let's format birthday using the %d format so it displays better.

format birthday %d
list
         month        day       year   birthday
1.         7         11       1948  11jul1948
2.         1          1       1960  01jan1960
3.        10         15       1970  15oct1970
4.        12         10       1971  10dec1971   

Consider the data in dates5.raw, which is the same as dates4.raw except that only two digits are used to signify the year.

type dates5.raw
 7 11 48
 1  1 60
10 15 70
12 10 71 

clear
infix month 1-2 day 4-5 year 7-10 using dates5.raw
 (4 observations read)
generate birthday=mdy(month,day,year)
 (4 missing values generated)
format birthday %d
list
         month        day       year   birthday
1.         7         11         48          .
2.         1          1         60          .
3.        10         15         70          .
4.        12         10         71          .   

As you can see, the values for birthday are all missing. This is because Stata takes the years literally as 48, 60, 70 and 71 (it does not assume they are 1948, 1960, 1970 and 1971), and mdy() returns a missing value for such years. You can force Stata to assume the century portion is 1900 by adding 1900 to the year as shown below (note that we use replace instead of generate since the variable birthday already exists).

replace birthday=mdy(month,day,year+1900)
 (4 real changes made)
format birthday %d
list
         month        day       year   birthday
1.         7         11         48  11jul1948
2.         1          1         60  01jan1960
3.        10         15         70  15oct1970
4.        12         10         71  10dec1971   
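
If a file mixes two-digit and four-digit years, adding 1900 to every row would push the four-digit years out of range. The sketch below shows one way to guard against that. The variable name year4 and the rule "any year below 100 belongs to the 1900s" are assumptions made for illustration, not part of the original data.

```stata
clear
input month day year
 7 11 48
 1  1 1960
10 15 70
end

* Hypothetical rule: treat any year below 100 as 19xx, leave other years alone
generate year4 = cond(year < 100, year + 1900, year)

generate birthday = mdy(month, day, year4)
format birthday %td
list
```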

#### Computations with elapsed dates

Date variables make computations involving dates very convenient. For example, to calculate everyone's age on January 1, 2000, use the following computation.

generate age2000=( mdy(1,1,2000) - birthday ) / 365.25
list
         month        day       year   birthday    age2000
1.         7         11         48  11jul1948   51.47433
2.         1          1         60  01jan1960         40
3.        10         15         70  15oct1970   29.21287
4.        12         10         71  10dec1971   28.06023   

Please note that this formula for age does not work well over very short time spans. For example, a child's computed age on their first birthday will usually be slightly less than one because of the division by 365.25. There are formulas that are more exact but also much more complex. Here is an example courtesy of Dan Blanchette.

generate altage = floor(((ym(2000, 1) - ym(year(birthday), month(birthday))) - (1 < day(birthday))) / 12)
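
To see the difference between the two age formulas, consider a hypothetical child born on January 1, 1999, who turns one on the reference date. This is a minimal sketch with a single made-up observation. Note that the second formula is written specifically for a reference date of January 1, 2000.

```stata
clear
set obs 1

* Hypothetical child born exactly one year before the reference date
generate birthday = mdy(1, 1, 1999)
format birthday %td

* Approximate age: elapsed days divided by the average year length
generate age2000 = (mdy(1, 1, 2000) - birthday) / 365.25

* Whole-year age: completed months since the birth month, less one if the
* birthday falls after the 1st of the month, converted to completed years
generate altage = floor(((ym(2000, 1) - ym(year(birthday), month(birthday))) - (1 < day(birthday))) / 12)

list birthday age2000 altage
```

Here age2000 comes out just under one (365 days divided by 365.25), while altage is exactly 1.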

#### Other date functions

Given a date variable, one can have the month, day and year returned separately if desired, using the month(), day() and year() functions, respectively.

generate m=month(birthday)
generate d=day(birthday)
generate y=year(birthday)
list m d y birthday
             m          d          y   birthday
1.         7         11       1948  11jul1948
2.         1          1       1960  01jan1960
3.        10         15       1970  15oct1970
4.        12         10       1971  10dec1971   
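
A quick way to confirm that the three functions are inverses of mdy() is to recombine the parts and compare with the original date. The following is a small sketch using one made-up observation.

```stata
clear
set obs 1
generate birthday = mdy(10, 15, 1970)
format birthday %td

* Split the date into its parts
generate m = month(birthday)
generate d = day(birthday)
generate y = year(birthday)

* Recombining the parts should give back the original date in every row;
* assert stops with an error if any row disagrees
assert mdy(m, d, y) == birthday
list
```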

If you'd like to return the day of the week for a date variable, use the dow() function (where 0=Sunday, 1=Monday etc.).

gen week_d=dow(birthday)
list birthday week_d
      birthday     week_d
1. 11jul1948          0
2. 01jan1960          5
3. 15oct1970          4
4. 10dec1971          5   
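
The numeric codes from dow() are easier to read with value labels attached. The sketch below is one way to do this. The label name dowlbl is a hypothetical choice, and the single observation is made up for illustration.

```stata
clear
set obs 1
generate birthday = td(01jan1960)
format birthday %td
generate week_d = dow(birthday)

* Attach day names to the numeric codes (0 = Sunday, ..., 6 = Saturday)
label define dowlbl 0 "Sunday" 1 "Monday" 2 "Tuesday" 3 "Wednesday" 4 "Thursday" 5 "Friday" 6 "Saturday"
label values week_d dowlbl

list birthday week_d
```

The stored value of week_d is still the number from dow(), so comparisons such as week_d == 5 continue to work after the labels are attached.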

#### Summary

The date() function converts strings containing dates to date variables. The syntax varies slightly by version.

In Stata version 9:

gen date2 = date(date, "dmy")

In Stata version 10:

gen date2 = date(date, "DMY")

The mdy() function takes three numeric arguments (month, day, year) and converts them to a date variable.

generate birthday=mdy(month,day,year)

You can display elapsed times as actual dates with display formats such as the %d format.

format birthday %d

Other date functions include the month(), day(), year(), and dow() functions. For online help with dates, type help dates at the command line. For more detailed explanations of how Stata handles dates and date functions, please refer to the Stata User's Guide.
