Stata FAQ
How can I turn a string variable containing dates into a date variable Stata can recognize?

Converting dates entered as strings into numeric dates that can be used by Stata is relatively simple. The commands vary somewhat based on the format in which the data were entered and how consistently that format was applied. In most cases, Stata can handle dates in which the days were entered as one or two digit numbers, and in which the month was entered as either abbreviation (e.g. Jan, Sept, Sep), or typed out in whole (e.g. January, September). The one place you can run into trouble is with the year. In general, Stata expects to find four-digit values for year, if your data were not entered this way (or not consistently entered this way), Stata can handle this, but some additions are needed to the syntax. The following set of examples covers a number of common scenarios. In general the examples move from the simplest case (where values for year are all four-digits), to more complex cases (where the values for year were not consistently entered as four-digit values).

Case 1: Dates with four-digit values for year.


input str10 date
20jan2007 
06sept2001
21june2004
04july2006
6aug2005
end

list

     +------------+
     |       date |
     |------------|
  1. |  20jan2007 |
  2. | 06sept2001 |
  3. | 21june2004 |
  4. | 04july2006 |
  5. |   6aug2005 |
     +------------+

Two commands are used to convert a string date into a numeric date. The code is slightly different depending on which version of Stata you are using. The difference is in how the pattern is specified. In Stata 9 it should be lower case (e.g. "dmy") and in Stata 10, it should be upper case for day, month, and year (e.g. "DMY") but lower case if you want to specify hours, minutes or seconds (e.g. "DMYhms"). In the syntax below, the first command actually generates the new date variable and converts the string into the proper numeric code for the date in question (i.e. the number of days elapsed between January 1st, 1960 and the date in question). Our data are in the order day, month, year, so we use "dmy" (or "DMY") within the date(...) command (see below for information about different formats). The second command formats the numeric value so that when Stata displays the date, it is in a form that is easy for humans to read. Note that while using list, and some other data display commands, produces values that look like string dates, the actual values stored by Stata are numeric.

In Stata version 9 the commands are:

gen date2 = date(date, "dmy")
format date2 %td

In Stata version 10 the commands are:

gen date2 = date(date, "DMY")
format date2 %td

In either version, if you list the data, you should see the following:

     +------------------------+
     |       date       date2 |
     |------------------------|
  1. |  20jan2007   20jan2007 |
  2. | 06sept2001   06sep2001 |
  3. | 21june2004   21jun2004 |
  4. | 04july2006   04jul2006 |
  5. |   6aug2005   06aug2005 |
     +------------------------+

Above, our date string was formatted day, month, year, with no character separating the different components of our date, however, Stata can handle a wide variety of formats and separating characters. For example, our data could have been listed "20-jan-2007", "20/jan/2007", or "20.jan.2007" and the commands above would work without any changes. Stata will even handle dates the case in which some entries have separators and others do not.  We can also change the order in which the components of the date are listed, all we will need to change is the order of the letters used to tell Stata which component of the date is where (e.g. year, month, day can be specified using "ymd" or "YMD"). Dates can also be all numeric, for example, 20-Jan-2007 can also be represented by 20-1-2007 and in Stata 10 even 20012007. Note that if all numeric dates are entered without separators, you must make sure that for the day and month single digit values (e.g. the 1 for January above) have leading zeros.

Case 2: Dates with two-digit and four-digit values for year, with all two-digit dates in the same century.

The above example assumed that the values for year were listed in a four digit format (e.g. 2004), however, this is not always the case. The data below contains dates listed with only two digits, assuming that I know these dates came either from the 20th (e.g. 1907) or 21st century (e.g. 2007), I can tell Stata to treat the years as though they are four digit values. To do this I add either "19" or "20" in front of the Y in the part of the command that gives Stata the order in which the values appear. For example, for dates listed day, month, year, the command is "dm20y" or "DM20Y" (for Stata versions 9 and 10 respectively). Notice that the last two dates both contain four-digit years (2005 and 1999), this is fine, the only restriction is that the entries with two-digit values for year all come from the same century.

input str14 date
20jan07 
06sept01
21june04
04july06
6august05
6august2005
20june1999
end

list

     +-------------+
     |        date |
     |-------------|
  1. |     20jan07 |
  2. |    06sept01 |
  3. |    21june04 |
  4. |    04july06 |
  5. |   6august05 |
     |-------------|
  6. | 6august2005 |
  7. |  20june1999 |
     +-------------+

For Stata version 9 the command to create a new numeric date variable from a string variable and format the new variable so it displays as a date are:

gen date2 = date(date, "dm20y")
format date2 %td

For Stata version 10 the command to create a new numeric date variable from a string variable and format the new variable so it displays as a date are:

gen date2 = date(date, "DM20Y")
format date2 %td

Note that for dates in the 20th century (e.g. 1997) the command would include "dm19y" or "DM19Y" (for Stata versions 9 and 10 respectively).

In either version of Stata, if you list the data, you should see the following:

list date date2

     +-------------------------+
     |        date       date2 |
     |-------------------------|
  1. |     20jan07   20jan2007 |
  2. |    06sept01   06sep2001 |
  3. |    21june04   21jun2004 |
  4. |    04july06   04jul2006 |
  5. |   6august05   06aug2005 |
     |-------------------------|
  6. | 6august2005   06aug2005 |
  7. |  20june1999   20jun1999 |
     +-------------------------+

As with the first example, the dates can be formatted in a variety of ways, order of the components does not mater, but the 19 or 20 always precedes the Y.

Case 3: Dates with two-digit and four-digit values for year, all two-digit dates are not in the same century or not from the 20th or 21st century.

In case 2 we assumed that all the dates with two-digit values for year were from the same century, and that all of the dates with two-digit values for year fall in either the 20th or 21st centuries. In this example, we will tell Stata the most recent year in which the data could appear, and Stata will assign the centuries so that all dates occur before that year. An example of when this be useful is if I know the values below are birth dates for currently living individuals. Since they are currently living, I know that none of them was born after the current year (2007), and it is unlikely that the value "07" in the year is 1907 rather than 2007 (note that this sort of information is highly dependent on the data, so you will want to check any assumptions you make against what you actually know about your data). This would also come in handy if your 2-digit year values all come from an earlier century. For example, if all your dates are from the 1700s, giving Stata 1799 as the last possible year will result in all two digit dates begin assigned to the eighteenth century (i.e. 84 would be interpreted as 1784).

input str18 date
20jan2007 
06sept1985
21june04
4july90
9jan1999
6aug99
19august2003
end

list

     +--------------+
     |         date |
     |--------------|
  1. |    20jan2007 |
  2. |   06sept1985 |
  3. |     21june04 |
  4. |      4july90 |
  5. |     9jan1999 |
     |--------------|
  6. |       6aug99 |
  7. | 19august2003 |
     +--------------+

For Stata version 9 the command to create a new numeric date variable from a string variable and format the new variable so it displays as a date are:

gen date2 = date(date, "dmy", 2007)
format date2 %td

For Stata version 10 the command to create a new numeric date variable from a string variable and format the new variable so it displays as a date are:

gen date2 = date(date, "DMY", 2007)
format date2 %td

In either version of Stata, if you list the data you should see the following:

list date date2

     +-------------------------+
     |        date       date2 |
     |-------------------------|
  1. |     20jan07   20jan2007 |
  2. |    06sept01   06sep2001 |
  3. |    21june04   21jun2004 |
  4. |    04july06   04jul2006 |
  5. |   6august05   06aug2005 |
     |-------------------------|
  6. | 6august2005   06aug2005 |
  7. |  20june1999   20jun1999 |
     +-------------------------+

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.