Date variables can pose a challenge in data management. This is true in any package, and different packages handle date values differently. This page provides an overview of dates in R: how to format them, how they are stored, and what functions are available for analyzing them.
For a date variable stored as a vector of strings, see R FAQ: How can I format a string containing a date in R "Date" object?.
For a date variable stored as a vector of numbers, there is a little bit of detective work to be done. Look at a few of the numbers and see if there is a clear pattern. If the numeric values are actually year, month, and day values concatenated without separation, like 20011010 for October 10, 2001, then these values should be converted to character strings (using as.character) and then formatted using the tips in the link above.
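As a minimal sketch of the concatenated case, the code below converts numbers to strings and then parses them with an explicit format. The vector `ndates` is made up for illustration, and the `"%Y%m%d"` format assumes the digits run year, month, day as in the 20011010 example; a different ordering would need a different format string.

```r
# Hypothetical numeric codes with year, month, day run together (YYYYMMDD)
ndates <- c(20011010, 19991231, 20100408)

# Convert to character first, then tell as.Date how the digits are laid out
cdates <- as.character(ndates)
parsed <- as.Date(cdates, format = "%Y%m%d")
parsed
# [1] "2001-10-10" "1999-12-31" "2010-04-08"
```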
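If the numeric values count the days that have passed since some starting date, then the as.Date function can be used with an origin date indicated. Excel dates, when converted to integers, count from the start of 1900, and we can indicate this as the origin date in as.Date. Note that Excel for Windows treats January 1, 1900 as day 1 and also counts a nonexistent February 29, 1900, so an origin of "1900-01-01" lands two days after the date Excel itself displays; the R help page for as.Date suggests origin = "1899-12-30" for such values. The example below keeps "1900-01-01" for simplicity.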
edates <- c(22053, 33982, 40274)
as.Date(edates, origin = "1900-01-01")
[1] "1960-05-19" "1993-01-15" "2010-04-08"
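One way to check which origin suits a spreadsheet export is to convert the same serial numbers with both candidate origins and compare the results against a date you already know from the spreadsheet. The sketch below assumes the numbers are Windows Excel serial dates and uses the `"1899-12-30"` origin suggested in the `as.Date` help page; the variable names are illustrative only.

```r
edates <- c(22053, 33982, 40274)

# Origin used in the example on this page
d1900 <- as.Date(edates, origin = "1900-01-01")

# Origin suggested by ?as.Date for Windows Excel serial dates
dexcel <- as.Date(edates, origin = "1899-12-30")
dexcel
# [1] "1960-05-17" "1993-01-13" "2010-04-06"

# The two conversions differ by a constant two days
as.numeric(d1900 - dexcel)
# [1] 2 2 2
```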
Other packages store dates using different origins. SAS, for example, uses January 1, 1960 rather than 1900. When R looks at dates as numbers, its origin is January 1, 1970.
as.numeric(as.Date(edates, origin = "1900-01-01"))
[1] -3514  8415 14707
startdate <- "1970-01-01"
as.numeric(as.Date(startdate))
[1] 0
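The same approach carries over to day counts from other packages. As a small sketch, the values in `sasdates` below are made-up SAS-style day counts (days since January 1, 1960), converted with the matching origin and then viewed on R's own 1970-based scale.

```r
# Hypothetical SAS date values: days since 1960-01-01
sasdates <- c(0, 18263)

d <- as.Date(sasdates, origin = "1960-01-01")
d
# [1] "1960-01-01" "2010-01-01"

# The same dates as R counts them, in days since 1970-01-01
as.numeric(d)
# [1] -3653 14610
```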
Date objects are stored in R as the number of days since January 1, 1970, allowing dates to be compared and manipulated as you would a numeric vector. Logical comparisons are simple. When referring to dates, earlier dates are "less than" later dates. Returning to our example above, we can compare the three dates in edates to January 1, 1970. For dates prior to this, the comparison should return TRUE. For later dates, the comparison should return FALSE.
as.Date(edates, origin = "1900-01-01") < "1970-01-01"
[1]  TRUE FALSE FALSE
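Because comparisons return an ordinary logical vector, they can be used to subset or count dates. The sketch below reuses the example values; the name `cutoff` is illustrative, and the cutoff is converted to a Date explicitly, though a plain string also works in the comparison above.

```r
d <- as.Date(c(22053, 33982, 40274), origin = "1900-01-01")
cutoff <- as.Date("1970-01-01")

# Keep only the dates on or after the cutoff
d[d >= cutoff]
# [1] "1993-01-15" "2010-04-08"

# Count the dates before the cutoff
sum(d < cutoff)
# [1] 1

# Earliest and latest dates
range(d)
# [1] "1960-05-19" "2010-04-08"
```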
Adding a week to dates can be done by simply adding 7. The date format is maintained.
weeklater <- as.Date(edates, origin = "1900-01-01") + 7
weeklater
[1] "1960-05-26" "1993-01-22" "2010-04-15"
class(weeklater)
[1] "Date"
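Date arithmetic also works in the other direction: subtracting one date from another gives the number of days between them. The sketch below, again on the example values, shows subtraction, `difftime` with explicit units, and `seq` for generating evenly spaced dates.

```r
d <- as.Date(c(22053, 33982, 40274), origin = "1900-01-01")

# Days between the first and last dates (a "difftime" object)
gap <- d[3] - d[1]
as.numeric(gap)
# [1] 18221

# The same gap expressed in weeks
gap_weeks <- as.numeric(difftime(d[3], d[1], units = "weeks"))
gap_weeks
# [1] 2603

# Three weekly dates starting from the first date
wk <- seq(d[1], by = "week", length.out = 3)
wk
# [1] "1960-05-19" "1960-05-26" "1960-06-02"
```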
There are several functions in R specific to Date objects or for creating Date objects. The Sys.Date() function generates the value of the current date. It is easy to extract the day of the week and the month.
weekdays(weeklater)
[1] "Thursday" "Friday"   "Thursday"
months(weeklater)
[1] "May"     "January" "April"
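Other components can be pulled out with `format`, which accepts the same conversion codes used when parsing dates. This is a small sketch rather than a full list of options. The names returned by `weekdays` and `months` depend on the session's locale, whereas the numeric codes used here do not.

```r
d <- as.Date(c("1960-05-26", "1993-01-22", "2010-04-15"))

# Four-digit year as character strings
yrs <- format(d, "%Y")
yrs
# [1] "1960" "1993" "2010"

# Month as a number
mons <- as.integer(format(d, "%m"))
mons
# [1] 5 1 4

# Sys.Date() returns today's date as a Date object
class(Sys.Date())
# [1] "Date"
```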
If you are interested in the distribution of a date variable, there are plotting functions available. Below, we randomly sample 100 dates in a year and then plot a histogram with one bar per month.
rdates <- as.Date("2010/1/1") + floor(365*runif(100))
hist(rdates, "months", format = "%d %b")
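For a numeric companion to the histogram, the dates can be tabulated by month. The sketch below repeats the sampling step with an arbitrary seed (an addition here, only so the sample is reproducible) and counts dates per month using `format` and `table`.

```r
set.seed(1)  # arbitrary seed, assumed only for reproducibility

# 100 random dates in 2010, as in the example above
rdates <- as.Date("2010/1/1") + floor(365 * runif(100))

# Count of sampled dates in each month, labeled by two-digit month number
counts <- table(format(rdates, "%m"))
counts

# The counts add up to the number of sampled dates
sum(counts)
# [1] 100
```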
This has been a very quick overview. R also has date-time objects and functions specific to date-time data, and far more functions than were shown here. The documentation pages for Dates and DateTimeClasses, both in base R, provide more details.
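As a brief taste of the date-time classes, the sketch below creates a `POSIXct` value, which counts seconds rather than days. The timestamp and the UTC time zone are arbitrary choices for illustration.

```r
# A date-time in UTC (hypothetical timestamp)
dt <- as.POSIXct("2010-04-15 13:30:00", tz = "UTC")
class(dt)
# [1] "POSIXct" "POSIXt"

# Arithmetic is in seconds: this adds one hour
later <- dt + 3600
format(later, "%H:%M")
# [1] "14:30"

# Drop the time of day to get back to a Date
as.Date(dt)
# [1] "2010-04-15"
```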
These pages are Copyrighted (c) by UCLA Academic Technology Services