lubridate | stringr |
---|---|
1.7.9.2 | 1.4.0 |
Date values can be represented in tables as numbers or characters. But to be properly interpreted by R as dates, date values should be converted to an R Date object class or a POSIXct/POSIXt object class. R provides many facilities to convert and manipulate dates and times, but a package called lubridate makes working with dates/times much easier.
You can convert many representations of date and time to date objects. For example, let’s create a vector of dates represented as month/day/year character strings,
x <- c("06/23/2013", "06/30/2013", "07/12/2014")
class(x)
[1] "character"
At this point, R treats the vector x as characters. To force R to interpret these as dates, use lubridate’s mdy function. mdy will convert date strings where the date elements are ordered as month, day and year.
library(lubridate)
x.date <- mdy(x)
class(x.date)
[1] "Date"
If you need to specify the time zone, add the tz parameter. For example, to specify Eastern Standard Time, type:
x.date <- mdy(x, tz="EST")
x.date
[1] "2013-06-23 EST" "2013-06-30 EST" "2014-07-12 EST"
Note that using the mode or typeof functions will not help us determine if the object is an R date object. This is because a date is stored as a numeric (double) internally. Use the class function instead, as shown in the above code chunk.
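The difference between the storage type and the class can be seen in a short sketch. The object name d is made up for this illustration; is.Date() is lubridate's own test for the Date class.

```r
library(lubridate)

d <- mdy("06/23/2013")

typeof(d)    # storage type: "double"
mode(d)      # storage mode: "numeric"
class(d)     # object class: "Date"
unclass(d)   # the underlying number (days since 1970-01-01)
is.Date(d)   # TRUE
```

Only class (or is.Date) tells you that R will treat the value as a date.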
The mdy function can read in date formats that use different delimiters so that mdy("06/23/2013"), mdy("06-23-2013") and mdy("06.23.2013") are parsed exactly the same so long as the order remains month/day/year.
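A quick way to check this for yourself is to parse the three strings and compare the results. The object names slash, dash and dot are made up for this sketch.

```r
library(lubridate)

slash <- mdy("06/23/2013")
dash  <- mdy("06-23-2013")
dot   <- mdy("06.23.2013")

# All three strings resolve to the same Date value
(slash == dash) & (dash == dot)
```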
For different month/day/year arrangements, other lubridate functions need to be used:
Functions | Date Format |
---|---|
dmy() | day/month/year |
ymd() | year/month/day |
ydm() | year/day/month |
If your data contains both date and time in a “month/day/year hour:minutes:seconds” format, use the mdy_hms function.
x <- c("06/23/2013 03:45:23", "07/30/2013 14:23:00", "08/12/2014 18:01:59")
x.date <- mdy_hms(x, tz="EST")
x.date
[1] "2013-06-23 03:45:23 EST" "2013-07-30 14:23:00 EST" "2014-08-12 18:01:59 EST"
The characters _h, _hm or _hms can be appended to any of the four date function names described earlier to accommodate time formats. A few examples follow:
mdy_h("6/23/2013 3", tz="EST")
[1] "2013-06-23 03:00:00 EST"
dmy_hm("23/6/2013 3:15", tz="EST")
[1] "2013-06-23 03:15:00 EST"
ymd_hms("2013/06/23 3:15:7", tz="EST")
[1] "2013-06-23 03:15:07 EST"
Note that adding a time element to the date object will create POSIXct and POSIXt object classes instead of Date object classes.
class(x.date)
[1] "POSIXct" "POSIXt"
Also, if a time zone is not explicitly defined for a time-based date, the function assigns UTC (Coordinated Universal Time).
dmy_hm("23/6/2013 3:15")
[1] "2013-06-23 03:15:00 UTC"
R does not maintain its own list of time zone names; instead, it relies on the operating system’s naming convention. To list the supported time zone names for your particular R environment, type:
OlsonNames()
For example, to select a time zone that observes Daylight Saving Time, type tz = "EST5EDT".
x.date <- mdy_hms(x, tz="EST5EDT")
x.date
[1] "2013-06-23 03:45:23 EDT" "2013-07-30 14:23:00 EDT" "2014-08-12 18:01:59 EDT"
class(x.date)
[1] "POSIXct" "POSIXt"
If you need to convert the day/time to another time zone, use lubridate’s with_tz() function. For example, to convert x.date from its current EST5EDT time zone to the US/Alaska time zone, type:
with_tz(x.date, tzone = "US/Alaska")
[1] "2013-06-22 23:45:23 AKDT" "2013-07-30 10:23:00 AKDT" "2014-08-12 14:01:59 AKDT"
Note that the with_tz function will change the timestamp to reflect the new time zone. If you simply want to change the time zone definition and not the timestamp, use the tz() function.
tz(x.date) <- "US/Alaska"
x.date
[1] "2013-06-23 03:45:23 AKDT" "2013-07-30 14:23:00 AKDT" "2014-08-12 18:01:59 AKDT"
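The two behaviors can be compared side by side. The sketch below uses lubridate's force_tz() function, which returns a relabeled copy rather than modifying the object in place as the tz() assignment does. The object names t_east, t_shown and t_forced are made up for this illustration.

```r
library(lubridate)

t_east <- mdy_hms("06/23/2013 03:45:23", tz = "EST5EDT")

# with_tz(): same instant in time, displayed on the Alaska clock
t_shown  <- with_tz(t_east, tzone = "US/Alaska")

# force_tz(): same clock reading, relabeled as Alaska time (a different instant)
t_forced <- force_tz(t_east, tzone = "US/Alaska")

t_shown == t_east                            # TRUE
difftime(t_forced, t_east, units = "hours")  # 4 hours
```

with_tz is the safer choice when the original time zone was correct; relabeling is meant for cases where the data were read in with the wrong time zone.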
If your data table splits the date elements into separate vector objects or columns, use the paste function to combine the elements into a single date string before passing it to one of the lubridate functions. Let’s look at an example:
dat1 <- read.csv("http://mgimond.github.io/ES218/Data/CO2.csv")
head(dat1)
Year Month Average Interpolated Trend Daily_mean
1 1959 1 315.62 315.62 315.70 -1
2 1959 2 316.38 316.38 315.88 -1
3 1959 3 316.71 316.71 315.62 -1
4 1959 4 317.72 317.72 315.56 -1
5 1959 5 318.29 318.29 315.50 -1
6 1959 6 318.15 318.15 315.92 -1
The CO2 dataset has the date split across two columns: Year and Month (both stored as integers). You can combine the columns into a character string using the paste function. For example, if we want to create a “Year-Month” string as in 1959-10, we could type:
paste(dat1$Year,dat1$Month, sep="-")
The above example uses three arguments: the two objects that are pasted together (i.e. Year and Month) and the sep="-" parameter, which fills the gap between both objects with a dash - (by default, paste would have added spaces, thus creating strings in the form of 1959 10).
If your version of lubridate does not provide a function along the lines of ym to convert year-month strings directly, you will need to add an artificial day of the month to the string. We’ll choose to add the 15th day of the month as in:
paste(dat1$Year, dat1$Month, "15", sep="-")
And finally, we’ll add a new column called Date to the dat1 object, and fill that column with the newly created date string wrapped with the ymd function:
dat1$Date <- ymd( paste(dat1$Year, dat1$Month, "15", sep="-") )
head(dat1)
Year Month Average Interpolated Trend Daily_mean Date
1 1959 1 315.62 315.62 315.70 -1 1959-01-15
2 1959 2 316.38 316.38 315.88 -1 1959-02-15
3 1959 3 316.71 316.71 315.62 -1 1959-03-15
4 1959 4 317.72 317.72 315.56 -1 1959-04-15
5 1959 5 318.29 318.29 315.50 -1 1959-05-15
6 1959 6 318.15 318.15 315.92 -1 1959-06-15
The sep="-" option is not needed with the lubridate function, so the last piece of code could have been written as:
dat1$Date <- ymd( paste(dat1$Year, dat1$Month, "15") )
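As an aside, string pasting is not the only route. The following is a minimal sketch, assuming lubridate's make_date() function is available in your installed version. It builds the date directly from the numeric parts. The small data frame df is a made-up stand-in for the Year and Month columns of dat1.

```r
library(lubridate)

# Made-up stand-in for the Year and Month columns of dat1
df <- data.frame(Year = c(1959L, 1959L, 1960L), Month = c(1L, 2L, 10L))

by_paste <- ymd(paste(df$Year, df$Month, "15", sep = "-"))
by_parts <- make_date(year = df$Year, month = df$Month, day = 15L)

# Both approaches produce the same dates
all(by_paste == by_parts)
```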
To confirm that the Date column is indeed formatted as a date object, type:
str(dat1)
'data.frame': 721 obs. of 7 variables:
$ Year : int 1959 1959 1959 1959 1959 1959 1959 1959 1959 1959 ...
$ Month : int 1 2 3 4 5 6 7 8 9 10 ...
$ Average : num 316 316 317 318 318 ...
$ Interpolated: num 316 316 317 318 318 ...
$ Trend : num 316 316 316 316 316 ...
$ Daily_mean : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Date : Date, format: "1959-01-15" "1959-02-15" "1959-03-15" "1959-04-15" ...
or you could type,
class(dat1$Date)
[1] "Date"
Since we did not add a time zone or a time component to the date object, the Date column was assigned a Date class as opposed to the POSIXct/POSIXt classes.
The lubridate functions may expect the time values to consist of a specific number of characters if a delimiter such as : is not present to split the time elements. For example, the following will not generate a valid date/time object:
hrmin <- 712 # Time 7:12
date <- "2018/03/17" # Date
ymd_hm(paste(date, hrmin))
[1] NA
One solution is to pad the time element with 0’s to complete a four character vector (or a six character vector if seconds are part of the time element). We can use the str_pad function from the stringr package to pad the time object (the stringr package is covered in another tutorial).
library(stringr)
ymd_hm(paste(date, str_pad(hrmin, width=4, pad="0")))
[1] "2018-03-17 07:12:00 UTC"
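The same padding can be sketched with base R's sprintf function, which avoids the extra package. The hrmin values below are made up to show how the approach behaves on a whole vector of times.

```r
library(lubridate)

hrmin <- c(712, 45, 1530)   # made-up times: 7:12, 0:45 and 15:30
date  <- "2018/03/17"

# "%04d" left-pads each integer with zeros to a width of four characters
padded <- sprintf("%04d", as.integer(hrmin))
padded

times <- ymd_hm(paste(date, padded))
times
```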
If you want to extract the day of the week from a date vector, use the wday function.
wday(x.date)
[1] 1 3 3
If you want the day of the week displayed as its three letter designation, add the label=TRUE parameter.
wday(x.date, label=TRUE)
[1] Sun Tue Tue
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
You’ll note that the function returns a factor object with seven levels, one for each day of the week (Sun, Mon, Tue, Wed, Thu, Fri, Sat), as well as the level hierarchy, which will dictate the order in which values will be displayed if grouped by this factor. Not every level is necessarily present in the vector elements (only Sun and Tue are present here), but the levels are there if we were ever to add additional day elements to this vector.
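One place where the unused levels matter is when tabulating. In the sketch below (the object names d and days are made up, and the default Sunday week start is assumed), table reports a count for all seven days, including zeros for the days that are absent.

```r
library(lubridate)

d    <- ymd(c("2013-06-23", "2013-07-30", "2014-08-12"))
days <- wday(d, label = TRUE)

nlevels(days)   # 7, even though only two distinct days occur
table(days)     # counts for every level, with zeros for absent days
```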
The following table lists functions that extract different elements of a date object.
Functions | Extracted element |
---|---|
hour() | Hour of the day |
minute() | Minute of the hour |
day() | Day of the month |
yday() | Day of the year |
decimal_date() | Decimal year |
month() | Month of the year |
year() | Year |
tz() | Time zone |
You can apply certain operations to dates as you would to numbers. For example, to list the number of days between the first and third elements of the vector x.date, type the following:
(x.date[3] - x.date[1]) / ddays()
[1] 415.5949
To get the number of weeks between both dates:
(x.date[3] - x.date[1]) / dweeks()
[1] 59.37069
Likewise, you can get the number of minutes between dates by dividing by dminutes() and the number of years by dividing by dyears().
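A minimal sketch of these other units follows. It uses lubridate's interval() function to hold the time span, which is a slight variation on the subtraction shown earlier, and the object names t1, t2 and span are made up. The base R difftime call is included as a cross-check.

```r
library(lubridate)

t1 <- ymd_hms("2013-06-23 03:45:23", tz = "US/Alaska")
t2 <- ymd_hms("2014-08-12 18:01:59", tz = "US/Alaska")

span <- interval(t1, t2)

span / dminutes(1)   # elapsed minutes
span / dyears(1)     # elapsed years (the length of a year depends on the lubridate version)

# Base R cross-check for the minutes
as.numeric(difftime(t2, t1, units = "mins"))
```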
You can also apply Boolean operations on dates. For example, to find which date element in x.date falls between the 11th and 24th day of any month, type:
(mday(x.date) > 11) & (mday(x.date) < 24)
[1] TRUE FALSE TRUE
If you want the command to return just the dates that satisfy this query, pass the Boolean operation as an index to the x.date vector:
x.date[ (mday(x.date) > 11) & (mday(x.date) < 24) ]
[1] "2013-06-23 03:45:23 AKDT" "2014-08-12 18:01:59 AKDT"
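The same indexing idea works when comparing whole dates rather than date elements. The following sketch (with made-up object names d and in_2013) keeps only the dates that fall in 2013.

```r
library(lubridate)

d <- ymd(c("2013-06-23", "2013-07-30", "2014-08-12"))

# TRUE for dates on or after Jan 1, 2013 and before Jan 1, 2014
in_2013 <- (d >= ymd("2013-01-01")) & (d < ymd("2014-01-01"))
in_2013

d[in_2013]
```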
You can create a character vector from a date object. This is useful if you want to annotate plots with dates or include date values in reports. For example, to convert the date object x.date to a “Month_name Year” character format, type the following (the format function is used here because newer versions of R may not honor the format= parameter in as.character):
format(x.date, format="%B %Y")
[1] "June 2013" "July 2013" "August 2014"
The format= parameter accepts many different date/time format codes listed in the following table (note the case!).
Format codes | Description | Example |
---|---|---|
%a | Abbreviated weekday name | Sun, Tue, Tue |
%A | Full weekday name | Sunday, Tuesday, Tuesday |
%m | Month as decimal number | 06, 07, 08 |
%b | Abbreviated month name | Jun, Jul, Aug |
%B | Full month name | June, July, August |
%c | Date and time, locale-specific | Sun Jun 23 03:45:23 2013, Tue Jul 30 14:23:00 2013, Tue Aug 12 18:01:59 2014 |
%d | Day of the month as decimal number | 23, 30, 12 |
%H | Hours as decimal number (00 to 23) | 03, 14, 18 |
%I | Hours as decimal number (01 to 12) | 03, 02, 06 |
%p | AM/PM indicator in the locale | AM, PM, PM |
%j | Day of year as decimal number | 174, 211, 224 |
%M | Minute as decimal number (00 to 59) | 45, 23, 01 |
%S | Second as decimal number | 23, 00, 59 |
%U | Week of the year starting on the first Sunday | 25, 30, 32 |
%W | Week of the year starting on the first Monday | 24, 30, 32 |
%w | Weekday as decimal number (Sunday = 0) | 0, 2, 2 |
%x | Date (locale-specific) | 6/23/2013, 7/30/2013, 8/12/2014 |
%X | Time (locale-specific) | 3:45:23 AM, 2:23:00 PM, 6:01:59 PM |
%Y | 4-digit year | 2013, 2013, 2014 |
%y | 2-digit year | 13, 13, 14 |
%Z | Abbreviated time zone | AKDT, AKDT, AKDT |
%z | Time zone as offset from UTC | -0800, -0800, -0800 |
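Format codes can be mixed with literal text in a single format string. The sketch below uses base R's format function on a single made-up timestamp x1, and sticks to numeric codes so that the results do not depend on your locale.

```r
x1 <- as.POSIXct("2013-06-23 03:45:23", tz = "UTC")

format(x1, "%Y-%m-%d")                  # "2013-06-23"
format(x1, "%H:%M")                     # "03:45"
format(x1, "%j")                        # "174"
format(x1, "Day %d of month %m, %Y")    # "Day 23 of month 06, 2013"
```

Codes that produce names (such as %a, %B or %p) follow the locale of your R session, so their output may differ from the examples in the table.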