[R] dates() is a great date function in R

JB Kim Jung.Bae.Kim at morganstanley.com
Thu Jul 19 16:33:11 CEST 2007


This is great.

I hadn't seen the "as.Date" function before, and was using "as.date" from library(date)
(note the lowercase 'd').

I have an alternative, which might or might not be faster,
for dates formatted as "yyyymmdd" (e.g. 20070719):

  library(date)
  formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1", d$yyyymmdd, perl=TRUE)
  d$dates <- as.date(formatted)

Since as.date only accepts certain date formats, I had to use gsub to
rearrange the date substrings into mm-dd-yyyy order.
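
To make the reshuffle concrete, here is a minimal sketch on two sample strings (the same two values that head my test file further down). The vector name x is only for illustration, and nothing beyond base R is needed:

```r
# Two sample strings in yyyymmdd form
x <- c("19900817", "19900820")

# Capture year, month and day, then emit them in mm-dd-yyyy order
formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1", x, perl = TRUE)
print(formatted)
```

Strings that do not match the pattern are passed through by gsub unchanged, so it is probably worth checking the input before handing it to as.date.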

as.Date returns objects of class "Date", whereas as.date returns objects of class "date".
I'm not sure what all the differences are, but the simple test below suggests that the as.date
conversion finishes slightly sooner in wall-clock time, given a character vector of 10000 date entries.
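
One difference I am fairly sure of is the epoch: "Date" objects count days since 1970-01-01, while the "date" objects from library(date) count days since 1960-01-01. A small sketch of that, where the requireNamespace check and the variable names are my own additions and the second half only runs if the date package is installed:

```r
d1 <- as.Date("19900817", format = "%Y%m%d")
print(class(d1))        # class of the base R result
print(as.numeric(d1))   # days since 1970-01-01

# Gap between the two epochs, computed with base R only
epoch_gap <- as.numeric(as.Date("1970-01-01") - as.Date("1960-01-01"))
print(epoch_gap)

if (requireNamespace("date", quietly = TRUE)) {
  d2 <- date::as.date("08-17-1990")
  print(class(d2))      # class of the date-package result
  # If the epochs are as described, the day counts differ by epoch_gap
  print(as.numeric(d2) - as.numeric(d1) == epoch_gap)
}
```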



FYI,

I ran a quick performance comparison on a 64-bit Linux machine running a 2.6.9 kernel.
The test is very rudimentary, but hopefully useful...

I have two scripts:


######################################  as.date_test.R  ##############################################

library(date)

d <- read.table("/tmp/dates", as.is = TRUE, col.names = c("yyyymmdd"),  colClasses = c("character"))
formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1", d$yyyymmdd, perl=TRUE)
d$dates <- as.date(formatted)

print(nrow(d))
print(d$dates[1:3])

######################################################################################################


######################################  as.Date_test.R  ##############################################

d <- read.table("/tmp/dates", as.is = TRUE, col.names = c("yyyymmdd"), colClasses = c("character"))

d$dates <- as.Date(d$yyyymmdd, format = "%Y%m%d")

print(nrow(d))
print(d$dates[1:3])

######################################################################################################


Both scripts read in the same text file containing 10000 date strings, and then convert them into 
appropriate date objects.


# 10000 date records in a flat file
<jbkim at mymachine>$ wc -l /tmp/dates
   10000 /tmp/dates

# just to illustrate what the dates look like
<jbkim at mymachine>$ head -2 /tmp/dates
19900817
19900820
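
In case anyone wants to repeat this without my file, a comparable input could be generated along these lines. This is only a sketch: it writes 10000 consecutive calendar days starting at 1990-08-17 to a temporary file, which is an assumption for illustration and not how /tmp/dates was actually produced.

```r
# 10000 consecutive days, formatted as yyyymmdd strings
days <- seq(as.Date("1990-08-17"), by = "day", length.out = 10000)
path <- tempfile("dates")
writeLines(format(days, "%Y%m%d"), path)

# Read the file back the same way the test scripts do
d <- read.table(path, as.is = TRUE, col.names = c("yyyymmdd"),
                colClasses = c("character"))
print(nrow(d))
print(head(d$yyyymmdd, 2))
```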


# Running the test script 5 times each

<jbkim at mymachine>$ for i in 1 2 3 4 5; do time R --vanilla < as.date_test.R > /dev/null; done

real    0m1.29s
user    0m1.23s
sys     0m0.05s

real    0m1.28s
user    0m1.23s
sys     0m0.06s

real    0m1.28s
user    0m1.22s
sys     0m0.06s

real    0m1.29s
user    0m1.22s
sys     0m0.06s

real    0m1.28s
user    0m1.21s
sys     0m0.07s

<jbkim at mymachine>$ for i in 1 2 3 4 5; do time R --vanilla < as.Date_test.R > /dev/null; done

real    0m1.65s
user    0m0.99s
sys     0m0.64s

real    0m1.64s
user    0m0.98s
sys     0m0.66s

real    0m1.63s
user    0m0.98s
sys     0m0.65s

real    0m1.64s
user    0m1.00s
sys     0m0.64s

real    0m1.64s
user    0m0.98s
sys     0m0.65s



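Notice that the as.date script is slightly faster than the as.Date script in elapsed time
(about 1.28s versus 1.64s real), although its user time is higher (about 1.22s versus 0.98s);
the gap comes from the larger sys time of the as.Date runs. Both figures include R startup
and read.table, not just the conversion.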
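
Since the shell timings above cover the whole R session, a rough way to time only the conversions would be system.time inside one session. A sketch, assuming that synthetic consecutive dates are an acceptable stand-in for my file and that 20 repetitions are enough to get a measurable time (both are my choices, not part of the test above); the as.date part only runs if the date package is installed:

```r
# 10000 yyyymmdd strings, generated rather than read from /tmp/dates
x <- format(seq(as.Date("1990-08-17"), by = "day", length.out = 10000), "%Y%m%d")

# Base R conversion, repeated 20 times
t_Date <- system.time(
  for (i in 1:20) r1 <- as.Date(x, format = "%Y%m%d")
)
print(t_Date)

# gsub + as.date conversion, repeated 20 times
if (requireNamespace("date", quietly = TRUE)) {
  t_date <- system.time(
    for (i in 1:20) {
      formatted <- gsub("^(\\d{4})(\\d{2})(\\d{2})$", "\\2-\\3-\\1", x, perl = TRUE)
      r2 <- date::as.date(formatted)
    }
  )
  print(t_date)
}
```

The numbers will of course depend on the machine and the R version, so I would not read too much into small differences.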

Just thought it was interesting to share.
(and thanks Mark Leeds for reference)

Regards,
JB

On 07/18/07 16:13:49, Gavin Simpson wrote:
> On Wed, 2007-07-18 at 12:14 -0700, Mr Natural wrote: 
> > Proper calendar dates in R are great for plotting and calculating. 
> > However for the non-wonks among us, they can be very frustrating.
> > I have recently discussed the pains that people in my lab have had 
> > with dates in R. Especially the frustration of bringing date data into R 
> > from Excel, which we have to do a lot. 
> 
> I've always found the following reasonably intuitive:
> 
> Given the csv file that I've pasted in below, the following reads the
> csv file in, converts the dates to class Date and then draws a plot.
> 
> I have dates in DD/MM/YYYY format so year is not first - thus attesting
> to R not hating dates in this format ;-)
> 
> ## read in csv data
> ## as.is = TRUE stops characters being converted to factors
> ## thus saving us an extra step to convert them back
> dat <- read.csv("date_data.csv", as.is = TRUE)
> 
> ## we convert to class Date
> ## format tells R how the dates are formatted in our character strings
> ## see ?strftime for the meaning and available codes
> dat$Date <- as.Date(dat$Date, format = "%d/%m/%Y")
> 
> ## check this worked ok
> str(dat$Date)
> dat$Date
> 
> ## see nicely formatted dates and not a drop of R-related hatred 
> ## but just about the most boring graph I could come up with
> plot(Data ~ Date, dat, type = "l")
> 
> And you can keep your Excel file formatted as dates as well - bonus!
> 
> Oh, and before you get "Martin'd", it is the chron *package*!
> 
> HTH
> 
> G
> 
> CSV file I used, generated in OpenOffice.org, but I presume it stores
> Dates in the same way as Excel?:
> 
> "Data","Date"
> 1,01/01/2007
> 2,02/01/2007
> 3,03/01/2007
> 4,04/01/2007
> 5,05/01/2007
> 6,06/01/2007
> 7,07/01/2007
> 8,08/01/2007
> 9,09/01/2007
> 10,10/01/2007
> 11,11/01/2007
> 10,12/01/2007
> 9,13/01/2007
> 8,14/01/2007
> 7,15/01/2007
> 6,16/01/2007
> 5,17/01/2007
> 4,18/01/2007
> 3,19/01/2007
> 2,20/01/2007
> 1,21/01/2007
> 1,22/01/2007
> 2,23/01/2007
> 3,24/01/2007
> 
> > Please find below a simple analgesic for R date importation that I
> > discovered 
> > over the last 1.5 days (Learning new stuff in R is calculated in 1/2 days).
> > 
> > The function    dates()    gives the simplest way to get calendar dates into
> > R from Excel that I can find.
> > But straight importation of Excel dates, via a csv or txt file, can be a
> > huge pain (I'll give details for anyone who cares to know). 
> > 
> > My pain killer is:
> > Consider that you have Excel columns in month, day, year format. Note that R
> > hates date data that does not lead with the year. 
> > 
> > a. Load the chron library by typing   library(chron)   in the console.
> > You know that you need this library from information revealed by 
> > performing the query,
> > ?dates()    in the Console window. This gives the R documentation 
> > help file for this and related time, date functions.  In the upper left 
> > of the documentation, one sees "dates(chron)". This tells you that you
> > need the library chron. 
> > 
> > b. Change the format "dates" in Excel to format "general", which gives 
> > 5 digit Julian dates. Import the csv file (I use    read.csv()) with the 
> > Julian dates and other data of interest.
> > 
> > c.  Now, change the Julian dates that came in with the csv file into 
> > calendar dates with the    dates() function. Below is my code for performing 
> > this activity, concerning an R data file called ss,
> > 
> > ss holds the Julian dates, illustrated below from the column MPdate,
> > 
> > >ss$MPdate[1:5]
> > [1] 34252 34425 34547 34759 34773
> > 
> > The dates() function makes calendar dates from Julian dates,
> > 
> > >dmp<-dates(ss$MPdate,origin=c(month = 1, day = 1, year = 1900))
> > 
> > > dmp[1:5]
> > [1] 10/12/93 04/03/94 08/03/94 03/03/95 03/17/95
> > 
> > I would appreciate the comments of more sophisticated programmers who
> > can suggest streamlining or shortcutting this operation.
> > 
> > regards, Don
> > 
> > 
> > 
> >  
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Gavin Simpson                 [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
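
On the 5 digit numbers in Don's message quoted above: if they came from Excel's default 1900 date system (an assumption on my part), they can also be converted with base R alone by using 1899-12-30 as the origin, which allows for Excel counting a 29 February 1900 that never existed. A minimal sketch using the five values shown for ss$MPdate:

```r
# Serial numbers copied from the quoted ss$MPdate[1:5] output
serial <- c(34252, 34425, 34547, 34759, 34773)

# Assumes Excel's 1900 date system; the 1904 system would need a different origin
d <- as.Date(serial, origin = "1899-12-30")
print(d)
```

By my reckoning the first value comes out as 1993-10-10, two days earlier than the 10/12/93 that dates() gave with an origin of 1 January 1900, so it may be worth checking one or two dates against the spreadsheet.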

-- 
JB Kim
Morgan Stanley, METL
1585 - 9th Floor
New York, NY 10036


