[R] duplicated.data.frame() and POSIXct with DST shift

David Winsemius dwinsemius at comcast.net
Fri Dec 14 02:01:56 CET 2012


On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:

> Hi,
>
> I found that the duplicated method for data.frames gives "false  
> positives" when a column of class POSIXct spans the clock shift  
> from DST to standard time.
>
> time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0,  
> 60*60)
> time
> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>
> df <- data.frame(time, text="foo")
> duplicated(df)
> [1] FALSE  TRUE

In this instance
>
> This is because the timezone is lost after calling paste():
> do.call(paste, c(df, sep = "\r"))

I suspect the problem arises when 'paste' coerces to character:

 > as.character(time)
[1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"

I think the as.character coercion is easy to miss because 'paste'  
performs it internally.

 > as.character(time, usetz=TRUE)
[1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
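To make the difference concrete, here is a minimal sketch (not taken from the original exchange) comparing three possible comparison keys for the two instants. It assumes the system time zone database knows "Europe/Vienna", and it builds the times from UTC so the result does not depend on how the platform resolves the ambiguous local time 02:00. The variable names plain and zoned are only illustrative.

```r
# Two instants one hour apart that both print as 02:00 local time in Vienna.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"

plain <- format(time)                # zone abbreviation dropped
zoned <- format(time, usetz = TRUE)  # zone abbreviation kept (CEST / CET)

duplicated(plain)             # FALSE  TRUE: the strings are identical
duplicated(zoned)             # FALSE FALSE: the abbreviation tells them apart
duplicated(as.numeric(time))  # FALSE FALSE: the underlying seconds differ
```

Any key that keeps either the zone abbreviation or the underlying number of seconds avoids the false positive.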


-- 
David.


> [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
>
>

> I can't really figure out if this behavior is desired or not. If so,  
> a short warning in ?duplicated could be helpful. It is mentioned how  
> duplicated.data.frame() works, but I didn't find a hint to properly  
> handle POSIXct-objects.

There is no duplicated.POSIXct method.
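One possible workaround, sketched here under the assumption that comparing instants rather than printed clock times is what is wanted, is to replace POSIXct columns by their numeric representation before calling duplicated(). The helper name duplicated_by_instant is hypothetical and not part of base R or any package.

```r
# Hypothetical helper: compare POSIXct columns by seconds since the epoch,
# so that two instants with the same printed clock time stay distinct.
duplicated_by_instant <- function(df, ...) {
  is_ct <- vapply(df, inherits, logical(1), what = "POSIXct")
  if (any(is_ct)) {
    df[is_ct] <- lapply(df[is_ct], as.numeric)
  }
  duplicated(df, ...)
}

# Same two instants as in the example above, built from UTC.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"
df <- data.frame(time, text = "foo")

duplicated_by_instant(df)  # FALSE FALSE
```

The helper only changes the copy of the data frame used for the comparison; the caller's POSIXct column is left untouched.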
>
> My particular problem was to cast a data.frame like this one with  
> cast() (which calls reshape1(), which calls duplicated()):
>
> df2 <- data.frame(time, time1=as.numeric(time),
>                  lab=rep(1:3, each=2), value=101:106,
>                  text=rep(c("foo", "bar"), each=3))
>
> library(reshape)
>
> Using the column of class POSIXct as a variable in the formula gives:
> cast(lab*time~text, data=df2, value="value")
> Aggregation requires fun.aggregate: length used as default
>  lab                time bar foo
> 1   1 2012-10-28 02:00:00   0   2
> 2   2 2012-10-28 02:00:00   1   1
> 3   3 2012-10-28 02:00:00   2   0
>
> Converting to numeric, casting and converting back works as  
> expected, although the timezone is not visible, because  
> print.data.frame() calls format.POSIXct() with usetz = FALSE:
> y <- cast(lab*time1~text, data=df2, value="value")
> y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)
>
> Can anyone suggest a more elegant solution?
>
> Best,
> Tobias
>

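For the conversion back from numeric, a somewhat tidier variant of the round trip might look like the sketch below. It assumes the numeric column holds seconds since 1970-01-01 UTC, which is what as.numeric() returns for a POSIXct vector, and it names the time zone explicitly instead of relying on a local-time origin such as "1970-01-01 01:00". The names secs and back are illustrative only.

```r
# The two instants from the example, built from UTC for reproducibility.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"

secs <- as.numeric(time)  # what a numeric key column such as time1 would hold

# Convert back with an explicit origin and display zone.
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")

identical(as.numeric(back), secs)  # TRUE: the instants survive the round trip
format(back, usetz = TRUE)         # shows CEST for the first, CET for the second
```

Passing tz explicitly keeps the result independent of the time zone of the session in which the code runs.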
David Winsemius, MD
Alameda, CA, USA



