[R] duplicated.data.frame() and POSIXct with DST shift

David Winsemius dwinsemius at comcast.net
Fri Dec 14 04:07:42 CET 2012


On Dec 13, 2012, at 5:01 PM, David Winsemius wrote:

> 
> On Dec 13, 2012, at 1:43 PM, Tobias Gauster wrote:
> 
>> Hi,
>> 
>> I found that the duplicated method for data.frames gives "false positives" when a column of class POSIXct spans the clock shift from DST to standard time.
>> 
>> time <- as.POSIXct("2012-10-28 02:00", tz="Europe/Vienna") + c(0, 60*60)
>> time
>> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"
>> 
>> df <- data.frame(time, text="foo")
>> duplicated(df)
>> [1] FALSE  TRUE
> 
> In this instance
>> 
>> This is because the timezone is lost after calling paste():
>> do.call(paste, c(df, sep = "\r"))
> 
> I suspect the problem arises when 'paste' coerces to character:
> 
> > as.character(time)
> [1] "2012-10-28 02:00:00" "2012-10-28 02:00:00"
> 
> I think the as.character coercion is easy to miss because the 'paste' operation is done internally.
> 
> > as.character(time, usetz=TRUE)
> [1] "2012-10-28 02:00:00 CEST" "2012-10-28 02:00:00 CET"

This would work as intended if you pre-processed the argument to duplicated with:

> data.frame(lapply(df, as.character, usetz=TRUE) )
                      time text
1 2012-10-28 02:00:00 CEST  foo
2  2012-10-28 02:00:00 CET  foo

>  duplicated( data.frame(lapply(df, as.character, usetz=TRUE) ) ) 
[1] FALSE FALSE
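
One side effect of the lapply() call above is that every column becomes character. A minimal sketch of an alternative, assuming it is acceptable to compare POSIXct columns by their underlying numeric value (seconds since the epoch): the helper name duplicated_instants is hypothetical and not part of base R, and the two instants are built from UTC only so that the example does not depend on how the platform resolves the ambiguous local time 02:00.

```r
# Hypothetical helper (not part of base R): compare POSIXct columns by the
# instant they represent rather than by their printed form.
duplicated_instants <- function(df, ...) {
  is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
  if (any(is_time)) {
    # Only the POSIXct columns are converted; other columns keep their type.
    df[is_time] <- lapply(df[is_time], as.numeric)
  }
  duplicated(df, ...)
}

# The same two instants as in the original post, one hour apart.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"

df <- data.frame(time, text = "foo")
res <- duplicated_instants(df)
res
```

The numeric values differ by 3600, so the two rows are not flagged as duplicates regardless of how the times are formatted.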

> 

> 
> -- 
> David.
> 
> 
> [1] "2012-10-28 02:00:00\rfoo" "2012-10-28 02:00:00\rfoo"
>> 
>> 
> 
>> I can't tell whether this behavior is intended. If it is, a short warning in ?duplicated would be helpful. The help page describes how duplicated.data.frame() works, but I didn't find a hint on how to handle POSIXct objects properly.
> 
> There is no duplicated.POSIXct method
>> 
>> My particular problem was to cast a data.frame like this one with cast() (which calls reshape1(), which calls duplicated()):
>> 
>> df2 <- data.frame(time, time1=as.numeric(time),
>>                 lab=rep(1:3, each=2), value=101:106,
>>                 text=rep(c("foo", "bar"), each=3))
>> 
>> library(reshape)  # cast() and reshape1() are in package 'reshape', not 'reshape2'
>> 
>> Using the column of class POSIXct as a variable in the formula gives:
>> cast(lab*time~text, data=df2, value="value")
>> Aggregation requires fun.aggregate: length used as default
>>   lab                time bar foo
>> 1   1 2012-10-28 02:00:00   0   2
>> 2   2 2012-10-28 02:00:00   1   1
>> 3   3 2012-10-28 02:00:00   2   0
>> 
>> Converting to numeric, casting and converting back works as expected, although the timezone is not visible, because print.data.frame() calls format.POSIXct() with usetz = FALSE:
>> y <- cast(lab*time1~text, data=df2, value="value")
>> y$time1 <- as.POSIXct("1970-01-01 01:00") + as.numeric(y$time1)
>> 
>> Can anyone suggest a more elegant solution?
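
The conversion back from numeric shown above appears to rely on the session's local time zone being one hour ahead of UTC at the epoch. A minimal sketch of a round trip that does not depend on the local time zone, assuming the target zone is "Europe/Vienna" as in the original example; the variable names secs and back are only for illustration.

```r
# Build the two instants from UTC so the example does not depend on how the
# platform resolves the ambiguous local time 02:00.
time <- as.POSIXct("2012-10-28 00:00:00", tz = "UTC") + c(0, 60 * 60)
attr(time, "tzone") <- "Europe/Vienna"

# Seconds since 1970-01-01 00:00:00 UTC; this is what a numeric key would hold.
secs <- as.numeric(time)

# Convert back with an explicit origin and display zone.
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Vienna")

# usetz = TRUE makes the zone abbreviation visible for each element.
format(back, usetz = TRUE)
```

Stating the origin and tz explicitly keeps the result the same whichever time zone the R session runs in.
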
>> 
>> Best,
>> Tobias
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> David Winsemius, MD
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA



