[R] Problem with ddply in the plyr-package: surprising output of a date-column

Brian Diggs diggsb at ohsu.edu
Mon Apr 25 23:14:19 CEST 2011


On 4/25/2011 1:07 PM, Hadley Wickham wrote:
>> If you need plyr for other tasks you ought to use a different
>> class for your date data (or wait until plyr can deal with
>> POSIXlt objects).
>
> How do you get POSIXlt objects into a data frame?
>
>> df<- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
>> str(df)
> 'data.frame':	1 obs. of  1 variable:
>   $ x: POSIXct, format: "2008-01-01"
>
>> df<- data.frame(x = I(as.POSIXlt(as.Date(c("2008-01-01")))))
>> str(df)
> 'data.frame':	1 obs. of  1 variable:
>   $ x: AsIs, format: "0"
>
> Hadley

Assigning to a column after the data.frame creation step, however, keeps
the POSIXlt class:

 > df <- data.frame(x = as.POSIXlt(as.Date(c("2008-01-01"))))
 > str(df)
'data.frame':   1 obs. of  1 variable:
  $ x: POSIXct, format: "2008-01-01"
 > dput(df)
structure(list(x = structure(1199145600, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = "x", row.names = c(NA, -1L
), class = "data.frame")
 > df$x <- as.POSIXlt(as.Date(c("2008-01-01")))
 > str(df)
'data.frame':   1 obs. of  1 variable:
  $ x: POSIXlt, format: "2008-01-01"
 > dput(df)
structure(list(x = structure(list(sec = 0, min = 0L, hour = 0L,
     mday = 1L, mon = 0L, year = 108L, wday = 2L, yday = 0L, isdst = 0L),
     .Names = c("sec", "min", "hour", "mday", "mon", "year", "wday",
     "yday", "isdst"), class = c("POSIXlt", "POSIXt"), tzone = "UTC")),
     .Names = "x", row.names = c(NA, -1L), class = "data.frame")

This is reminiscent of the 1d array problem: some types are coerced into
other types when passed to the data.frame constructor (a data.frame()
call), but are not coerced when assigned to a column.

Looking at the help pages: data.frame calls as.data.frame on each of its
arguments, while the help page for `[<-.data.frame` has a section on
coercion that starts "The story over when replacement values are coerced
is a complicated one, and one that has changed during R's development.
This section is a guide only." That makes me think the behavior is not
all that well defined.

Digging further, there is an as.data.frame.POSIXlt method, although the
help page for POSIXlt (DateTimeClasses in base) neither mentions nor
documents it. It is, however, documented in as.data.frame (which also has
comments about coercing 1-dimensional arrays).
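A quick way to see what as.data.frame.POSIXlt does is to call the generic
directly on a POSIXlt value. This is a minimal sketch assuming only base
R; the variable names lt and res are illustrative. The method converts the
value to POSIXct before building the one-column data frame, which is
consistent with the str() output above.

```r
# Build a POSIXlt value (UTC, to avoid time-zone surprises)
lt <- as.POSIXlt("2008-01-01", tz = "UTC")

# Dispatches to as.data.frame.POSIXlt, the method data.frame() relies on
res <- as.data.frame(lt)

# The resulting column has been converted to POSIXct
class(res[[1]])
```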

So, potentially, any class that has an as.data.frame method could behave
differently depending on whether it is passed to data.frame or assigned
to a column afterwards (with `[<-.data.frame` or `$<-.data.frame`):
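To illustrate the general point with something unrelated to dates, here
is a minimal sketch using a made-up S3 class "tenx" and a hypothetical
as.data.frame.tenx method (neither exists in any package; both are
defined only for this example). data.frame() dispatches to the method,
while `$<-` on an existing data frame stores the object unchanged.

```r
# Hypothetical constructor for a toy S3 class (illustration only)
tenx <- function(v) structure(v, class = "tenx")

# Hypothetical as.data.frame method: drops the class, multiplies by 10,
# and hands the plain numeric vector to the default machinery
as.data.frame.tenx <- function(x, ...) {
  as.data.frame(10 * unclass(x), ...)
}

# Passed to the constructor: data.frame() calls as.data.frame() on it
df1 <- data.frame(x = tenx(1:3))
class(df1$x)   # plain numeric, values multiplied by 10

# Assigned afterwards: `$<-.data.frame` does not call as.data.frame()
df2 <- data.frame(y = 1:3)
df2$x <- tenx(1:3)
class(df2$x)   # still "tenx"
```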

 > methods("as.data.frame")
  [1] as.data.frame.aovproj*        as.data.frame.array
  [3] as.data.frame.AsIs            as.data.frame.character
  [5] as.data.frame.complex         as.data.frame.data.frame
  [7] as.data.frame.Date            as.data.frame.default
  [9] as.data.frame.difftime        as.data.frame.factor
[11] as.data.frame.ftable*         as.data.frame.function
[13] as.data.frame.idf*            as.data.frame.integer
[15] as.data.frame.list            as.data.frame.logical
[17] as.data.frame.logLik*         as.data.frame.matrix
[19] as.data.frame.model.matrix    as.data.frame.numeric
[21] as.data.frame.numeric_version as.data.frame.ordered
[23] as.data.frame.POSIXct         as.data.frame.POSIXlt
[25] as.data.frame.raw             as.data.frame.table
[27] as.data.frame.ts              as.data.frame.vector

So, I suppose it is working as documented. Still, I wonder how long it
has been since anyone (who has been using R regularly for at least a
year) actually read the entire help page for data.frame and/or
as.data.frame. It's one of those things you think you know and
understand until you find out you don't.

-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University