[Rd] Wrong length of POSIXt vectors (PR#10507)
murdoch at stats.uwo.ca
Thu Dec 13 20:39:00 CET 2007
On 12/13/2007 1:59 PM, Tony Plate wrote:
> Duncan Murdoch wrote:
>> On 12/11/2007 6:20 AM, simecek at gmail.com wrote:
>>> Full_Name: Petr Simecek
>>> Version: 2.5.1, 2.6.1
>>> OS: Windows XP
>>> Submission from: (NULL) (188.8.131.52)
>>> Several times I have found that the length of a POSIXt vector is not
>>> computed correctly.
>>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
>>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
>>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L,
>>> 12L), mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L,
>>> 13L), mon = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
>>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday = c(1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
>>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
>>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec",
>>> "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>>> ), class = c("POSIXt", "POSIXlt"))
>>> print(tv)
>>> # print 11 time points (right)
>>> length(tv)
>>> # returns 9 (wrong)
>> tv is a list of length 9. The answer is right, your expectation is wrong.
>>> I have tried that on several computers with/without switching to English
>>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the help pages but I
>>> cannot imagine how that could be OK.
>> See this in ?POSIXt:
>> Class '"POSIXlt"' is a named list of vectors...
>> You could define your own length measurement as
>> length.POSIXlt <- function(x) length(x$sec)
>> and you'll get the answer you expect, but be aware that length.XXX
>> methods are quite rare, and you may surprise some of your users.
> On the other hand, isn't the fact that length() currently always returns 9
> for POSIXlt objects likely to be a surprise to many users of POSIXlt?
> The back of "The New S Language" says "Easy-to-use facilities allow you to
> organize, store and retrieve all sorts of data. ... S functions and data
> organization make applications easy to write."
> Now, POSIXlt has methods for c() and vector subsetting "[" (and many other
> vector-manipulation methods - see methods(class="POSIXlt")). Hence, from
> the point of view of intending to supply "easy-to-use facilities ... [for]
> all sorts of data", isn't it a little incongruous that length() is not also
> provided -- as these 3 functions (any others?) comprise a core set of
> vector-manipulation functions?
> Would it make sense to have an informal prescription (e.g., in R-exts) that
> a class that implements a vector-like object and provides at least one
> of the functions 'c', '[' and 'length' should provide all three? It would also
> be easy to describe a test-suite that should be included in the 'tests'
> directory of a package implementing such a class, that had some tests of
> the basic vector-manipulation functionality, such as:
> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
> > # class being tested, of length 0, 1, 3, and 10, and they should
> > # contain no duplicate elements
> > length(x0)
> [1] 0
> > length(c(x0, x1))
> [1] 1
> > length(c(x1,x10))
> [1] 11
> > all(x3 == x3[seq(len=length(x3))])
> [1] TRUE
> > all(x3 == c(x3, x3, x3))
> [1] TRUE
> > length(c(x3, x10[5:7]))
> [1] 6
> It would also be possible to describe a larger set of vector manipulation
> functions that should be implemented together, including e.g., 'rep',
> 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na', 'head', 'tail' ... (many
> of which are provided for POSIXlt).
> Or is there some good reason that length() cannot be provided (while 'c'
> and '[' can) for some vector-like classes such as "POSIXlt"?
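The checks proposed in the quoted message could be collected into a single
function, roughly as sketched here. The name check_vector_like is
hypothetical, and the expected values are derived from the stated lengths
(0, 1, 3 and 10) of the four test objects. Date is used in the example only
because its length() counts elements; this is an illustration, not a
proposed addition to R.

```r
# Hypothetical helper: run the basic vector checks on four objects of the
# class under test, with lengths 0, 1, 3 and 10 and no duplicate elements.
check_vector_like <- function(x0, x1, x3, x10) {
  stopifnot(
    length(x0) == 0,
    length(c(x0, x1)) == 1,
    length(c(x1, x10)) == 11,
    all(x3 == x3[seq_len(length(x3))]),
    all(x3 == c(x3, x3, x3)),        # relies on recycling in '=='
    length(c(x3, x10[5:7])) == 6
  )
  invisible(TRUE)
}

# Example with Date, a class whose length() counts elements.
d1 <- as.Date("2007-12-13")
ok <- check_vector_like(as.Date(character(0)), d1, d1 + 0:2, d1 + 0:9)
```

A class whose length() does not count elements would fail at the first
check that disagrees, and stopifnot() reports which one.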
What you say sounds good in general, but the devil is in the details.
Changing the meaning of length(x) for some objects has fairly widespread
effects. Are they all positive? I don't know.
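One way to get the count of time points without changing what length()
returns is a separately named helper, along the lines of the length(x$sec)
suggestion earlier in the thread. The names n_times and n_components are
hypothetical, and this is only a sketch.

```r
# Hypothetical helpers; neither changes what length() itself returns.
n_times <- function(x) length(unclass(x)$sec)    # number of time points
n_components <- function(x) length(unclass(x))   # number of list components

x <- as.POSIXlt(c("2005-06-13 12:01:50", "2005-06-13 12:10:00",
                  "2005-06-13 12:11:55"), tz = "UTC")
n_times(x)        # 3
n_components(x)   # at least 9; the exact count depends on the R version
```

Working on unclass(x) keeps the helpers independent of whatever length()
method may or may not be defined for the class.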
Adding a prescription like the one you suggest would be good if it's
easy to implement, but bad if it's already widely violated. How many
base or CRAN or Bioconductor packages violate it currently? Do the
ones that provide all 3 methods do so in a consistent way, i.e. does
"length(x)" mean the same thing in all of them?
I agree that the current state is less than perfect, but making it
better would really be a lot of work. I suspect there are better ways
to spend my time, so I'm not going to volunteer to do it. I'm not even
going to invite someone else to do it, or offer to review your work if
you volunteer. I think this falls into the class of "next time we write
a language, let's handle this better" problems.
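For anyone who does want to look into the survey question above (which
classes provide all three methods), a small helper could report which of
'c', '[' and 'length' have an S3 method for a given class. This is a sketch
only: core_methods and the toy class toyvec are hypothetical names, and the
results for real classes will depend on the R version and on which packages
are loaded.

```r
# Hypothetical helper: for a class name, report which of the three core
# vector generics have an S3 method that R can find.
core_methods <- function(cls, generics = c("c", "[", "length")) {
  vapply(
    generics,
    function(g) !is.null(utils::getS3method(g, cls, optional = TRUE)),
    logical(1)
  )
}

# Toy class (an assumption for illustration) that defines only '['.
"[.toyvec" <- function(x, i) structure(unclass(x)[i], class = "toyvec")

res <- core_methods("toyvec")
res   # named logical vector: FALSE for 'c', TRUE for '[', FALSE for 'length'
```

It only says whether a method exists, not whether length(x) means the same
thing across classes; that part would still need to be checked by hand.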