[Rd] Wrong length of POSIXt vectors (PR#10507)

Gabor Grothendieck ggrothendieck at gmail.com
Sun Dec 16 01:20:07 CET 2007


If it were simply deprecated and then changed, everyone using it
would get a warning during the deprecation period, so it would not
be so bad.  Given that its current behavior is not very useful, I
suspect it's not widely used anyway.  I haven't followed the whole
discussion, so sorry if these points have already been made.
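
A deprecation period along those lines might look roughly like the
sketch below.  It uses a made-up toy class, "ltlike" (a list of
parallel vectors standing in for POSIXlt), and plain warning(); the
class name and the message text are assumptions for illustration only,
not anything that exists in R.

```r
# Hypothetical toy class "ltlike": a list of parallel vectors,
# standing in for POSIXlt (assumption, not part of R).
x <- structure(list(sec = c(50, 0, 55), min = c(1L, 10L, 11L)),
               class = "ltlike")

# Transitional length method: keep returning the old answer (the number
# of list components) but warn that the meaning is going to change.
length.ltlike <- function(x) {
  warning("length() of 'ltlike' objects will change to the number of time points",
          call. = FALSE)
  length(unclass(x))
}

# Capture the warning text instead of printing it.
msg <- tryCatch(length(x), warning = function(w) conditionMessage(w))

# The old behaviour is still what callers get during the transition.
n_old <- suppressWarnings(length(x))
```

After the transition the method body would become length(unclass(x)$sec)
and the warning would be dropped.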

On Dec 15, 2007 5:17 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> >>>>> "TP" == Tony Plate <tplate at acm.org>
> >>>>>     on Fri, 14 Dec 2007 13:58:30 -0700 writes:
>
>
>    TP> Duncan Murdoch wrote:
>    >> On 12/13/2007 1:59 PM, Tony Plate wrote:
>    >>> Duncan Murdoch wrote:
>    >>>> On 12/11/2007 6:20 AM, simecek at gmail.com wrote:
>    >>>>> Full_Name: Petr Simecek
>    >>>>> Version: 2.5.1, 2.6.1
>    >>>>> OS: Windows XP
>    >>>>> Submission from: (NULL) (195.113.231.2)
>    >>>>>
>    >>>>>
>    >>>>> Several times I have found that the length of a POSIXt vector
>    >>>>> is not
>    >>>>> computed correctly.
>    >>>>>
>    >>>>> Example:
>    >>>>>
>    >>>>> tv<-structure(list(sec = c(50, 0, 55, 12, 2, 0, 37, NA, 17, 3, 31
>    >>>>> ), min = c(1L, 10L, 11L, 15L, 16L, 18L, 18L, NA, 20L, 22L, 22L
>    >>>>> ), hour = c(12L, 12L, 12L, 12L, 12L, 12L, 12L, NA, 12L, 12L, 12L),
>    >>>>> mday = c(13L, 13L, 13L, 13L, 13L, 13L, 13L, NA, 13L, 13L, 13L), mon
>    >>>>> = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, NA, 5L, 5L, 5L), year = c(105L,
>    >>>>> 105L, 105L, 105L, 105L, 105L, 105L, NA, 105L, 105L, 105L), wday =
>    >>>>> c(1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 1L, 1L, 1L), yday = c(163L, 163L,
>    >>>>> 163L, 163L, 163L, 163L, 163L, NA, 163L, 163L, 163L), isdst = c(1L,
>    >>>>> 1L, 1L, 1L, 1L, 1L, 1L, -1L, 1L, 1L, 1L)), .Names = c("sec", "min",
>    >>>>> "hour", "mday", "mon", "year", "wday", "yday", "isdst"
>    >>>>> ), class = c("POSIXt", "POSIXlt"))
>    >>>>>
>    >>>>> print(tv)
>    >>>>> # print 11 time points (right)
>    >>>>>
>    >>>>> length(tv)
>    >>>>> # returns 9 (wrong)
>    >>>>
>    >>>> tv is a list of length 9.  The answer is right, your expectation is
>    >>>> wrong.
>    >>>>> I have tried that on several computers with/without switching to
>    >>>>> English
>    >>>>> locales, i.e. Sys.setlocale("LC_TIME", "en"). I have searched the
>    >>>>> help pages but I
>    >>>>> cannot imagine how that could be OK.
>    >>>>
>    >>>> See this in ?POSIXt:
>    >>>>
>    >>>> Class '"POSIXlt"' is a named list of vectors...
>    >>>>
>    >>>> You could define your own length measurement as
>    >>>>
>    >>>> length.POSIXlt <- function(x) length(x$sec)
>    >>>>
>    >>>> and you'll get the answer you expect, but be aware that length.XXX
>    >>>> methods are quite rare, and you may surprise some of your users.
>    >>>>
>    >>>
>    >>> On the other hand, isn't the fact that length() currently always
>    >>> returns 9 for POSIXlt objects likely to be a surprise to many users
>    >>> of POSIXlt?
>    >>>
>    >>> The back of "The New S Language" says "Easy-to-use facilities allow
>    >>> you to organize, store and retrieve all sorts of data. ... S
>    >>> functions and data organization make applications easy to write."
>    >>>
>    >>> Now, POSIXlt has methods for c() and vector subsetting "[" (and many
>    >>> other vector-manipulation methods - see methods(class="POSIXlt")).
>    >>> Hence, from the point of view of intending to supply "easy-to-use
>    >>> facilities ... [for] all sorts of data", isn't it a little
>    >>> incongruous that length() is not also provided -- as these 3 functions (any
>    >>> others?) comprise a core set of vector-manipulation functions?
>    >>>
>    >>> Would it make sense to have an informal prescription (e.g., in
>    >>> R-exts) that a class that implements a vector-like object and
>    >>> provides at least one of the functions 'c', '[' and 'length' should
>    >>> provide all three?  It would also be easy to describe a test-suite
>    >>> that should be included in the 'test' directory of a package
>    >>> implementing such a class, that had some tests of the basic
>    >>> vector-manipulation functionality, such as:
>    >>>
>    >>> > # at this point, x0, x1, x3, & x10 should exist, as vectors of the
>    >>> > # class being tested, of length 0, 1, 3, and 10, and they should
>    >>> > # contain no duplicate elements
>    >>> > length(x0)
>    >>> [1] 0
>    >>> > length(c(x0, x1))
>    >>> [1] 1
>    >>> > length(c(x1,x10))
>    >>> [1] 11
>    >>> > all(x3 == x3[seq(len=length(x3))])
>    >>> [1] TRUE
>    >>> > all(x3 == c(x3[1], x3[2], x3[3]))
>    >>> [1] TRUE
>    >>> > length(c(x3[2], x10[5:7]))
>    >>> [1] 4
>    >>> >
>    >>>
>    >>> It would also be possible to describe a larger set of vector
>    >>> manipulation functions that should be implemented together, including
>    >>> e.g., 'rep', 'unique', 'duplicated', '==', 'sort', '[<-', 'is.na',
>    >>> head, tail ... (many of which are provided for POSIXlt).
>    >>>
>    >>> Or is there some good reason that length() cannot be provided (while
>    >>> 'c' and '[' can) for some vector-like classes such as "POSIXlt"?
>    >>
>    >> What you say sounds good in general, but the devil is in the details.
>    >> Changing the meaning of length(x) for some objects has fairly
>    >> widespread effects.  Are they all positive?  I don't know.
>    >>
>    >> Adding a prescription like the one you suggest would be good if it's
>    >> easy to implement, but bad if it's already widely violated.  How many
>    >> base or CRAN or Bioconductor packages violate it currently?   Do the
>    >> ones that provide all 3 methods do so in a consistent way, i.e. does
>    >> "length(x)" mean the same thing in all of them?
>    TP> I'm not sure doing something like this would be so bad even if it is
>    TP> already widely violated.  R has evolved significantly over time, and
>    TP> many rough edges have been cleaned up, sometimes in ways that were not
>    TP> backward compatible.  This is a great thing & my thanks go to the people
>    TP> working on R.
>
>    TP> If some base or CRAN or Bioconductor packages currently don't implement
>    TP> vector operations consistently, wouldn't it be good to know that?
>    TP> Wouldn't it be useful to have an automatic way of determining whether a
>    TP> particular vector-like class is consistent with a generally agreed set of
>    TP> principles for how basic vector operations should work -- things like
>    TP> length(x)+length(y)==length(c(x,y))?  This could help developers check,
>    TP> document & improve their code, and it could help users understand how to
>    TP> use a class, and to evaluate the software quality of a class
>    TP> implementation and whether or not it provides the functionality they need.
>    >> I agree that the current state is less than perfect, but making it
>    >> better would really be a lot of work.  I suspect there are better ways
>    >> to spend my time, so I'm not going to volunteer to do it.  I'm not
>    >> even going to invite someone else to do it, or offer to review your
>    >> work if you volunteer.  I think this falls into the class of "next
>    >> time we write a language, let's handle this better" problems.
>
>    TP> Thanks very much for the thoughtful (and honest) feedback!  I suspect
>    TP> that the current state could be improved with just a little work, and
>    TP> without forcing anyone to do any work they don't want to do.  I'll think
>    TP> about this more and try to come back with a better & more concrete
>    TP> suggestion.
>
> Good. From "the outside" (i.e. superficial gut feeling :-)
> I've sympathized with your suggestion, Tony, quite a bit.
> Further, my own taste would probably also have led me to define
> length.POSIXlt differently ..
> OTOH, I agree with Duncan that it may be too late to change it
> and even more to enforce the consistency rules you propose.
> If with a small bit of code (and some patience) we could check
> all of CRAN and hopefully Bioconductor packages and find only a
> very few where it was violated, the whole endeavor may be worth it
> ... for the sake of making  R more consistent, easier to teach, etc..
>
> Unfortunately I don't remember now what happened many months ago
> when I indeed did experiment with having something like
>
>  length.POSIXlt <- function(x) length(x$sec)
>
> Martin Maechler
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

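To make the distinction in the quoted discussion concrete: the number
of list components in a POSIXlt object and the number of time points it
holds are two different quantities.  The sketch below measures both
without redefining length(); n_timepoints is a hypothetical helper name
used here for illustration, and its body is the expression Duncan
suggested for a length.POSIXlt method.

```r
# Three time points stored as POSIXlt (a named list of vectors).
lt <- as.POSIXlt(c("2005-06-13 12:01:50",
                   "2005-06-13 12:10:00",
                   "2005-06-13 12:11:55"), tz = "UTC")

# Number of list components (sec, min, hour, ...), independent of how
# many time points are stored.
n_components <- length(unclass(lt))

# Hypothetical helper: the number of time points, taken from the 'sec'
# component as suggested in the thread.
n_timepoints <- function(x) length(x$sec)

n_times <- n_timepoints(lt)
```

Using a separately named helper avoids changing what length() means for
existing code, which is the concern Duncan raises about widespread
effects.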

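Tony's identity length(x)+length(y)==length(c(x,y)) can be turned into
a small automatic check.  The sketch below is only an illustration on a
made-up toy class, "ltlike", with a component-wise c() method; the
names new_ltlike and lengths_add_up are assumptions, not existing R
functions.

```r
# Hypothetical toy class "ltlike": a list of parallel vectors.
new_ltlike <- function(sec, min) {
  structure(list(sec = sec, min = min), class = "ltlike")
}

# c() method that concatenates component by component.
c.ltlike <- function(...) {
  parts <- lapply(list(...), unclass)
  new_ltlike(
    sec = unlist(lapply(parts, `[[`, "sec")),
    min = unlist(lapply(parts, `[[`, "min"))
  )
}

# The identity from the thread.
lengths_add_up <- function(x, y) length(x) + length(y) == length(c(x, y))

x <- new_ltlike(sec = c(50, 0, 55), min = c(1L, 10L, 11L))
y <- new_ltlike(sec = 12, min = 15L)

# Without a length method, length() counts list components.
before <- lengths_add_up(x, y)

# Add a length method in the spirit of the one suggested in the thread.
length.ltlike <- function(x) length(unclass(x)$sec)

after <- lengths_add_up(x, y)
```

Here the check fails before the length method is defined and passes
afterwards, which is the kind of inconsistency such a test would flag.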
