[Rd] Surprising length() of POSIXlt vector (PR#14073)

Tony Plate tplate at acm.org
Sun Nov 22 17:21:33 CET 2009


maechler at stat.math.ethz.ch wrote:
>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>     on Fri, 20 Nov 2009 09:54:34 +0100 writes:
>>>>>>             
>
>     PD> mark at celos.net wrote:
>     >> Arrays of POSIXlt dates always return a length of 9.  This
>     >> is correct (they're really lists of vectors of seconds,
>     >> hours, and so forth), but other methods disguise them as
>     >> flat vectors, giving superficially surprising behaviour:
>     >> 
>     >> strings <- paste('2009-1-', 1:31, sep='')
>     >> dates <- strptime(strings, format="%Y-%m-%d")
>     >> 
>     >> print(dates)
>     >> #  [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05"
>     >> #  [6] "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10"
>     >> # [11] "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" "2009-01-15"
>     >> # [16] "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20"
>     >> # [21] "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25"
>     >> # [26] "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30"
>     >> # [31] "2009-01-31"
>     >> 
>     >> print(length(dates))
>     >> # [1] 9
>     >> 
>     >> str(dates)
>     >> # POSIXlt[1:9], format: "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" ...
>     >> 
>     >> print(dates[20])
>     >> # [1] "2009-01-20"
>     >> 
>     >> print(length(dates[20]))
>     >> # [1] 9
>     >> 
>     >> I've since realised that POSIXct makes date vectors easier,
>     >> but could we also have something like:
>     >> 
>     >> length.POSIXlt <- function(x) { length(x$sec) }
>     >> 
>     >> in datetime.R, to avoid breaking functions (like the
>     >> str.POSIXt method) which use length() in this way?
>
>
>     PD> [You need "wishlist" in the title for this sort of stuff.]
>
>     PD> I'd be wary of this. Just the other day we found that identical() broke 
>     PD> on some objects because a package had length() redefined as a class 
>     PD> method. I.e. the danger is that something wants to use length() with its 
>     PD> original low-level interpretation.
>
> Yes, of course.
> and Romain mentioned  str().  Note that we have needed to define
> a "POSIXt" method for str(), partly just *because* of the
> current anomaly:
> As Tony Plate, e.g., has argued, entirely correctly in my view,
> the anomaly is that    length() and "["   are not compatible;
> and while I think no R language definition says that they should
> be, I still believe that you need very good reasons for them to
> be incompatible, as they are for POSIXlt.
>
> In the current case, for me the only good reason is backwards
> compatibility.
> My personal taste would be to change it and see what happens.
> I would be willing to clean up after that change within R 'base'
> and all packages I am coauthoring (quite a few), but of course
> there are still a thousand more R packages..
> My strong bet would be that less than 1% would be affected,
> and my point guess for the percentage affected would be
> rather in the order of  1/1000.
>
> The question is if we (you too!), the R community, are willing to
> bear the load of cleanup, after such a change which would really
> *improve* consistency of that small corner of R.
> For me, as I indicated above, I am willing to bear my share
> (and actually have got it ready for R-devel)
>   
It would be great to see this change!  Surely the right way to do things
is for functions that need to examine the low-level structure of S3
objects to call unclass() before looking at length and elements, so
there's no reason for a class such as POSIXlt not to provide a
logical-level length method.
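
To make that distinction concrete, here is a minimal sketch using the dates
from the original report.  The variable names are only illustrative, and I'm
assuming that the 'sec' component has one entry per date, which holds for
the output of strptime() but is not something I'd rely on for every POSIXlt
object.

```r
# Build the 31 dates from the original report (tz fixed for reproducibility).
strings <- paste("2009-1-", 1:31, sep = "")
dates <- strptime(strings, format = "%Y-%m-%d", tz = "UTC")

# Low-level view: strip the class, then inspect the underlying list.
raw <- unclass(dates)
n_components <- length(raw)      # number of list components (sec, min, hour, ...)

# Logical-level view: how many date-times the object represents.
# Assumption: 'sec' has one entry per date, as it does for strptime() output.
n_elements <- length(raw$sec)    # 31, one per date

print(names(raw))
print(n_elements)
```

Code that goes through unclass() like this gets the same answer whether or
not a length method is defined for the class.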

At a broader level, when I've designed vector/array classes, I've 
wondered what methods I should define, but have been unable to find any 
specification of a set of methods.  When one thinks about it, there are 
actually quite a set of strongly-connected methods with quite a lot of
behaviors to implement, e.g., length, '[' (with logical, numeric &
character indices, including 0 and NA possibilities), '[[', 'c', and
then optionally 'names', and then for multi-dim objects, 'dim',
'dimnames', etc.  Consequently, the last time this discussion of length
and '[' methods for POSIXlt came up, I wrote a function that automatically
tested the behavior of all these methods on a specified class and
summarized the results.  If anyone is interested in such a thing, I'd be
happy to dig it up and distribute it (I'd attach it to this message, but
I'm on vacation and don't have access to the computer that I think it's on).
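
In the meantime, a rough sketch of the kind of check I mean might look like
the following.  This is not the function mentioned above: the name
check_vector_methods and the particular checks are made up for illustration,
it covers only length, '[' and 'c', and it assumes x has at least one element.

```r
# Hypothetical helper: checks that length(), "[" and c() agree with each
# other for a vector-like object x (assumed to have at least one element).
check_vector_methods <- function(x) {
  n <- length(x)
  c(
    single_index  = length(x[1]) == 1L,            # one element selected
    zero_index    = length(x[0]) == 0L,            # nothing selected
    full_index    = length(x[seq_len(n)]) == n,    # everything, by position
    logical_index = length(x[rep(TRUE, n)]) == n,  # everything, by logical
    concat        = length(c(x, x)) == 2L * n      # c() doubles the length
  )
}

# Try it on a plain integer vector and on a POSIXct vector of five days.
ints <- check_vector_methods(1:5)
ct <- check_vector_methods(as.POSIXct("2009-01-01", tz = "UTC") + 0:4 * 86400)
print(ints)
print(ct)
```

Any FALSE entry in the result points to a place where length() and the
indexing or combining methods of the class disagree.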

-- Tony Plate

> Martin Maechler, ETH Zurich (and R Core Team)
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
