[Rd] Surprising length() of POSIXlt vector (PR#14073)

Benilton Carvalho bcarvalh at jhsph.edu
Mon Nov 30 14:41:41 CET 2009


Thank you, Martin, for putting this together. Cheers, b
On Nov 30, 2009, at 11:10 AM, maechler at stat.math.ethz.ch wrote:

>>>>>> Tony Plate <tplate at acm.org>
>>>>>>    on Sun, 22 Nov 2009 10:21:33 -0600 writes:
> 
>> maechler at stat.math.ethz.ch wrote:
>>>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>>> on Fri, 20 Nov 2009 09:54:34 +0100 writes:
>>>>>>>> 
>>> 
>    PD> mark at celos.net wrote:
>>>>> Arrays of POSIXlt dates always return a length of 9.  This
>>>>> is correct (they're really lists of vectors of seconds,
>>>>> hours, and so forth), but other methods disguise them as
>>>>> flat vectors, giving superficially surprising behaviour:
>>>>> 
>>>>> strings <- paste('2009-1-', 1:31, sep='')
>>>>> dates <- strptime(strings, format="%Y-%m-%d")
>>>>> 
>>>>> print(dates)
>>>>> #  [1] "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" "2009-01-05"
>>>>> #  [6] "2009-01-06" "2009-01-07" "2009-01-08" "2009-01-09" "2009-01-10"
>>>>> # [11] "2009-01-11" "2009-01-12" "2009-01-13" "2009-01-14" "2009-01-15"
>>>>> # [16] "2009-01-16" "2009-01-17" "2009-01-18" "2009-01-19" "2009-01-20"
>>>>> # [21] "2009-01-21" "2009-01-22" "2009-01-23" "2009-01-24" "2009-01-25"
>>>>> # [26] "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29" "2009-01-30"
>>>>> # [31] "2009-01-31"
>>>>> 
>>>>> print(length(dates))
>>>>> # [1] 9
>>>>> 
>>>>> str(dates)
>>>>> # POSIXlt[1:9], format: "2009-01-01" "2009-01-02" "2009-01-03" "2009-01-04" ...
>>>>> 
>>>>> print(dates[20])
>>>>> # [1] "2009-01-20"
>>>>> 
>>>>> print(length(dates[20]))
>>>>> # [1] 9
>>>>> 
>>>>> I've since realised that POSIXct makes date vectors easier,
>>>>> but could we also have something like:
>>>>> 
>>>>> length.POSIXlt <- function(x) { length(x$sec) }
>>>>> 
>>>>> in datetime.R, to avoid breaking functions (like the
>>>>> str.POSIXt method) which use length() in this way?
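A minimal sketch of the proposal follows. It uses a hypothetical list-based class `toy_lt`, which is not part of R. Like POSIXlt, it stores parallel component vectors. Its `length()` method counts the elements of one component, much as the suggested `length.POSIXlt` counts `x$sec`, so `length()` agrees with `[`:

```r
# Hypothetical list-based "vector" class mimicking the POSIXlt layout:
# a list of parallel component vectors (names are illustrative only).
new_toy_lt <- function(day, month) {
  structure(list(day = day, month = month), class = "toy_lt")
}

# Logical length: number of elements, not number of components.
# unclass() ensures we look at the raw list, not at a method.
length.toy_lt <- function(x) length(unclass(x)$day)

# Subsetting applies the same index to every component
"[.toy_lt" <- function(x, i) {
  structure(lapply(unclass(x), `[`, i), class = class(x))
}

x <- new_toy_lt(day = 1:31, month = rep(1L, 31))
length(x)           # 31 (logical length)
length(x[20])       # 1, consistent with `[`
length(unclass(x))  # 2 (low-level number of components)
```

Counting a single component assumes that all components have equal length. Keeping that invariant is the class's job.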
>>> 
>>> 
>    PD> [You need "wishlist" in the title for this sort of stuff.]
>>> 
>    PD> I'd be wary of this. Just the other day we found that identical() broke
>    PD> on some objects because a package had length() redefined as a class
>    PD> method. I.e. the danger is that something wants to use length() with its
>    PD> original low-level interpretation.
>>> 
>>> Yes, of course.
>>> And Romain mentioned str().  Note that we have needed to define
>>> a "POSIXt" method for str(), partly just *because* of the
>>> current anomaly:
>>> As Tony Plate, e.g., has argued, entirely correctly in my view,
>>> the anomaly is that length() and "[" are not compatible;
>>> and while I think no R language definition says that they should
>>> be, I still believe that you need very good reasons for them to
>>> be incompatible, as they are for POSIXlt.
>>> 
>>> In the current case, for me the only good reason is backwards
>>> compatibility.
>>> My personal taste would be to change it and see what happens.
>>> I would be willing to clean up after that change within R 'base'
>>> and all packages I am coauthoring (quite a few), but of course
>>> there are still a thousand more R packages...
>>> My strong bet would be that fewer than 1% would be affected,
>>> and my point guess for the percentage affected would be
>>> rather on the order of 1/1000.
>>> 
>>> The question is whether we (you too!), the R community, are willing to
>>> bear the load of cleanup after such a change, which would really
>>> *improve* consistency of that small corner of R.
>>> For me, as I indicated above, I am willing to bear my share
>>> (and actually have got it ready for R-devel).
> 
>> Would be great to see this change!  Surely the right way to do things is
>> that functions that wish to examine the low-level structure of S3
>> objects should use unclass() before looking at length and elements, so
>> there's no reason for a class such as POSIXlt not to provide a
>> logical-level length method.
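The following sketch applies the unclass() approach to the `dates` object from the original report. The exact number of components in a POSIXlt object can vary with the R version and platform, so the sketch relies only on the element count:

```r
strings <- paste('2009-1-', 1:31, sep='')
dates <- strptime(strings, format = "%Y-%m-%d")

# Logical length: 31 in R versions that include the change discussed here
length(dates)

# Low-level view: unclass() strips the "POSIXlt" class, exposing the
# underlying list of component vectors ("sec", "min", "hour", ...)
parts <- unclass(dates)
length(parts)      # number of components, not number of dates
names(parts)
length(parts$sec)  # each component holds one entry per date
```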
> 
> I have now committed such a change to R-devel (only!), revision 50616.
> Thanks to you, Gabor, and others for supporting this.
> 
> As said here earlier in this thread:  We must be ready to see
> that this change can break other code that implicitly assumed
> the "old" behavior, i.e., that of versions before R-devel (2.11.x).
> 
> As I also said earlier, I'm prepared to help package authors to
> fix their code accordingly,
> but I'd be grateful to be notified *if* problems surface from
> this.
> 
> Martin Maechler, ETH Zurich
> 
> 
>> At a broader level, when I've designed vector/array classes, I've
>> wondered what methods I should define, but have been unable to find any
>> specification of a set of methods.  When one thinks about it, there is
>> actually quite a large set of strongly connected methods with quite a lot
>> of behaviors to implement, e.g., length, '[' (with logical, numeric &
>> character indices, including 0 and NA possibilities), '[[', 'c', and
>> then optionally 'names', and then for multi-dim objects, 'dim',
>> 'dimnames', etc.  Consequently, the last time this discussion on length
>> and '[' methods for POSIXlt came up, I wrote a function that automatically
>> tests the behavior of all these methods on a specified class and
>> summarizes it.  If anyone is interested in such a thing, I'd be happy to
>> dig it up and distribute it (I'd attach it to this message, but I'm on
>> vacation and don't have access to the computer that I think it's on.)
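Here is a much smaller sketch in the spirit of the tester described above. It is not Tony's function, which is not included in this thread. The hypothetical `check_vector_methods()` checks a few rules that `length()`, `[` and `c()` should jointly satisfy for a vector-like object. Character and NA indices, `[[`, `names` and `dim` are left out:

```r
# Hypothetical helper: returns a named logical vector, TRUE where a rule holds.
check_vector_methods <- function(x) {
  n <- length(x)
  c(
    # x[0] should be empty
    empty_index   = length(x[0]) == 0,
    # each single-element subset should have length 1
    single_index  = all(vapply(seq_len(n), function(i) length(x[i]) == 1,
                               logical(1))),
    # indexing with all positions should keep every element
    full_index    = length(x[seq_len(n)]) == n,
    # an all-TRUE logical index should also keep every element
    logical_index = length(x[rep(TRUE, n)]) == n,
    # concatenating x with itself should double the length
    concatenation = length(c(x, x)) == 2 * n
  )
}

check_vector_methods(1:5)
dates <- strptime(paste('2009-1-', 1:31, sep=''), format = "%Y-%m-%d")
check_vector_methods(dates)
```

Under the old behavior, where length() of a POSIXlt object was always 9, at least `single_index` would be FALSE for `dates`, because `length(dates[20])` returned 9. That is the length/`[` inconsistency discussed in this thread.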
> 
>> -- Tony Plate
> 
>>> Martin Maechler, ETH Zurich (and R Core Team)
>>> 
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>>> 
> 