[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue May 26 15:24:03 CEST 2020


>>>>> Hervé Pagès 
>>>>>     on Sun, 24 May 2020 14:22:37 -0700 writes:

    > On 5/24/20 00:26, Gabriel Becker wrote:
    >> 
    >> 
    >> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages using fredhutch.org>
    >> wrote:
    >> 
    >> On 5/23/20 17:45, Gabriel Becker wrote:
    >> > Maybe my intuition is just
    >> > different but when I collapse multiple character vectors together, I
    >> > expect all the characters from each of those vectors to be in the
    >> > resulting collapsed one.
    >> 
    >> Yes I'd expect that too. But the **collapse** operation in paste() has
    >> never been about collapsing **multiple** character vectors together.
    >> What it does is collapse the **single** character vector that comes out
    >> of the 'sep' operation.
    >> 
    >> 
    >> I understand what it does; I broke it down the same way in my post 
    >> earlier in the thread. The fact remains that it is a single function, 
    >> which significantly muddies the waters. So you can say
    >> 
    >> paste0(x,y, collapse=",", recycle0=TRUE)
    >> 
    >> is not a collapse operation on multiple vectors, and of course there's a 
    >> sense in which you're not wrong (again I understand what these functions 
    >> do), but it sure looks like one in the invocation, doesn't it?
    >> 
    >> Honestly the thing that this whole discussion has shown me most clearly 
    >> is that, imho, collapse (accepting ONLY one data vector) and 
    >> paste (accepting multiple) should never have been a single function to 
    >> begin with.  But that ship sailed long, long ago.

    > Yes :-(

    >> 
    >> So
    >> 
    >>    paste(x, y, z, sep="", collapse=",")
    >> 
    >> is analogous to
    >> 
    >>    sum(x + y + z)
    >> 
    >> 
    >> Honestly, I'd be significantly more comfortable if
    >> 
    >> 1:10 + integer(0) + 5
    >> 
    >> were an error too.

    > This is actually the recycling scheme used by mapply():

    >> mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
    > Error in mapply(FUN = FUN, ...) :
    > zero-length inputs cannot be mixed with those of non-zero length

    > AFAIK base R uses 3 different recycling schemes for n-ary operations:

    > (1) The recycling scheme used by arithmetic and comparison operations
    > (Arith, Compare, Logic group generics).

    > (2) The recycling scheme used by classic paste().

    > (3) The recycling scheme used by mapply().

    > Having a core mechanism like recycling be inconsistent across 
    > base R is sad. It makes it really hard to predict how a given n-ary 
    > function will recycle its arguments unless you spend some time trying it 
    > yourself with several combinations of vector lengths. It is of course 
    > the source of numerous latent bugs. I wish there was only one but that's 
    > just a dream.

    > None of these 3 recycling schemes is perfect. IMO (2) is by far the 
    > worst. (3) is too restrictive and would need to be refined if we wanted 
    > to make it a good universal recycling scheme.

    > Anyway I don't think it makes sense to introduce a 4th recycling scheme 
    > at this point even though it would be a nice item to put on the wish 
    > list for R 7.0.0 with the ultimate goal that it will be universally adopted 
    > in R 11.0.0 ;-)

    > So if we have to make do with what we have, IMO (1) is the scheme that makes 
    > most sense, although I agree that it can do some surprising things for 
    > some unusual combinations of vector lengths. It's the scheme I adhere to 
    > in my own binary operations, e.g. in S4Vectors::pcompare().

    > The modest proposal of the 'recycle0' argument is only to let the user 
    > switch from recycling scheme (2) to (1) if they're not happy with scheme 
    > (2) (I'm one of them).
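
For readers following along, here is a minimal sketch of the three schemes
listed above and of the switch that 'recycle0' makes. It assumes a version
of R whose paste() has the 'recycle0' argument. The variable names are
illustrative only.

```r
# Scheme (1): arithmetic. Any zero-length operand gives a zero-length result.
arith <- 1:10 + integer(0) + 5
length(arith)                      # 0

# Scheme (2): classic paste(). A zero-length argument is treated as "".
classic <- paste0("a", character(0))
classic                            # "a"

# Scheme (3): mapply(). Mixing zero-length and non-zero-length inputs is an
# error; the message is captured here instead of stopping the script.
mapply_result <- tryCatch(
  mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5),
  error = function(e) conditionMessage(e)
)

# recycle0 = TRUE (no 'collapse' here) switches the 'sep' step of paste()
# from scheme (2) to scheme (1).
switched <- paste0("a", character(0), recycle0 = TRUE)
length(switched)                   # 0
```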

Yes, indeed.  This was the purpose of introducing  'recycle0'.

Now, with collapse = <string>  {in R, "string" := character vector of length 1},
we clearly see different interpretations of what is desirable
for  recycle0 = TRUE:
all of you (Suharto, Bill, Hervé, Gabe) assert that the behavior
should be different than now, and should either error (possibly,
per Gabe) or return a single string (possibly with a warning),
i.e., the  collapse = <string>  behavior should not be influenced by
(or possibly conflict with)  recycle0=TRUE.

Within R core, some believe the current recycle0=TRUE behavior to
be the correct one.  Personally, I see
reasons for both.

What about remaining back-compatible, not only to R 3.y.z with
default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE
*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just  recycle0="sep" ?

As (for back-compatibility reasons) you have to specify
'recycle0 = ..'  anyway, you would get what makes most sense to
you by using such a third option.
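
To make the "sep only" semantics concrete, here is a rough sketch written as
a user-level wrapper. The name paste_sep_only() is hypothetical and is not
part of base R; the sketch assumes a version of R whose paste() has the
'recycle0' argument, and it is only one possible reading of the proposed
option, not an implementation of it.

```r
# Hypothetical helper, not part of base R: zero-length recycling applies to
# the 'sep' step only, and 'collapse' always returns a single string.
paste_sep_only <- function(..., sep = " ", collapse = NULL) {
  # Step 1: the 'sep' operation, where a zero-length argument gives a
  # zero-length result.
  pieces <- paste(..., sep = sep, recycle0 = TRUE)
  if (is.null(collapse)) {
    return(pieces)
  }
  # Step 2: the 'collapse' operation on that single vector, with the default
  # recycle0 = FALSE, so an empty vector collapses to "".
  paste(pieces, collapse = collapse)
}

no_collapse   <- paste_sep_only("a", character(0))
with_collapse <- paste_sep_only("a", character(0), collapse = ",")
ordinary      <- paste_sep_only(c("a", "b"), 1:2, sep = "", collapse = ",")

length(no_collapse)   # 0
with_collapse         # ""
ordinary              # "a1,b2"
```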

? (WDYT ?)

Martin

    > Switching to scheme (3) or to a new custom scheme 
    > would be a completely different proposal.

    >> 
    >> At least I'm consistent right?

    > Yes :-)

    > Anyway, discussing recycling schemes is interesting but not directly 
    > related to what the OP brought up (behavior of the 'collapse' operation).

    > Cheers,
    > H.

    >> 
    >> ~G



More information about the R-devel mailing list