[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""

Martin Maechler maechler at stat.math.ethz.ch
Tue May 26 15:24:03 CEST 2020

```
>>>>> Hervé Pagès
>>>>>     on Sun, 24 May 2020 14:22:37 -0700 writes:

> On 5/24/20 00:26, Gabriel Becker wrote:
>>
>>
>> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages using fredhutch.org
>> <mailto:hpages using fredhutch.org>> wrote:
>>
>> On 5/23/20 17:45, Gabriel Becker wrote:
>> > Maybe my intuition is just
>> > different but when I collapse multiple character vectors together, I
>> > expect all the characters from each of those vectors to be in the
>> > resulting collapsed one.
>>
>> Yes I'd expect that too. But the **collapse** operation in paste() has
>> never been about collapsing **multiple** character vectors together.
>> What it does is collapse the **single** character vector that comes out
>> of the 'sep' operation.
>>
>>
>> I understand what it does, I broke it down the same way in my post
>> earlier in the thread. The fact remains that it is a single function,
>> which significantly muddies the waters. So you can say
>>
>> paste0(x,y, collapse=",", recycle0=TRUE)
>>
>> is not a collapse operation on multiple vectors, and of course there's a
>> sense in which you're not wrong (again I understand what these functions
>> do), but it sure looks like one in the invocation, doesn't it?
>>
>> Honestly the thing that this whole discussion has shown me most clearly
>> is that, imho, collapse (accepting ONLY one data vector) and
>> paste(accepting multiple) should never have been a single function to
>> begin with.  But that ship sailed long long ago.

> Yes :-(

>>
>> So
>>
>>    paste(x, y, z, sep="", collapse=",")
>>
>> is analogous to
>>
>>    sum(x + y + z)
>>
>>
>> Honestly, I'd be significantly more comfortable if
>>
>> 1:10 + integer(0) + 5
>>
>> were an error too.

> This is actually the recycling scheme used by mapply():

> > mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
> Error in mapply(FUN = FUN, ...) :
> zero-length inputs cannot be mixed with those of non-zero length

> AFAIK base R uses 3 different recycling schemes for n-ary operations:

> (1) The recycling scheme used by arithmetic and comparison operations
> (Arith, Compare, Logic group generics).

> (2) The recycling scheme used by classic paste().

> (3) The recycling scheme used by mapply().

> Having a core mechanism like recycling be inconsistent across
> base R is sad. It makes it really hard to predict how a given n-ary
> function will recycle its arguments unless you spend some time trying it
> yourself with several combinations of vector lengths. It is of course
> the source of numerous latent bugs. I wish there was only one but that's
> just a dream.

> None of these 3 recycling schemes is perfect. IMO (2) is by far the
> worst. (3) is too restrictive and would need to be refined if we wanted
> to make it a good universal recycling scheme.

> Anyway I don't think it makes sense to introduce a 4th recycling scheme
> at this point even though it would be a nice item to put on the wish
> list for R 7.0.0 with the ultimate goal that it will be universally adopted
> in R 11.0.0 ;-)

> So if we have to make do with what we have, IMO (1) is the scheme that makes
> most sense although I agree that it can do some surprising things for
> some unusual combinations of vector lengths. It's the scheme I adhere to
> in my own binary operations, e.g. in S4Vectors::pcompare().

> The modest proposal of the 'recycle0' argument is only to let the user
> switch from recycling scheme (2) to (1) if they're not happy with scheme
> (2) (I'm one of them).

Yes, indeed.  This was the purpose of introducing  'recycle0'.

Now, with collapse = <string>  {in R, "string" := character vector of length 1},
we clearly see different interpretations of what is desirable
for  recycle0 = TRUE:
all of you (Suharto, Bill, Hervé, Gabe) assert that the behavior
should be different than now, and should either error (possibly,
by Gabe), or return a single string  (possibly with a warning),
i.e., collapse = <string>  behavior should not be influenced by
(or possibly be in conflict with)  recycle0=TRUE.

Within R core, some believe the current recycle0=TRUE behavior to
be the correct one.  Personally, I see
reasons for both.

What about remaining back-compatible, not only to R 3.y.z with
default recycle0=FALSE, but also to R 4.0.0 with recycle0=TRUE
*and* add a new option for the Suharto-Bill-Hervé-Gabe behavior,
e.g., recycle0="sep.only" or just  recycle0="sep" ?

As (for back-compatibility reasons) you have to specify
'recycle0 = ..'  anyway, you would get what makes most sense to
you by using such a third option.

? (WDYT ?)

Martin

> Switching to scheme (3) or to a new custom scheme
> would be a completely different proposal.

>>
>> At least I'm consistent right?

> Yes :-)

> Anyway, discussing recycling schemes is interesting but not directly
> related to what the OP brought up (behavior of the 'collapse' operation).

> Cheers,
> H.

>>
>> ~G

```
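For readers who want to try the behaviors discussed in this message, here is a minimal R sketch of the two-step 'sep' then 'collapse' view of paste() and of the three recycling schemes listed above. The variable names are illustrative and not from the thread, and the recycle0 line assumes R >= 4.0.1, where that argument is available; it deliberately avoids combining recycle0 = TRUE with collapse, since that is the case under debate.

```r
# Two-step view of paste(): the 'sep' operation runs first and yields a
# single character vector; 'collapse' then joins that one vector.
x <- c("a", "b")
y <- c("1", "2")
pasted    <- paste(x, y, sep = "")            # c("a1", "b2")
collapsed <- paste(pasted, collapse = ",")    # "a1,b2"
one_call  <- paste(x, y, sep = "", collapse = ",")

# Scheme (1): arithmetic recycling. Any zero-length operand makes the
# whole result zero-length.
scheme1 <- 1:10 + integer(0) + 5

# Scheme (2): classic paste(). Zero-length arguments are treated as "".
scheme2           <- paste0("a", character(0))
scheme2_collapsed <- paste(character(0), collapse = "")

# Scheme (3): mapply() refuses to mix zero-length and non-zero-length inputs.
scheme3 <- tryCatch(
  mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5),
  error = function(e) conditionMessage(e)
)

# Assumption: R >= 4.0.1. With recycle0 = TRUE and no 'collapse', paste0()
# follows scheme (1) and returns a zero-length character vector.
if (getRversion() >= "4.0.1") {
  scheme1_like <- paste0("a", character(0), recycle0 = TRUE)
}
```

Printing scheme1, scheme2 and scheme3 side by side shows how the same mix of vector lengths gives a zero-length result, a length-one string, and an error message respectively.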