[Rd] paste(character(0), collapse="", recycle0=FALSE) should be ""
Hervé Pagès hpages using fredhutch.org
Sun May 24 23:22:37 CEST 2020
On 5/24/20 00:26, Gabriel Becker wrote:
> On Sat, May 23, 2020 at 9:59 PM Hervé Pagès <hpages using fredhutch.org> wrote:
> On 5/23/20 17:45, Gabriel Becker wrote:
> > Maybe my intuition is just
> > different but when I collapse multiple character vectors together, I
> > expect all the characters from each of those vectors to be in the
> > resulting collapsed one.
> Yes I'd expect that too. But the **collapse** operation in paste() has
> never been about collapsing **multiple** character vectors together.
> What it does is collapse the **single** character vector that comes out
> of the 'sep' operation.
> I understand what it does; I broke it down the same way in my post
> earlier in the thread. The fact remains that it is a single function,
> which significantly muddies the waters. So you can say
> paste0(x,y, collapse=",", recycle0=TRUE)
> is not a collapse operation on multiple vectors, and of course there's a
> sense in which you're not wrong (again I understand what these functions
> do), but it sure looks like one in the invocation, doesn't it?
> Honestly the thing that this whole discussion has shown me most clearly
> is that, imho, collapse (accepting ONLY one data vector) and
> paste (accepting multiple) should never have been a single function to
> begin with. But that ship sailed long long ago.
> paste(x, y, z, sep="", collapse=",")
> is analogous to
> sum(x + y + z)
> Honestly, I'd be significantly more comfortable if
> 1:10 + integer(0) + 5
> were an error too.
This is actually the recycling scheme used by mapply():
> mapply(function(x, y, z) c(x, y, z), 1:10, integer(0), 5)
Error in mapply(FUN = FUN, ...) :
zero-length inputs cannot be mixed with those of non-zero length
AFAIK base R uses 3 different recycling schemes for n-ary operations:
(1) The recycling scheme used by arithmetic and comparison operations
(Arith, Compare, Logic group generics).
(2) The recycling scheme used by classic paste().
(3) The recycling scheme used by mapply().
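
To make the difference concrete, here is a minimal sketch that feeds the
same three inputs (lengths 10, 0 and 1) to one representative of each
scheme. The variable names are only for illustration, and the mapply()
call is wrapped in tryCatch() so the script keeps running after the error:

```r
# Same inputs for all three schemes: lengths 10, 0 and 1.
x <- 1:10
y <- integer(0)
z <- 5

# (1) Arithmetic: a zero-length operand makes the whole result zero-length.
r1 <- x + y + z
length(r1)

# (2) Classic paste(): the zero-length argument is treated as "",
#     so the result keeps the length of the longest argument.
r2 <- paste(x, y, z)
length(r2)

# (3) mapply(): mixing zero-length and non-zero-length inputs is an error.
#     tryCatch() captures the error message instead of stopping.
r3 <- tryCatch(
  mapply(function(x, y, z) c(x, y, z), x, y, z),
  error = function(e) conditionMessage(e)
)
r3
```

Here length(r1) is 0, length(r2) is 10, and r3 holds the "zero-length
inputs cannot be mixed" message from mapply().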
Having a core mechanism like recycling be inconsistent across
base R is sad. It makes it really hard to predict how a given n-ary
function will recycle its arguments unless you spend some time trying it
yourself with several combinations of vector lengths. It is of course
the source of numerous latent bugs. I wish there was only one but that's
just a dream.
None of these 3 recycling schemes is perfect. IMO (2) is by far the
worst. (3) is too restrictive and would need to be refined if we wanted
to make it a good universal recycling scheme.
Anyway I don't think it makes sense to introduce a 4th recycling scheme
at this point even though it would be a nice item to put on the wish
list for R 7.0.0 with the ultimate goal that it will be universally adopted
in R 11.0.0 ;-)
So if we have to make do with what we have, IMO (1) is the scheme that makes
most sense although I agree that it can do some surprising things for
some unusual combinations of vector lengths. It's the scheme I adhere to
in my own binary operations e.g. in S4Vectors::pcompare().
The modest proposal of the 'recycle0' argument is only to let the user
switch from recycling scheme (2) to (1) if they're not happy with scheme
(2) (I'm one of them). Switching to scheme (3) or to a new custom scheme
would be a completely different proposal.
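
A minimal sketch of that switch, restricted to the 'sep' step (no
'collapse', since the collapse behavior is exactly what this thread is
debating). It assumes R >= 4.0.0, where 'recycle0' is available; the
variable names are only for illustration:

```r
# Requires R >= 4.0.0 for the 'recycle0' argument.
x <- c("a", "b")
y <- character(0)

# Scheme (2), the default: the zero-length y is treated as "".
p2 <- paste0(x, y)

# Scheme (1), opt-in: a zero-length argument gives a zero-length result,
# just as it would for x + y with numeric vectors.
p1 <- paste0(x, y, recycle0 = TRUE)

length(p2)
length(p1)
```

With the default, p2 has the same length as x; with recycle0 = TRUE, p1
is character(0).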
> At least I'm consistent right?
Anyway discussing recycling schemes is interesting but not directly
related to what the OP brought up (behavior of the 'collapse' operation).
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages using fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319