[Rd] suggestion for "sets" tools upgrade

Duncan Murdoch murdoch.duncan at gmail.com
Fri Feb 7 13:37:33 CET 2014


On 14-02-06 8:31 PM, Carl Witthoft wrote:
> First, let me apologize in advance if this is the wrong place to submit
> a suggestion for a change to functions in the base-R package.  It never
> really occurred to me that I'd have an idea worthy of such a change.
>
> My idea is to provide an upgrade to all the "sets" tools (intersect,
> union, setdiff, setequal) that allows the user to apply them in a
> strictly algebraic style.
>
> The current tools, as is well documented, remove duplicate values from the
> input vectors.  This can be helpful in stats work, but is inconsistent
> with the mathematical concept of sets and set measure.

I understand what you are asking for, but I think this justification for 
it is just wrong.  Sets don't have duplicated elements:  an element is 
in a set, or it is not.  It can't be in the set more than once.
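To make this concrete, here is a short R illustration using only the base functions (the two vectors are arbitrary test data).  Each result contains every qualifying value exactly once, and setequal() ignores multiplicity, which is how the mathematical set operations behave:

```r
# Base R set operations treat their inputs as sets:
# duplicated values never appear in the results.
x <- c(1, 1, 2, 3)
y <- c(1, 1, 1, 4)

intersect(x, y)                 # 1
union(x, y)                     # 1 2 3 4
setdiff(x, y)                   # 2 3
setequal(c(1, 1, 2), c(2, 1))   # TRUE: multiplicity is ignored
```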

> What I propose is that all these functions be given an additional
> argument with a default value, "multiple=FALSE".  When called this way,
> the functions behave as at present.  When called with "multiple=TRUE",
> they treat the input vectors as true 'sets' of elements.
>
> I've already written and tested upgrades to all four functions, so if
> upgrading the base-R package is not appropriate, I'll post them as a
> package on CRAN.  It just seems more sensible to add them to base.
>
> Thanks in advance for any advice or comments.
> (Please be sure to reply by email, as I can't recall whether I'm
> currently subscribed to r-devel.)
>
> Here's an example of the new code:
>
> intersect <- function(x, y, multiple = FALSE)
> {
>     y <- as.vector(y)
>     trueint <- y[match(as.vector(x), y, 0L)]
>     if (!multiple) trueint <- unique(trueint)
>     return(trueint)
> }

This is not symmetric.  I'd like intersect(x,y,TRUE) to be the same as 
intersect(y,x,TRUE), up to re-ordering.  That's not true of your function:

 > x <- c(1,1,2,3)
 > y <- c(1,1,1,4)
 > intersect(x,y,multiple=TRUE)
[1] 1 1
 > intersect(y,x,multiple=TRUE)
[1] 1 1 1
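One way to get the symmetry asked for above is to treat the inputs as multisets ("bags") and keep each common value min(count in x, count in y) times.  The sketch below assumes that definition; multiIntersect() is a hypothetical name, not a base R function:

```r
# Hypothetical helper (not part of base R): multiset intersection.
# Each distinct common value v appears min(count of v in x, count of v in y)
# times, so swapping x and y can only change the order of the result.
multiIntersect <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  # distinct values present in both inputs, in order of first appearance in x
  vals <- unique(x[x %in% y])
  # occurrences of each value of `vals` in x and in y
  # (match() gives NA for other values, and tabulate() ignores NA)
  nx <- tabulate(match(x, vals), nbins = length(vals))
  ny <- tabulate(match(y, vals), nbins = length(vals))
  rep(vals, times = pmin(nx, ny))
}
```

With the vectors from the example above, multiIntersect(x, y) and multiIntersect(y, x) both return c(1, 1).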

I'd suggest that you clearly define what you mean by your functions, and 
put them in a package, along with examples where they give more useful 
results than the standard definitions.  I think the current base package 
functions match the mathematical definitions better.
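If such functions do go into a package, the other three operations need equally explicit definitions.  A minimal sketch, assuming the usual multiset conventions (union keeps the larger count, difference subtracts counts and floors at zero, equality compares counts); all of the function names here are hypothetical:

```r
# Hypothetical multiset versions of union, setdiff and setequal.
# None of these names exist in base R; this is one possible definition.

# Number of occurrences in x of each value in `vals` (NA matches are ignored).
countIn <- function(x, vals) tabulate(match(x, vals), nbins = length(vals))

# Union: each value appears max(count in x, count in y) times.
multiUnion <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  vals <- unique(c(x, y))
  rep(vals, times = pmax(countIn(x, vals), countIn(y, vals)))
}

# Difference: each value of x appears max(count in x - count in y, 0) times.
multiSetdiff <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  vals <- unique(x)
  rep(vals, times = pmax(countIn(x, vals) - countIn(y, vals), 0L))
}

# Equality: every value occurs the same number of times in x and y.
multiSetequal <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  vals <- unique(c(x, y))
  identical(countIn(x, vals), countIn(y, vals))
}
```

Note that multiset union has a second common definition, the sum of counts, which for vectors is simply c(x, y); choosing between the two is exactly the kind of decision a package would need to document.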

Duncan Murdoch



More information about the R-devel mailing list