[Rd] suggestion for "sets" tools upgrade

Kevin Coombes kevin.r.coombes at gmail.com
Fri Feb 7 10:59:16 CET 2014

As a mathematician by training (and a former practicing mathematician, 
both of which qualifications I rarely feel compelled to pull out of the 
closet), I have to agree with Michael's challenge to the original 
assertion about the "mathematical concept of sets".

Sets are collections of distinct objects (at least in Cantor's original 
naive definition) and do not have a notion of "duplicate values".  In 
the modern axiomatic definition, one axiom is that "two sets are equal 
if and only if they contain the same members". To expand on Michael's 
example, the union of {1, 2} with {1, 3} is {1, 2, 3}, not {1, 2, 1, 3} 
since there is only one distinct object designated by the value "1".
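
To make that concrete, here is a minimal sketch using Python's built-in 
set type (Python only because Michael's example below uses it; nothing 
here is specific to R's union()):

```python
# Built-in Python sets, used purely to illustrate the definition above.
a = {1, 2}
b = {1, 3}
u = a | b  # set union

# Extensionality: two sets are equal iff they contain the same members.
print(u == {1, 2, 3})
# Writing 1 twice in a literal still designates a single member.
print(u == {1, 2, 1, 3})
# The cardinality counts distinct members.
print(len(u))
```

Both comparisons hold, and the length is three, because the repeated "1" 
never becomes a second member.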

A computer programming language could choose to use the ordered vector 
(or list) [1, 2, 1, 3] as an internal representation of the union of 
[1,2] and [1,3], but it would then have to work hard to perform every 
other meaningful set operation.  For instance, the cardinality of the 
union still has to equal three (not four, which is the length of the 
list), since there are exactly three distinct objects that are members. 
And, as Michael points out, the set represented by [1,2,3] has to be 
equal to the set represented by [1,2,1,3] since they contain exactly the 
same members.
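
A rough sketch of what that extra work looks like, again in Python. The 
helper names (naive_union, cardinality, same_set) are hypothetical and 
chosen only for illustration. They are not part of R or of any library, 
and the sketch assumes the members are hashable:

```python
# Hypothetical helpers illustrating a list-backed representation of sets.

def naive_union(xs, ys):
    """Concatenate two lists, keeping duplicates (the internal representation)."""
    return xs + ys

def cardinality(xs):
    """Number of distinct members, which may be less than len(xs)."""
    return len(set(xs))

def same_set(xs, ys):
    """Two lists represent the same set iff they have the same distinct members."""
    return set(xs) == set(ys)

u = naive_union([1, 2], [1, 3])   # stored as [1, 2, 1, 3]

print(len(u))                     # length of the list
print(cardinality(u))             # cardinality of the set it represents
print(same_set(u, [1, 2, 3]))     # equality ignores repeats and order
```

Every operation has to deduplicate before it can answer, which is the 
"work hard" part: the list length is four, while the cardinality is 
three.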


On 2/6/2014 9:39 PM, R. Michael Weylandt wrote:
> On Thu, Feb 6, 2014 at 8:31 PM, Carl Witthoft <carl at witthoft.com> wrote:
>> First, let me apologize in advance if this is the wrong place to submit a
>> suggestion for a change to functions in the base-R package.  It never really
>> occurred to me that I'd have an idea worthy of such a change.
>> My idea is to provide an upgrade to all the "sets" tools (intersect, union,
>> setdiff, setequal) that allows the user to apply them in a strictly
>> algebraic style.
>> The current tools, as well documented, remove duplicate values in the input
>> vectors.  This can be helpful in stats work, but is inconsistent with the
>> mathematical concept of sets and set measure.
> No comments about back-compatibility concerns, etc. but why do you
> think this is closer to the "mathematical concept of sets"? As I
> learned them, sets have no repeats (or order) and other languages with
> set primitives tend to agree:
> python> {1,1,2,3} == {1,2,3}
> True
> I believe C++ calls what you're looking for a multiset (albeit with a
> guarantee of orderedness).
> Cheers,
> Michael
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
