# [Rd] suggestion for "sets" tools upgrade

Kevin Coombes kevin.r.coombes at gmail.com
Fri Feb 7 10:59:16 CET 2014

```
As a mathematician by training (and a former practicing mathematician,
both of which qualifications I rarely feel compelled to pull out of the
closet), I have to agree with Michael's challenge to the original
assertion about the "mathematical concept of sets".

Sets are collections of distinct objects (at least in Cantor's original
naive definition) and do not have a notion of "duplicate values".  In
the modern axiomatic definition, one axiom is that "two sets are equal
if and only if they contain the same members". To expand on Michael's
example, the union of {1, 2} with {1, 3} is {1, 2, 3}, not {1, 2, 1, 3}
since there is only one distinct object designated by the value "1".

A computer programming language could choose to use the ordered vector
(or list) [1, 2, 1, 3] as an internal representation of the union of
[1,2] and [1,3], but it would then have to work hard to perform every
other meaningful set operation.  For instance, the cardinality of the
union still has to equal three (not four, which is the length of the
list), since there are exactly three distinct objects that are members.
And, as Michael points out, the set represented by [1,2,3] has to be
equal to the set represented by [1,2,1,3] since they contain exactly the
same members.

Kevin

On 2/6/2014 9:39 PM, R. Michael Weylandt wrote:
> On Thu, Feb 6, 2014 at 8:31 PM, Carl Witthoft <carl at witthoft.com> wrote:
>> First, let me apologize in advance if this is the wrong place to submit a
>> suggestion for a change to functions in the base-R package.  It never really
>> occurred to me that I'd have an idea worthy of such a change.
>>
>> My idea is to provide an upgrade to all the "sets" tools (intersect, union,
>> setdiff, setequal) that allows the user to apply them in a strictly
>> algebraic style.
>>
>> The current tools, as well documented, remove duplicate values in the input
>> vectors.  This can be helpful in stats work, but is inconsistent with the
>> mathematical concept of sets and set measure.
> think this is closer to the "mathematical concept of sets"? As I
> learned them, sets have no repeats (or order) and other languages with
> set primitives tend to agree:
>
> python> {1,1,2,3} == {1,2,3}
> True
>
> I believe C++ calls what you're looking for a multiset (albeit with a
> guarantee of orderedness).
>
> Cheers,
> Michael

```
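The distinction drawn in the thread can be sketched in Python, the language Michael used for his one-line example. This is only an illustration: the variable names are made up here, and treating `collections.Counter` as a stand-in for the multiset behaviour Carl seems to be asking for is an assumption, not something proposed in the thread.

```python
from collections import Counter

# Set semantics: the union of {1, 2} and {1, 3} has three members.
union_as_set = {1, 2} | {1, 3}

# A list used as an internal representation may carry a repeated value...
union_as_list = [1, 2] + [1, 3]          # [1, 2, 1, 3], length 4

# ...so cardinality has to count distinct members, not list entries.
cardinality = len(set(union_as_list))

# Two representations denote the same set if they have the same members.
same_set = set([1, 2, 3]) == set([1, 2, 1, 3])

# Multiset semantics (assumed reading of the original request):
# Counter keeps a multiplicity for each value instead of discarding repeats.
multiset_sum = Counter([1, 2]) + Counter([1, 3])                      # adds counts
multiset_intersection = Counter([1, 1, 2]) & Counter([1, 1, 1, 3])    # min of counts

print(union_as_set, cardinality, same_set)
print(sorted(multiset_sum.elements()))
print(sorted(multiset_intersection.elements()))
```

With set semantics the repeated `1` collapses, while the `Counter` version keeps it, which is the multiset behaviour Michael mentions for C++.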