[R] intersect() without discarding duplicates?

David Winsemius dwinsemius at comcast.net
Fri May 21 03:20:39 CEST 2010


On May 20, 2010, at 7:10 PM, David Winsemius wrote:

>
> On May 20, 2010, at 6:24 PM, Jonathan wrote:
>
>> Thanks, but that doesn't quite work, since I'd want the result of
>> b[b %in% a] to be symmetric with a[a %in% b] (so if there are two 2's
>> in EACH vector, I'll get two 2's in the result, but if there are two
>> 2's in one vector and only one 2 in the other, the result will
>> show only one 2).
>>
>> Consider:
>>
>>> a <- c(2,4,3)
>>> b<-c(6,6,5,2,2,8,4)
>>
>>> b[b %in% a]
>> [1] 2 2 4
>>
>>> a[a%in%b]
>> [1] 2 4
>>
>> The second answer is correct, but I can't predict which variable to
>> put in which position in the statement, so I'd need them both to be
>> correct.
>
> Perhaps you should look at something along the lines of :
>> a <- c(2,4,2,3)
>> b<-c(6,6,5,2,2,8,4)
>> a[a %in% b]
> [1] 2 4 2
>
>> merge(data.frame(table(a)), data.frame(table(b)), by.x="a", by.y="b")
>   a Freq.x Freq.y
> 1 2      2      2
> 2 4      1      1
>
> And then do a pmin() on the two Freq columns.

  somefn <- function(a, b) {
      xtb <- merge(data.frame(table(a)), data.frame(table(b)),
                   by.x="a", by.y="b")
      rep(xtb[,1], pmin(xtb[,2], xtb[,3]))
  }
 > somefn(a,b)
[1] 2 2 4
Levels: 2 3 4
 > somefn(b,a)
[1] 2 2 4
Levels: 2 4 5 6 8


 > a <- c(2,3,4)
 > somefn(b,a)
[1] 2 4
Levels: 2 4 5 6 8
 > somefn(a,b)
[1] 2 4
Levels: 2 3 4

The result is a factor. If you want it as a numeric vector, wrap the
result in as.numeric(as.character( ) ).
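
For what it's worth, the same count-and-pmin idea can probably be sketched without the table()/merge() round-trip, which also avoids the factor result. This is only a sketch: the name multiset_intersect is made up for illustration (it is not a base R function), and it assumes atomic vectors with no NA values, since `==` would propagate NA into the counts.

```r
# Hypothetical helper (not part of base R): intersection that keeps
# duplicates, using the smaller of the two counts for each shared value.
# Assumes a and b are atomic vectors without NA values.
multiset_intersect <- function(a, b) {
  # Distinct values present in both vectors
  common <- intersect(a, b)
  # For each shared value, take the smaller of its counts in a and b
  counts <- vapply(common,
                   function(v) min(sum(a == v), sum(b == v)),
                   integer(1))
  # Repeat each shared value that many times
  rep(common, counts)
}

a <- c(2, 4, 2, 3)
b <- c(6, 6, 5, 2, 2, 8, 4)
multiset_intersect(a, b)   # 2 2 4
multiset_intersect(b, a)   # 2 2 4

a <- c(2, 3, 4)
multiset_intersect(a, b)   # 2 4
```

The values come back in the order intersect() reports them, so swapping the arguments may change the order of the result but not how many times each value appears.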


>
> -- 
> David.
>
>>
>> Best,
>> Jonathan
>>
>> On Thu, May 20, 2010 at 6:10 PM, David Winsemius
>> <dwinsemius at comcast.net> wrote:
>>
>> On May 20, 2010, at 5:58 PM, Jonathan wrote:
>>
>> Hi all,
>> The ?intersect entry kindly points out that it discards duplicate
>> entries.  I'm looking, however, to get the intersection while KEEPING
>> duplicate entries, and there are no instructions on how to
>> accomplish this using intersect().
>>
>> Does anybody have any idea how this might be done, or am I going to
>> need to program something from scratch (something like ordering the
>> vectors and then looping through them)?
>>
>>
>>
>> ex:
>>
>> a <- c(2,4,2,3)
>> b<-c(6,6,5,2,2,8,4)
>> intersect(a,b)
>> [1] 2 4
>>
>>> b %in% a
>> [1] FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE
>>
>> # Now use logical indexing on "b"
>>
>>> b[b %in% a]
>> [1] 2 2 4
>>
>>
>>
>>
>> I'd hope the answer to be 2 2 4.
>>
>> Regards,
>> Jonathan
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>

David Winsemius, MD
West Hartford, CT