[R] Issue with %in% - not matching identical rows in data frames

Charles C. Berry cberry at tajo.ucsd.edu
Tue Nov 3 19:04:42 CET 2009



Kaushik,

The documentation doesn't quite tell (me, anyway) how match() behaves
when its 'table' argument is a list (or data.frame). You'll need to dig
into match.c or experiment with match() or %in% to see what it is
actually doing.

But it looks like it is matching whole columns of the data.frame against
one another rather than elements within each column:

>  sequence %in% sequence
[1] TRUE TRUE TRUE TRUE TRUE TRUE
>  sequence %in% rev(sequence)
[1] TRUE TRUE TRUE TRUE TRUE TRUE
>
>  sequence[1,] %in% sequence
[1] FALSE FALSE FALSE FALSE FALSE FALSE
>  sequence[1,] %in% sequence[1,]
[1] TRUE TRUE TRUE TRUE TRUE TRUE
>
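
A minimal sketch of the same behaviour on a toy data frame (df is a made-up example, not the poster's data). It assumes, as the output above suggests, that %in% treats a data frame as a list of columns, so each column on the left is compared as a whole against each column on the right:

```r
# Hypothetical toy data, for illustration only
df <- data.frame(a = 1:3, b = 4:6)

# Each whole column of df is looked up among the whole columns of df
whole <- df %in% df

# A one-row slice has length-1 columns; none of them equals a length-3 column
slice <- df[1, ] %in% df

print(whole)
print(slice)
```

The result has one entry per column of the left-hand side, which is why the poster's call returns six values rather than a single TRUE or FALSE for the row.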

Maybe you wanted something like

    mapply(function(x, y) x %in% y, sequence[7, ], today.sequence)

??
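
A rough sketch of two ways to ask the question, on made-up data frames a and b (hypothetical names and values). The column-wise mapply() check only says that each value occurs somewhere in the matching column, possibly in different rows. The second approach, swapped in here as one possible alternative, pastes each row into a single string key and matches the keys; it assumes both data frames have the same columns in the same order and that the separator never occurs in the data:

```r
# Hypothetical toy data, for illustration only
a <- data.frame(id = c(1L, 2L, 3L), grp = c("x", "y", "z"),
                stringsAsFactors = FALSE)
b <- data.frame(id = c(3L, 4L), grp = c("z", "w"),
                stringsAsFactors = FALSE)

# Column-wise check: is each value of row 3 of 'a' found in the same column of 'b'?
col_hits <- mapply(function(x, y) x %in% y, a[3, ], b)

# Row-wise check: build one string key per row, then match the keys.
# row_key() is a helper defined here, not a base R function.
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
row_hits <- row_key(a) %in% row_key(b)

print(col_hits)
print(row_hits)
```

Here row_hits has one entry per row of a, so it can be used directly to subset, e.g. a[row_hits, ].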

HTH,

Chuck


On Tue, 3 Nov 2009, Kaushik Krishnan wrote:

> Hi folks
>
> I have two data frames.  I know that the nth (let's say the 7th) row
> in the first data frame (sequence) is there in the second
> (today.sequence).  When I try to check that by doing 'sequence[7,]
> %in% today.sequence', I get all FALSE when it should be all TRUE.
>
> I'm certain I'm making some trivial mistake.  Any solutions?
>
> The code to recreate the data frames and see for yourself is:
> ----
> sequence <- structure(list(DATE = structure(c(14549, 14549, 14553, 14550,
> 14557, 14550, 14551, 14550), class = "Date"), DATASET = c(1L,
> 2L, 1L, 2L, 2L, 3L, 3L, 4L), REP = c(1L, 0L, 2L, 2L, 3L, 0L,
> 1L, 0L), WRONGS_ABS = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), WRONGS_RATIO = c(0L,
> 0L, 0L, 0L, 0L, 0L, 0L, 0L), DONE = c(1L, 1L, 0L, 1L, 0L, 1L,
> 0L, 0L)), .Names = c("DATE", "DATASET", "REP", "WRONGS_ABS",
> "WRONGS_RATIO", "DONE"), class = "data.frame", row.names = c(NA,
> -8L))
>
> today.sequence <- structure(list(DATE = structure(c(14551, 14550),
> class = "Date"),
>    DATASET = 3:4, REP = c(1L, 0L), WRONGS_ABS = c(0L, 0L),
> WRONGS_RATIO = c(0L,
>    0L), DONE = c(0L, 0L)), .Names = c("DATE", "DATASET", "REP",
> "WRONGS_ABS", "WRONGS_RATIO", "DONE"), row.names = 7:8, class = "data.frame")
>
> sequence[7,] # You should see '2009-11-03       3   1          0    0    0'
>
> today.sequence # You can clearly see that sequence[7,] is the first row in today.sequence
>
> sequence[7,] %in% today.sequence # This should show 'TRUE TRUE TRUE TRUE TRUE TRUE'.
> # Instead it shows 'FALSE FALSE FALSE FALSE FALSE FALSE'
> ----
>
> Thanks
>
> -- 
> Kaushik Krishnan
> (kaushik.s.krishnan at gmail.com)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



