[R] Rows not common in dataframes

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Sep 26 05:16:26 CEST 2012


%in% returns a logical vector. To understand it, experiment with the operator on its own for a while, without indexing to complicate things.
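
A minimal sketch of that kind of experiment follows. The vectors x and y are made up for illustration and do not come from the thread:

```r
# Hypothetical example vectors (not from the original thread)
x <- c("a", "b", "c", "d")
y <- c("b", "d", "e")

# One TRUE/FALSE per element of x: is that element found anywhere in y?
in_y <- x %in% y
print(in_y)          # FALSE  TRUE FALSE  TRUE

# Negate it to flag the elements of x that are absent from y
not_in_y <- !(x %in% y)
print(not_in_y)      # TRUE FALSE  TRUE FALSE

# Only now use the logical vector as an index
only_in_x <- x[not_in_y]
print(only_in_x)     # "a" "c"
```

Note that the result always has the same length as the left-hand operand, which is what makes it usable as a logical index into that operand.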

If you read the sections on indexing in "An Introduction to R", they describe three kinds of indexing. String indexing can only access elements by name (or by row name/column name). If you wish to index according to the contents of your vector/matrix/data frame, you need to find the matching data first. If you are satisfied with accessing the data in the order it appears in the object, then logical indexing is often simplest, and %in% works. If you want to specify the order, or if you want to duplicate values in your output, you probably need integer indexing, in which case you need match(). If you want to extract data based on complex conditions (involving multiple tests), you probably need logical indexing, and you cannot simultaneously specify ordering or duplication.
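
The distinctions above can be sketched roughly as follows. The data frame df and the vector wanted are hypothetical, invented only to show the behaviour:

```r
# Hypothetical data, invented for illustration
df <- data.frame(name  = c("a", "b", "c", "d"),
                 value = c(10, 20, 30, 40),
                 stringsAsFactors = FALSE)
rownames(df) <- df$name
wanted <- c("d", "b", "d")

# String indexing: looks up row names, not the contents of a column
by_name <- df[c("d", "b"), ]

# Logical indexing with %in%: rows come back in the order they are
# stored in df, each at most once
by_logical <- df[df$name %in% wanted, ]

# Integer indexing with match(): rows follow the order of `wanted`,
# including the repeated "d"
by_integer <- df[match(wanted, df$name), ]

# Complex conditions: combine several logical tests with & or |
by_condition <- df[df$name %in% wanted & df$value > 25, ]

print(by_logical$name)    # "b" "d"
print(by_integer$name)    # "d" "b" "d"
print(by_condition$name)  # "d"
```

One caveat with the match() form: it returns NA for any value that is not found, so unmatched entries would show up as rows of NA unless they are filtered out first.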
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Chintanu <chintanu at gmail.com> wrote:

>Thank you for your assistance - Arun, Milan, Rui.
>Much appreciated.
>
>On a related note, I have never been sure of how and when to use the
>binary operator, %in%. If you could share any easy explanation of it,
>that would be very useful.
>
>Cheers,
>Chintanu
>
>=======================================
>
>On Mon, Sep 24, 2012 at 10:13 PM, arun <smartpink111 at yahoo.com> wrote:
>
>> Hi,
>>
>> Try this:
>>
>> set.seed(1)
>>  Dataframe_A<-data.frame(x=sample(1:10,5,replace=TRUE),y=rnorm(5,15))
>>  set.seed(1)
>>  Dataframe_B<-data.frame(x=sample(3:15,6,replace=TRUE),z=rnorm(6,10))
>>  Dataframe_A[!Dataframe_A[[1]] %in% Dataframe_B[[1]],] # Milan's code had ...B[[2]]
>> #or
>> subset(Dataframe_A,!x%in%Dataframe_B[,1])
>> #  x        y
>> #1 3 16.27243
>> #2 4 15.41464
>> #5 3 14.70528
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: Milan Bouchet-Valat <nalimilan at club.fr>
>> To: Chintanu <chintanu at gmail.com>
>> Cc: R help <r-help at r-project.org>
>> Sent: Monday, September 24, 2012 3:30 AM
>> Subject: Re: [R] Rows not common in dataframes
>>
>> On Monday 24 September 2012 at 13:22 +1000, Chintanu wrote:
>> > Hi,
>> >
>> > I have two dataframes (Dataframe_A, Dataframe_B) with the same no. of
>> > columns. The first column of both the dataframes contains unique names.
>> > I wish to have Dataframe_A with the rows that are NOT common to
>> > Dataframe_B.
>> So you just want to drop some rows from A? In that case, do:
>> Dataframe_A <- Dataframe_A[!Dataframe_A[[1]] %in% Dataframe_B[[2]],]
>>
>> > With merge (), it is possible to get the common rows or to merge rows,
>> > but I am not quite sure how to do it in a simpler way. Any help would
>> > be much appreciated.
>> No need for merge, as all rows you need are already in A.
>>
>>
>> My two cents
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>



