[R] compare two data frames of different dimensions and only keep unique rows

Arnaud Gaboury arnaud.gaboury at a2ct2.com
Mon Feb 27 19:10:57 CET 2012


No, but I tried your way too.

In fact, the only three unique rows are these ones:

 Product Price Nbr.Lots
   Cocoa  2440        5
   Cocoa  2450        1
   Cocoa  2440        6

Here is a dirty but working trick I found:

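> # all rows of 'reported', then only the rows common to both data frames
> df <- merge(exportfile, reported, all.y = TRUE)
> df1 <- merge(exportfile, reported)
> # one string key per row, so that whole rows can be compared with %in%
> dff <- do.call(paste, df)
> dff1 <- do.call(paste, df1)
> df[!dff %in% dff1, ]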
  Product Price Nbr.Lots
3   Cocoa  2440        5
4   Cocoa  2450        1
 

I have two problems with it: first, I don't think it is very clean code; second, I won't know in advance which of my two data frames will have the greater dimensions (I can add some lines to deal with that, but again, it seems very heavy).

I hoped I could find a better solution.
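
One way the trick above could be tidied, so that it does not depend on which data frame is larger, is to build a paste key for each row and keep the rows whose key is missing from the other data frame, in both directions. This is only a minimal sketch, assuming both data frames share the same column names; rows_not_shared is a hypothetical helper name, not a base R function, and the two small data frames are cut-down stand-ins (Cocoa rows only) for reported and exportfile.

```r
# Hypothetical helper: rows of 'a' not found in 'b', followed by rows of 'b' not found in 'a'.
rows_not_shared <- function(a, b) {
  cols <- intersect(names(a), names(b))
  # One string key per row. Assumption: the separator "\r" never occurs in the data.
  key <- function(d) do.call(paste, c(as.list(d[cols]), sep = "\r"))
  rbind(a[!key(a) %in% key(b), cols, drop = FALSE],
        b[!key(b) %in% key(a), cols, drop = FALSE])
}

# Cut-down stand-ins for the data frames in this thread (Cocoa rows only).
reported_small <- data.frame(
  Product  = c("Cocoa", "Cocoa", "Cocoa", "Cocoa"),
  Price    = c(2331, 2356, 2440, 2450),
  Nbr.Lots = c(-61, -61, 5, 1),
  stringsAsFactors = FALSE
)
exportfile_small <- data.frame(
  Product  = c("Cocoa", "Cocoa", "Cocoa"),
  Price    = c(2331, 2356, 2440),
  Nbr.Lots = c(-61, -61, 6),
  stringsAsFactors = FALSE
)

# The argument order only changes the order of the result, not its content.
unshared <- rows_not_shared(reported_small, exportfile_small)
print(unshared)
```

Because the comparison is done on pasted strings, an integer 5 and a double 5 produce the same key, which matters here since Nbr.Lots is integer in one data frame and double in the other.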


A2CT2 Ltd.


-----Original Message-----
From: jim holtman [mailto:jholtman at gmail.com] 
Sent: Monday, 27 February 2012 18:42
To: Arnaud Gaboury
Cc: r-help at r-project.org
Subject: Re: [R] compare two data frames of different dimensions and only keep unique rows

Is this what you want?

> v <- rbind(reported, exportfile)
> v[!duplicated(v), ]
       Product    Price Nbr.Lots
1        Cocoa  2331.00      -61
2        Cocoa  2356.00      -61
3        Cocoa  2440.00        5
4        Cocoa  2450.00        1
6     Coffee C   204.55       40
7     Coffee C   205.45       40
5           GC 17792.00       -1
10 Sugar No 11    24.81       -1
8           ZS  1273.50       -1
9           ZS  1276.25        1
13       Cocoa  2440.00        6
>
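
Note that !duplicated(v) keeps one copy of every duplicated row. If the aim is instead to drop every row that occurs in both data frames, one option is to flag duplicates from both ends of v. The following is a minimal sketch, assuming neither data frame contains repeated rows of its own (those would be dropped as well). The data frames are re-typed from the original post, with Product as character and Nbr.Lots as double in both, so that the block runs by itself.

```r
# Stand-ins for the data frames in the original post.
reported <- data.frame(
  Product  = c("Cocoa", "Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C",
               "GC", "Sugar No 11", "ZS", "ZS"),
  Price    = c(2331, 2356, 2440, 2450, 204.55, 205.45,
               17792, 24.81, 1273.5, 1276.25),
  Nbr.Lots = c(-61, -61, 5, 1, 40, 40, -1, -1, -1, 1),
  stringsAsFactors = FALSE
)
exportfile <- data.frame(
  Product  = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C",
               "GC", "Sugar No 11", "ZS", "ZS"),
  Price    = c(2331, 2356, 2440, 204.55, 205.45,
               17792, 24.81, 1273.5, 1276.25),
  Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1),
  stringsAsFactors = FALSE
)

v <- rbind(reported, exportfile)

# duplicated(v) flags the second and later copies of a row;
# fromLast = TRUE flags the first and earlier copies.
# Together they mark every row that has a twin somewhere in v.
in_both <- duplicated(v) | duplicated(v, fromLast = TRUE)

# Keep only the rows that occur exactly once across both data frames.
only_once <- v[!in_both, ]
print(only_once)
```

This does not depend on which data frame has more rows, since rbind() simply stacks them before the comparison.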


On Mon, Feb 27, 2012 at 12:36 PM, Arnaud Gaboury <arnaud.gaboury at a2ct2.com> wrote:
> Dear list,
>
> I am still struggling with something that should be easy: I am comparing two data frames with a lot of common rows and want to keep only the rows that are NOT in both data frames, i.e. the unique ones.
>
> Here is an example of these data frames.
>
> reported <-
> structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 
> 5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11", "ZS"), 
> class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55, 205.45, 
> 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L, 
> 40L, -1L, -1L, -1L, 1L)), .Names = c("Product", "Price", "Nbr.Lots"), 
> row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class = 
> "data.frame")
>
> exportfile <-
> structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", 
> "Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356, 
> 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = 
> c(-61, -61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price", 
> "Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")
>
> I can rbind() them, which results in one data frame with duplicated 
> rows, but I have no idea how to delete the duplicated rows. I have tried 
> playing with unique() and duplicated(), with no success.
>
> v<-rbind(exportfile,reported)
> v <-
> structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", 
> "Coffee C", "GC", "Sugar No 11", "ZS", "ZS", "Cocoa", "Cocoa", 
> "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", 
> "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81, 
> 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 
> 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1, 
> -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names = c("Product", 
> "Price", "Nbr.Lots"), row.names = c("1", "2", "3", "4", "5", "6", "7", 
> "8", "9", "11", "21", "31", "41", "61", "71", "51", "10", "81", "91"), 
> class = "data.frame")
>
>
> TY for your help
>
> Arnaud Gaboury
>
> A2CT2 Ltd.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


