[R] compare two data frames of different dimensions and only keep unique rows

jim holtman jholtman at gmail.com
Mon Feb 27 18:41:42 CET 2012


is this what you want:

> v <- rbind(reported, exportfile)
> v[!duplicated(v), ]
       Product    Price Nbr.Lots
1        Cocoa  2331.00      -61
2        Cocoa  2356.00      -61
3        Cocoa  2440.00        5
4        Cocoa  2450.00        1
6     Coffee C   204.55       40
7     Coffee C   205.45       40
5           GC 17792.00       -1
10 Sugar No 11    24.81       -1
8           ZS  1273.50       -1
9           ZS  1276.25        1
13       Cocoa  2440.00        6
>


On Mon, Feb 27, 2012 at 12:36 PM, Arnaud Gaboury
<arnaud.gaboury at a2ct2.com> wrote:
> Dear list,
>
> I am still struggling with something that should be easy: I compare two data frames with a lot of common rows and want to keep only rows that are NOT in both data frames, unique.
>
> Here are an example of these data frame.
>
> reported <-
> structure(list(Product = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 5L, 5L), .Label = c("Cocoa", "Coffee C", "GC", "Sugar No 11", "ZS"), class = "factor"), Price = c(2331, 2356, 2440, 2450, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61L, -61L, 5L, 1L, 40L, 40L, -1L, -1L, -1L, 1L)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(1L, 2L, 3L, 4L, 6L, 7L, 5L, 10L, 8L, 9L), class = "data.frame")
>
> exportfile <-
> structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11", "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40, 40, -1, -1, -1, 1)), .Names = c("Product", "Price", "Nbr.Lots"), row.names = c(NA, 9L), class = "data.frame")
>
> I can rbind() them, thus resulting in one data frame with duplicated row, but I have no idea how to delete duplicated rows. I have tried plyaing with unique(), duplicated with no success
>
> v<-rbind(exportfile,reported)
> v <-
> structure(list(Product = c("Cocoa", "Cocoa", "Cocoa", "Coffee C",
> "Coffee C", "GC", "Sugar No 11", "ZS", "ZS", "Cocoa", "Cocoa",
> "Cocoa", "Cocoa", "Coffee C", "Coffee C", "GC", "Sugar No 11",
> "ZS", "ZS"), Price = c(2331, 2356, 2440, 204.55, 205.45, 17792,
> 24.81, 1273.5, 1276.25, 2331, 2356, 2440, 2450, 204.55, 205.45,
> 17792, 24.81, 1273.5, 1276.25), Nbr.Lots = c(-61, -61, 6, 40,
> 40, -1, -1, -1, 1, -61, -61, 5, 1, 40, 40, -1, -1, -1, 1)), .Names = c("Product",
> "Price", "Nbr.Lots"), row.names = c("1", "2", "3", "4", "5",
> "6", "7", "8", "9", "11", "21", "31", "41", "61", "71", "51",
> "10", "81", "91"), class = "data.frame")
>
>
> TY for your help
>
> Arnaud Gaboury
>
> A2CT2 Ltd.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



More information about the R-help mailing list