[R] Compare two dataframes

Mark Na mtb954 at gmail.com
Fri Dec 17 22:39:37 CET 2010


Hi Petr,

Many thanks for your help. I like your solution because (and I did not
know this) the unique function compares entire rows (i.e., all of the
columns at once), which means I don't have to make a unique ID field by
pasting together all of the columns or run through the columns
iteratively (say, by using a loop).
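
To illustrate that point, here is a minimal sketch on made-up data (the
data frame DF and its columns a and b are hypothetical, not taken from
the thread). It only shows that unique() and duplicated() look at whole
rows of a data frame:

```r
# Hypothetical toy data, not from the thread
DF <- data.frame(a = c(1, 1, 2, 1),
                 b = c("x", "y", "x", "x"),
                 stringsAsFactors = FALSE)

# unique() compares entire rows, so no pasted ID column is needed;
# row 4 repeats row 1 in every column and is dropped
rows <- unique(DF)

# duplicated() flags each row that repeats an earlier row
dup <- duplicated(DF)
```

Rows 1 and 2 share the same value of a but differ in b, so both are kept.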

However, if the dataframe contains non-unique rows (two or more rows
with exactly the same values in every column) then the unique function
will keep only the first of them, and that may not be desirable. So,
caution is required.
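
One possible way to handle that case, sketched here under stated
assumptions rather than taken from the thread: build a per-row key from
all columns and filter with %in%, which keeps repeated rows. The helper
name rows_not_in and the toy data frames are hypothetical. The sketch
assumes both data frames have the same columns in the same order and
that no value contains the separator character.

```r
# Hypothetical helper (not from the thread): rows of x that do not occur
# in y, keeping repeated rows of x.
rows_not_in <- function(x, y, sep = "\r") {
  # paste all columns of each row into a single key string;
  # unname() keeps column names from being read as arguments of paste()
  key_x <- do.call(paste, c(unname(as.list(x)), sep = sep))
  key_y <- do.call(paste, c(unname(as.list(y)), sep = sep))
  x[!(key_x %in% key_y), , drop = FALSE]
}

# Hypothetical toy data; DF1 contains the row (1, "p") twice
DF1 <- data.frame(a = c(1, 1, 2, 3), b = c("p", "p", "q", "r"))
DF2 <- data.frame(a = c(2, 4),       b = c("q", "s"))

# anyDuplicated() returns the index of the first repeated row, or 0 if none
first_dup <- anyDuplicated(DF1)

DF3 <- rows_not_in(DF1, DF2)  # rows of DF1 not in DF2, both copies of (1, "p") kept
DF4 <- rows_not_in(DF2, DF1)  # rows of DF2 not in DF1
```

Checking anyDuplicated() first shows whether the unique(rbind(...))
approach would silently drop anything.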

Thanks again for the time you took to help me better understand the
unique function. Much appreciated. Děkuji!

Mark



On Fri, Dec 17, 2010 at 2:27 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
> On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:
>> Hello,
>>
>> I have two dataframes DF1 and DF2 that should be identical but are not
>> (DF1 has some rows that aren't in DF2, and vice versa). I would like
>> to produce a new dataframe DF3 containing rows in DF1 that aren't in
>> DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).
>
> The function unique(DF) removes duplicated rows of DF and keeps the unique
> rows in the order of their first occurrence. So, if DF1 does not contain
> duplicated rows, then unique(rbind(DF1, DF2)) contains DF1 first, followed
> by the rows that occur only in DF2, if there are any. The order of the
> rows in the result depends on the order of the original data frames, and
> if DF2 contains several instances of a row that is not in DF1, only the
> first instance of that row appears in the difference.
>
>  #MAKE SOME DATA
>  #create an ID field by pasting all columns together; the separator keeps
>  #pairs such as (1, 23) and (12, 3) from producing the same ID
>  cars$id <- paste(cars$speed, cars$dist, sep="_")
>  cars1 <- cars[1:35, ]
>  cars2 <- cars[16:50, ]
>
>  #EXTRACT UNIQUE ROWS
>  #rows unique to cars1 (i.e., not in cars2)
>  cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ]
>  #rows unique to cars2 (i.e., not in cars1)
>  cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ]
>
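>  #remove duplicated rows within each data frame
>  cars1_set <- unique(cars1)
>  cars2_set <- unique(cars2)
>
>  #stack the two sets; rows of the second that already occur in the first are dropped
>  cars1_plus <- unique(rbind(cars1_set, cars2_set))
>  cars2_plus <- unique(rbind(cars2_set, cars1_set))
>
>  #dropping the leading block leaves the rows found only in the other data frame
>  cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ]
>  cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ]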
>
>  all(cars1_unique == cars1_diff) # [1] TRUE
>  all(cars2_unique == cars2_diff) # [1] TRUE
>
> Petr Savicky.
>


