[R] efficiently diff two data frames

Rui Barradas ruipbarradas at sapo.pt
Tue Apr 16 20:12:37 CEST 2013


Hello,

Maybe Petr Savicky's answer in the link

https://stat.ethz.ch/pipermail/r-help/2012-February/304830.html

can lead you to what you want.
I've changed his function a bit so that it returns a logical vector
indexing the rows, with TRUE for the rows that differ.

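# Returns a logical vector: TRUE for the rows that differ between A and B.
# Assumes A and B have the same number of rows, in the same order.
setdiffDF2 <- function(A, B){
     # TRUE for each row of X that does not repeat an earlier row of rbind(Y, X)
     f <- function(X, Y)
         !duplicated(rbind(Y, X))[nrow(Y) + seq_len(nrow(X))]
     ix1 <- f(A, B)  # rows of A not matched in B (or by an earlier row of A)
     ix2 <- f(B, A)  # rows of B not matched in A (or by an earlier row of B)
     ix1 & ix2
}
ix <- setdiffDF2(Xe, Xf)
Xe[ix,]  # the differing rows as they appear in Xe
Xf[ix,]  # the same rows as they appear in Xf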
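
To see what the duplicated(rbind()) step does, here is a minimal sketch
using the example data from the original question. The names stacked,
dup and changed are only illustrative; they are not part of Petr
Savicky's function.

```r
# Example data from the original question
Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55

# Stack Xf on top of Xe; duplicated() flags every row that repeats an earlier one
stacked <- rbind(Xf, Xe)
dup <- duplicated(stacked)

# Keep only the flags that belong to the Xe rows (the second block of rows)
changed <- !dup[nrow(Xf) + seq_len(nrow(Xe))]
print(which(changed))
```

Rows of Xe that still have an identical copy in Xf are flagged as
duplicates, so negating the flags leaves only the rows that changed.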


Note that this gives no information on the columns.
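If column information is also needed, one possible approach (a sketch
only, not part of the function above) is an element-wise comparison. It
assumes both data frames have the same dimensions, the same row and
column order, and no missing values. The names neq, rows, cols and cells
are illustrative.

```r
# Example data from the original question
Xe <- head(mtcars)
Xf <- head(mtcars)
Xf[2:4, 3:5] <- 55

# Comparing two data frames of equal shape gives a logical matrix
neq <- Xe != Xf

rows  <- which(rowSums(neq) > 0)     # indices of rows with at least one difference
cols  <- which(colSums(neq) > 0)     # indices of columns with at least one difference
cells <- which(neq, arr.ind = TRUE)  # one (row, col) pair per differing cell

print(rows)
print(cols)
```

With missing values the comparison yields NA, so those cells would need
separate handling, for instance by also comparing is.na(Xe) and is.na(Xf).
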
Hope this helps,

Rui Barradas

On 16-04-2013 18:42, Liviu Andronic wrote:
> Dear all,
> What is the quickest and most efficient way to diff two data frames,
> so as to obtain a vector of indices (or logical) for rows/columns that
> differ in the two data frames?  For example,
>> Xe <- head(mtcars)
>> Xf <- head(mtcars)
>> Xf[2:4,3:5] <- 55
>> all.equal(Xe, Xf)
> [1] "Component 3: Mean relative difference: 0.6863118"
> [2] "Component 4: Mean relative difference: 0.4728435"
> [3] "Component 5: Mean relative difference: 14.23546"
>
> I could use all.equal(), but it only returns human-readable info that
> cannot be easily used programmatically. It also gives no info on the
> rows. Another way would be to use setdiff() from the prob package:
> require(prob)
>> setdiff(Xe, Xf)
>                  mpg cyl disp  hp drat    wt  qsec vs am gear carb
> Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
> Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
> Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
>
> But again this doesn't return subsetting indices, nor any info on the
> columns. Any suggestions on how to approach this?
>
> Regards,
> Liviu