[Rd] setdiff for data frames

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Dec 11 08:36:49 CET 2007


On Mon, 10 Dec 2007, Charles C. Berry wrote:

> On Mon, 10 Dec 2007, G. Jay Kerns wrote:
>
>> Hello,
>>
>> I have been interested in setdiff() for data frames that operates
>> row-wise.  I looked in the documentation, mailing lists, etc., and
>> didn't find exactly the right thing.  Given data frames A, B with the
>> same columns, the goal is to extract the rows that are in A, but not
>> in B.  Of course, one can usually do setdiff(rownames(A), rownames(B))
>> but that is cheating.  :-)
>>
>> I played around a little bit and came up with
>>
>> setdiff.data.frame = function(A, B){
>>     g <- function(y, B){
>>         any( apply(B, 1, FUN = function(x)
>>                    identical(all.equal(x, y), TRUE) ) )
>>     }
>>     unique( A[ !apply(A, 1, FUN = function(t) g(t, B) ), ] )
>> }
>>
>> I am sure that somebody can do this a better/faster way... any ideas?
>
> setdiff.data.frame <-
>    function(A,B) A[ !duplicated( rbind(B,A) )[ -seq_len(nrow(B))] , ]
>
> This ignores rownames(A), which may not be what is wanted in every case.
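A minimal sketch of the duplicated() one-liner above as a standalone function, run on the example data from Jay's P.S. The name setdiff_rows_dup is hypothetical. The nrow(B) == 0 guard is an assumption added here and is not part of the original: -seq_len(0) is an empty index, so without the guard the one-liner selects nothing and returns zero rows when B is empty. Like the original, it also drops repeated rows within A and relies on rbind() matching columns by name.

```r
# Hypothetical wrapper around the duplicated()-based one-liner quoted above.
setdiff_rows_dup <- function(A, B) {
  # Added guard (assumption): -seq_len(0) would select nothing, so handle empty B here.
  if (nrow(B) == 0L) return(unique(A))
  # Stack B on top of A: a row of A is flagged as duplicated if it already
  # appeared in B (or earlier in A). Dropping the first nrow(B) flags leaves
  # exactly one flag per row of A.
  keep <- !duplicated(rbind(B, A))[-seq_len(nrow(B))]
  A[keep, , drop = FALSE]
}

A <- expand.grid(1:3, 1:3)
B <- A[2:5, ]
setdiff_rows_dup(A, B)   # rows "1", "6", "7", "8", "9" of A
```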

I was about to suggest using the approach taken by duplicated.data.frame
(which is to 'hash' the rows to a character vector), then calling setdiff.
E.g.

a <- do.call("paste", c(A, sep = "\r"))
b <- do.call("paste", c(B, sep = "\r"))
A[match(setdiff(a, b),a), ]

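Note that apply() is intended for matrices (not data frames): it converts a
data frame with as.matrix(), and the version given calls apply(B, 1, ...)
once for every row of A, so it can do a horrendous amount of coercion,
whereas the above converts each data frame only once.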
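A minimal sketch wrapping the three-line paste-key approach above into a reusable function, again on Jay's example data. The name setdiff_rows_key is hypothetical; the body follows the lines above, with drop = FALSE added (an assumption) so that a single-column result stays a data frame. Because paste() converts each column to character once, there is no per-row coercion.

```r
# Hypothetical function wrapping the paste-key approach shown above.
setdiff_rows_key <- function(A, B) {
  # Collapse each row to one string, separating fields with "\r"
  # (the same kind of key that duplicated.data.frame builds).
  a <- do.call("paste", c(A, sep = "\r"))
  b <- do.call("paste", c(B, sep = "\r"))
  # setdiff() keeps the unique keys of A not found in B, in A's order;
  # match() maps each one back to its first matching row of A.
  A[match(setdiff(a, b), a), , drop = FALSE]
}

A <- expand.grid(1:3, 1:3)
B <- A[2:5, ]
setdiff_rows_key(A, B)   # rows "1", "6", "7", "8", "9" of A
```

One caveat of any string key: numbers are compared through as.character(), which keeps 15 significant digits, so 0.1 + 0.2 and 0.3 give the same key even though they are not == in floating point.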

>
> HTH,
>
> Chuck
>
>> Any chance we could get a data.frame method for setdiff in future R
>> versions? (The notion of "set" is somewhat ambiguous with respect to
>> rows, columns, and entries in the data frame case.)

No chance: if you have not found it in the archives, it is too rare a 
request.

>> Jay
>>
>> P.S. You can see what I'm looking for with
>>
>> A <- expand.grid( 1:3, 1:3 )
>> B <- A[ 2:5, ]
>> setdiff.data.frame(A,B)


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


