[Rd] setdiff bizarre

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 2 20:03:36 CEST 2009


William Dunlap wrote:
> %in% is a thin wrapper on a call to match().  match() is
> not a generic function (and is not documented to be one),
> so it treats data.frames as lists, as their underlying
> representation is a list of columns.  match is documented
> to convert lists to character and to then run the character
> version of match on that character data.  match does not
> bail out if the types of the x and table arguments don't match
> (that would be undesirable in the integer/numeric mismatch case).
>   

yes, i understand that this is documented behaviour, and that it's not a
bug.  nevertheless, the example is odd, and hints that there's a design
flaw.  i also do not understand why the following should be useful and
desirable:

    as.character(list('a'))
    # "a"

    as.character(data.frame('a'))
    # "1"

and hence

    'a' %in% list('a')
    # TRUE

while

    'a' %in% data.frame('a')
    # FALSE
    '1' %in% data.frame('a')
    # TRUE

there is a mechanistic explanation for how this works, but is there a
design rationale for why it should work this way?
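
for reference, a minimal sketch of the mechanics behind the "1" above.  it
assumes the data frame column is a factor, which was the data.frame()
default at the time of writing; since R 4.0.0 the default is a character
column, so the factor is created explicitly here.  the variable names are
only illustrative.

```r
# a one-element factor, as data.frame('a') created by default before R 4.0.0
f <- factor("a")

# a data frame is a list of columns; as.character() on a list ignores the
# factor class and works on the underlying integer codes
codes  <- as.character(list(f))             # "1"
df_chr <- as.character(data.frame(x = f))   # "1"

# coercing the column itself uses the factor levels instead
labels <- as.character(f)                   # "a"
```

so the "1" is the integer code of the first factor level, not the label.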


> Hence
>    '1' %in% data.frame(1) # -> TRUE
> is acting consistently with
>    match(as.character(pi), c(1, pi, exp(1))) # -> 2
> and
>    1L %in% c(1.0, 2.0, 3.0) # -> TRUE
>
> The related functions, duplicated() and unique(), do have
> row-wise data.frame methods.  E.g.,
>    > duplicated(data.frame(x=c(1,2,2,3,3),y=letters[c(1,1,2,2,2)]))
>    [1] FALSE FALSE FALSE FALSE  TRUE
> Perhaps match() ought to have one also.  S+'s match is generic
> and has a data.frame method (which is row-oriented) so there we get:
>    > match(data.frame(x=c(1,3,5), y=letters[c(1,3,5)]),
>            data.frame(x=1:10,y=letters[1:10]))
>    [1] 1 3 5
>    > is.element(data.frame(x=1:10,y=letters[1:10]),
>                 data.frame(x=c(1,3,5), y=letters[c(1,3,5)]))
>     [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
>
> I think that %in% and is.element() ought to remain calls to match()
> and that if you want them to work row-wise on data.frames then
> match should get a data.frame method.
>   
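
a row-oriented match for data frames, as suggested above, could be sketched
roughly as follows.  this is not S+'s implementation and not part of base R:
match_rows() and row_keys() are hypothetical names, and the sketch assumes
that pasting the columns together with "\r" gives a unique key per row
(i.e. that "\r" does not occur in the data).

```r
# hypothetical helper, not part of base R: row-wise match() for data frames
match_rows <- function(x, table) {
    # build one string key per row; "\r" is assumed not to occur in the data
    row_keys <- function(df) {
        cols <- unname(lapply(df, as.character))
        do.call(paste, c(cols, sep = "\r"))
    }
    match(row_keys(x), row_keys(table))
}

x     <- data.frame(x = c(1, 3, 5), y = letters[c(1, 3, 5)])
table <- data.frame(x = 1:10, y = letters[1:10])

rows    <- match_rows(x, table)             # 1 3 5
present <- !is.na(match_rows(table, x))     # TRUE for rows 1, 3 and 5 of table
```

because each column is converted with as.character() before pasting, factor
columns are compared by label rather than by integer code.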

sounds good to me.  what does

    'a' %in% data.frame('a')

return in S+?

thanks for the response.

regards,
vQ


