[R] Variable passed to function not used in function in select=... in subset

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Nov 11 15:54:32 CET 2008


Gavin Simpson wrote:
> On Tue, 2008-11-11 at 11:08 +0100, Wacek Kusnierczyk wrote:
>   
>> Gavin Simpson wrote:
>>     
>>>> d = data.frame(a = 1)
>>>> d$`-b` = 2
>>>> names(d)
>>>> # here we go
>>>>
>>>> subset(d, select = -b)
>>>> # to b or not to b?
>>>>     
>>>>         
>>> but -b is not the name of the column; you explicitly called it `-b` and
>>> you should refer to it as such. If you use "non-standard" names then
>>> expect to do a bit more work.
>>>   
>>>       
>> identical(names(d)[2], "-b")
>>
>> if i do
>>
>> d$`c` = 4
>>
>> then you claim d has no column named 'c'?
>>     
>
> No, where do you get that from?
>   
by simple analogy to the above, just read your own comments.  if you're
suggesting one should not expect this bit to be consistent, it would be
just another example of messy semantics.

>   
>>   do i have to refer to the c
>> column as `c`?
>>     
>
> No, but then "c" is a name that doesn't need to be quoted. -b is a name
> that needs to be quoted and if you quote it, things work as you might
> expect.
>   

not necessarily, as one of my examples showed:  again, the result of
subset(d, select=`-b`) will depend on whether d has a column named '-b',
and if it doesn't, on whether there is a variable called '-b' that is a
character vector.  there is no way out of this issue, backquoting is no
solution.  no further comment.
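
a minimal sketch of the ambiguity, assuming that select is evaluated first against the column names of the data frame and only then in the calling environment (this is my reading of the behaviour, not something the docs state).  the names d1, d2, r1 and r2 are made up for the illustration:

```r
# d1 has a column literally named "-b"; d2 does not
d1 <- data.frame(a = 1)
d1$`-b` <- 2
d2 <- data.frame(a = 1, c = 3)

# a variable in the calling environment that holds a column name
`-b` <- "a"

# same expression, two different meanings:
r1 <- subset(d1, select = `-b`)  # resolved as the column named "-b"
r2 <- subset(d2, select = `-b`)  # no such column, so the variable's value "a" is used

names(r1)
names(r2)
```

the same call selects '-b' from d1 and 'a' from d2, so you cannot tell what it does without knowing the data.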

>   
>>     
>>>   
>>>       
>>>> subset(d, select = `-b`)
>>>>     
>>>>         
>>>   -b
>>> 1  2
>>>   
>>>       
>> ... and i have to use
>>
>> subset(d, select = `a`)
>>
>> and not
>>
>> subset(d, select = a)
>>
>> right?
>>     
>
> Is "a" a name in d? You can quote it if you want but it doesn't need to
> be quoted, so you can use either.
>   

you see, you need to know whether 'a' is a name in d to know what
subset(d, select=a) would do.  no further comment.
>   
>>   besides, subset(d, select = `-b`) should rather return the
>> column(s) whose names are the value of the variable `-b`:
>>
>> `-a` = "a"
>> subset(d, select = `-a`)
>> # returns all columns except for the one named 'a', rather than the
>> column named '-a' -- but that's just because there is no such column in
>> d;  if there were, this one would be returned. 
>>     
>
> No, it returns a if you are following on from your original examples.
> `-a` refers to a variable (object) and that evaluates to "a" and "a" is
> component of d so is returned.
>   
you're right here, but the problem remains: subset(d, select=`-a`) will
treat `-a` as a column name or as a name of a variable with a vector of
column names, depending on what's in the data.  no further comment.


>   
>> so even with backquotes used, there is no obvious interpretation of what
>> select=`-b` should mean, because it depends on what names components of
>> the first argument have.  and this breaks the concept of referential
>> transparency.
>>
>> so the problem is not so easily explained away.  what subset does *is*
>> messy.
>>     
>
> In your opinion.
>   

yes, but not only mine.  perhaps some more r users will want to support
this claim; just wait.

> And without wanting to be rude or anything, your opinion carries very
> little weight in a project like R. You've arrived on the list and been
> very critical of the work of others. Now there is nothing wrong with
> being critical if it is constructive, and additionally with something
> like R you need to be constructive *and* contribute back. I'm not saying
> that if you did patch R to work the way you think is correct R Core will
> accept them as they need to maintain backwards compatibility, and compatibility with S,
> and not annoy the hundreds of package authors. but coming on here and
> criticising the work of others isn't going to win you many friends.
>   

that's really sad.  you're saying no one should ever criticize r without
reading the source code.  you are *really* not interested in feedback. 
note, feedback on the *design*, not implementation, is not fixed by
sending a patch.  you have a serious misconception here.

if i buy a tv, and read the quick guide, and start using it, and push
buttons, and suddenly get an electric shock, and complain to the
manufacturer, and they say i should have carefully read the 2K-page
manual because it says there i can get high voltage on my fingers while
pushing the buttons, and it's my fault, and if i want to complain i
should first study the schematics --- what??  they're just crazy, no?

> Also, subset (and the other things you've been harping on about) work as
> documented. So you kind of have to like it or lump it.
>   

we've just gone through the docs, and it's *you* who thinks it's so
beautifully clear from the docs what subset does.  i lump it.

>   
>>     
>>>> subset(d, select = - `-b`)
>>>>     
>>>>         
>>>   a
>>> 1 1
>>>
>>>   
>>>       
>>>> b = "a"
>>>> subset(d, select = -b)
>>>> # tragedy
>>>>     
>>>>         
>>> For this, I interpret it as not finding a column named b so tries to
>>> evaluate:
>>>
>>>   
>>>       
>> you interpret it.  how obvious is this for most users?
>> it tries to find a column named 'b', not a column named b.  that's the
>> problem with subset.
>>     
>
> If users read the documentation then they'd know about unary operators.
>   

if you read the reference you'd know touching this button may kill you. 
very very practical, indeed.

>
>>>> subset(d, select = - get(b))
>>>>     
>>>>         
>>>   -b
>>> 1  2
>>>
>>>   
>>>       
>> "use this hack to get around the design."
>>     
>
> No hack, that is what get() is for. b is *not* a component of d. - b (or
> `-`(b) evaluates to an error. If you want to select columns except the
> column referenced by the contents of b (which is "a") then you can use
> get().
>   

yes, that's a hack.  you have to work around the fact that subset will
first try to find a column named 'b' and return everything else.  you do
use get for this, but that's still a hack.
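
for comparison, a sketch of the failing call next to plain indexing that involves no non-standard evaluation at all.  this is one possible alternative, not what subset or its docs prescribe; res and kept are names made up here:

```r
d <- data.frame(a = 1)
d$`-b` <- 2
b <- "a"   # b holds the name of the column to drop

# d has no column named "b", so b is taken from the calling environment
# and unary minus is applied to the string "a", which fails
res <- tryCatch(subset(d, select = -b), error = function(e) "error")

# ordinary indexing: drop the column(s) named in b, whatever d contains
kept <- d[, setdiff(names(d), b), drop = FALSE]
names(kept)
```

with the indexing form, the meaning of b does not depend on which names happen to be in d.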

>
>> i'd like you to point me to that warning, as i apparently need to read
>> it, but i haven't found it in the manual yet.  thanks.
>>     
>
> You could look at section 1.8 of An Introduction to R for a
> starter. ?Syntax is also a logical place to start and it explicitly
> refers you to details in the See Also section. If you read all of those
> (but I'll save you some time and point you to ?Quotes) you find the
> answers to how things like this work. ?Quotes explains what are
> syntactic names and how to use '`' backticks to quote non-syntactic
> names.
>
> Ok, ?Syntax and ?Quotes may not jump out at you as being very obvious
> places to look. If so, grab the source to the introduction to R manual,
> find a logical place to put this information or note to point people to
> the help pages and patch it accordingly. Then contribute that back to
> good of everyone.
>   

thanks.

>   
>>> Reading ?subset we have:
>>>
>>>   select: expression, indicating columns to select from a data frame.
>>>
>>> ....
>>>
>>>      For data frames, the 'subset' argument works on the rows.  Note
>>>      that 'subset' will be evaluated in the data frame, so columns can
>>>      be referred to (by name) as variables in the expression (see the
>>>      examples).
>>>
>>> which I think is reasonably explicit is it not? 
>>>       
>> about?  it says nothing about how the expression passed as the select
>> argument is treated.  it just says that the select argument is an
>> expression indicating columns (but how?), and then, in the middle of
>> explaining the subset parameter, it mentions that columns can be
>> referred to by name as variables in the expression.  how clear is this?
>>     

*how* clear is this?

>> the following does not work -- i'd expect it to, by virtue of the clear
>> explanation:
>>
>> d = data.frame(a=1, b=2)
>> subset(d, select=c(a, "b"))
>> # what??  it does not break any 'specification' given in the docs
>>     
>
> Where is a? What is it?

in subset(d, select=a) what is a?  where is it?  if in this case it is
in d, then in the one above it is there as well.  you'd like to say that
subset isn't smart enough to know that "columns can be referred to (by
name) as variables in the expression" can be applied to the a in the
c(a, "b"), am i right?
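
a rough emulation of what seems to happen, assuming subset maps each column name to its position and evaluates select against that mapping (my reading of the function, not of the docs).  nl, sel_ok, sel_mix and err are names made up here:

```r
d <- data.frame(a = 1, b = 2)

# assumed mapping: column name -> column position
nl <- as.list(seq_along(d))
names(nl) <- names(d)

# bare names become positions, so this gives a vector of two indices
sel_ok <- eval(quote(c(a, b)), nl)

# mixing a bare name with a string coerces the position of a to the string "1"
sel_mix <- eval(quote(c(a, "b")), nl)

# and there is no column named "1", hence the failure
err <- tryCatch(subset(d, select = c(a, "b")),
                error = function(e) conditionMessage(e))
```

so a is found in d in both cases; it is the coercion inside c() that breaks the mixed form.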

> And this is where you'll have to delve into the
> code or read more of the manuals or, how about you stop being so
> critical of peoples work and ask the people who do know why this doesn't
> work.
>   

you have good chances that i will stop, because the more i delve the
more i'm in the mud, and there is no appreciation for my effort.  pity
for you, not me.


>>> I'm sure we could all find aspects of R that don't work in exactly the
>>> way we might preconceive or think of as being intuitive. 
>>>       
>> most of it, seems like.
>>     
>
> Again, IYHO. Please do not think that your opinion is the same as others
> on this list or even R Core.
>   

good, but please do not think that i am the only one who disagrees with you.
as i said earlier, i'd estimate that the average reader of this list is
interested mostly in having r solve a particular problem, and does not
care about r as a design pearl.  without trying to underestimate anyone,
i guess that most readers have simply no opinion here.  and the r core
team is obviously biased towards defending their baby, so what.


>   
>>> But if it works
>>> as documented 
>>>       
>> in many cases, the documentation is insufficient, confusing, and
>> unhelpful when it comes to this sort of what you might call 'optimizations'.
>>     
>
> So where are your patches? You write a lot of critique on this list but
> I haven't seen any public patches offered on R-Devel from you.
>   

my comments are my patches.  i address design issues, and if anyone
responsible for the stuff i criticize says the critique is valid, i may
start thinking about implementation.  if implementation before design is
the main approach to development in r, no wonder it is how it is.  so
far, no one really wants to discuss.  that's deeply disappointing. 
besides, there is a good load of arrogance flowing from above, just scan
for responses to users' submitting patches that say 'your report is
annoying'.  thank you very much. 

>   
>>> then I don't see what the problem is unless i) you are
>>> offering to rewrite the code to make it "work better", ii) that R Core
>>> thinks any proposal "works better" and iii) in doing so it doesn't break
>>> most of the R code out there in R itself or in add-on packages.
>>>   
>>>       
>> i'd prefer r to work better rather than "work better".  i'm afraid that
>> serious improvements to r must, by necessity, break quite a lot of
>> earlier code, which exploits, if only due to the impossibility of not doing
>> so, such design.
>>     
>
> Again, IYHO. A lot of us have worked with/around infelicities of design
> and don't want R breaking all of our code because it was "fixed". Now,
> all the source is there and you could go and make the changes you want
> and even release it if you see fit. No-one is stopping you.
>   

again, you just don't follow.  you're encouraging bad practice -- should
one spend days on writing a patch before the issue is discussed at the
design level?  that's pointless.

>> it certainly is a good idea to offer to contribute and i'd be happy to
>> do so, but i wouldn't be given a chance, i suppose.
>> besides, i try not to imagine what hides under the surface of a language
>> with such a design.
>>     
>
> Have you tried? But bear in mind that R Core has more to balance than
> just whether you think a design "flaw" or infelicity etc should be fixed
> when it decides whether to accept patches.
>   

my whole posting is an attempt, if you care to notice.

vQ


