[Rd] surprising behaviour of names<-

Thu Mar 12 21:26:15 CET 2009

Berwin A Turlach wrote:
> On Thu, 12 Mar 2009 15:21:50 +0100
> Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
>
>   
>> seems to suggest?  is not the purpose of documentation to clearly,
>> ideally beyond any doubt, specify what is to be specified?
>>     
>
> The R Language Definition manual is still a draft. :)
>   

this is indeed a good explanation for all sorts of nonsense.  worse if
stuff tends to persist despite critique.

>   
>>> that in this case the infix and prefix syntax
>>> is not equivalent as it does not say that 
>>>   
>>>       
>> are you suggesting fortune telling from what the docs do *not* say?
>>     
>
> My experience is that sometimes you have to realise what is not
> stated.  

in general, yes.  in r, this often ends up with 'have you seen the
documentation saying that??' in response.

> I remember a discussion with somebody who asked why he could
> not run, on windows, R CMD INSTALL on a *.zip file.  I pointed out to
> him that the documentation states that you can run R CMD INSTALL on
> *.tar.gz or *.tgz files and, thus, there should be no expectation that
> it can be run on *.zip file.
>   

yes, that's a good point.  this reminds me of a (possibly anecdotal)
lady who sued the manufacturer of her microwave after she had dried her
cat in it after a bath.

> YMMV, but when I read a passage like this in R documentation, I start
> to wonder why it is stated that 
> 	names(x) <- c("a","b")
> is equivalent to 
> 	`*tmp*` <- x
> 	x <- "names<-"(`*tmp*`, value=c("a","b"))
> and the simpler construct
> 	x <- "names<-"(x, value=c("a", "b"))
> is not used.  There must be a reason, 

got an explanation:  because it probably is as drafty as the
aforementioned document.

> nobody likes to type
> unnecessarily long code.  And, after thinking about this for a while,
> the penny might drop.
>   

that's cool.  instead of stating what 'names<-' does or does not do, one
expresses it in a convoluted way and makes you guess from a *tmp*
variable.  a nice exercise, i like it.

> [...] 
>   
>>>> does this say anything about what 'names<-'(...) actually
>>>> returns?  updated *tmp*, or a copy of it?
>>>>     
>>>>         
>>> Since R uses pass-by-value, 
>>>       
>> since?  it doesn't!
>>     
>
> For all practical purposes it is as long as standard evaluation is
> used.  One just has to be aware that some functions evaluate their
> arguments in a non-standard way.  
>   

it's maybe a bit of hairsplitting, but what you have in r is not exactly
what is called 'pass by value'.  here's a relevant quote from [1], p. 309:

"
In the call-by-name (CBN) mechanism, a formal parameter names the
computation designated by an unevaluated argument expression.

In the call-by-value (CBV) mechanism, a formal parameter names the value
of an evaluated argument expression.

In the call-by-need or lazy evaluation (CBL), the formal parameter name
can be bound to a location that originally stores the computation of the
argument expression. The first time the parameter is referenced, the
computation is performed, but the resulting value is cached at the
location and is used on every subsequent reference. Thus, the argument
expression is evaluated at most once and is never evaluated at all if
the parameter is never referenced.
"

note the 'unevaluated' and 'evaluated'.  you're free to have your pick. 
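
a minimal sketch of the call-by-need behaviour described in the quote,
assuming a plain r session.  f, g and counter are names made up for this
illustration, nothing from the manuals:

```r
# the argument expression bumps a counter each time it is evaluated
counter = 0

f = function(arg) {
    arg    # first reference forces the argument expression
    arg    # second reference reuses the cached value
    invisible(NULL)
}

g = function(arg) invisible(NULL)    # never references arg

f({ counter = counter + 1; counter })
g({ counter = counter + 1; counter })

counter    # 1: evaluated once in f, never in g
```

the argument expression is evaluated at most once, and not at all if
the parameter is never referenced, which matches the CBL description
rather than the CBV one.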

but it is possible to send an argument to a function that makes an
assignment to the argument, and yet the assignment is made to the
original, not to a copy:

    foo = function(arg) arg$foo = foo

    e = new.env()
    foo(e)
    e$foo

are you sure this is pass by value?

it appears that r has a pass-by-need mechanism that dispatches to
pass-by-value or pass-by-reference depending on the type of the object. 
with this semantics, all sorts of mess are possible, and 'names<-'
provides one example.
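
to make the contrast explicit, here is a small sketch in which the same
function is applied to a list and to an environment.  set_field, l and e
are hypothetical names chosen for the example:

```r
# assigns a component inside the function, returns nothing useful
set_field = function(arg) {
    arg$field = 1
    invisible(NULL)
}

l = list()
set_field(l)
is.null(l$field)    # TRUE: the caller's list is untouched

e = new.env()
set_field(e)
e$field             # 1: the caller's environment was modified
```

the call looks the same in both cases;  whether the caller sees the
assignment depends only on the type of the object passed in.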

[1] design concepts in programming languages, turbak and gifford, mit
press 2008

> [...]
>   
>>> If you entertain the idea that 'names<-' updates *tmp* and
>>> returns the updated *tmp*, then you believe that 'names<-' behaves
>>> in a non-standard way and should take appropriate care. 
>>>       
>> i got lost in your argumentation.  [..]
>>     
>
> I was commenting on "does this say anything about what 'names<-'(...)
> actually returns?  updated *tmp*, or a copy of it?"
>
> As I said, if you entertain the idea that 'names<-' returns an updated
> *tmp*, then you believe that 'names<-' behaves in a non-standard way
> and appropriate care has to be taken.
>
>   

i can check, by experimentation, whether 'names<-' returns a copy or the
original; even if i can establish that it returns the original after
having modified it, it's not something to entertain.  maybe you
entertain the idea of your users performing the guesswork instead of
reading an unambiguous specification.  you have already said that you
don't care if your users get confused, it would fit the image.

and actually, in the example we discuss, 'names<-' does *not* return an
updated *tmp*, so there's even less to entertain.  for fun and more
guesswork, the example could have been:

    x = x
    x = 'names<-'(x, value=c('a', 'b'))

for your interest in well written documentation, ?names says that the
argument x is 'an r object', and nowhere does it say that an environment
is not an r object.  it also says what the value of 'names<-' applied to
pairlists is.  the following error message is doubly surprising:

    e = new.env()
    'names<-'(e, 'foo')
    # Error: names() applied to a non-vector

firstly, because it would seem that there's nothing wrong in applying
names to an environment;  from ?'$':

"
    x$name

    name: A literal character string or a name (possibly backtick
          quoted).  For extraction, this is normally (see under
          'Environments') partially matched to the 'names' of the
          object.
"

secondly, because, as ?names says, names can be applied to pairlists,
which are not vectors, and the following does not give an error as above:

    p = pairlist()
    is.vector(p)
    # FALSE
    names(p)
    # names successfully applied to a non-vector

assure me this is not a mess, but a well-documented design feature.

>>> And the fact that a variable *tmp* is used hints to the fact that
>>> 'names<-' might have side-effect.  
>>>       
>> are you suggesting fortune telling from the fact that a variable *tmp*
>> is used?
>>     
>
> Nothing to do with fortune telling.  One reads the manual, one wonders
> why is this construct used instead of an apparently much more simple
> one, one reflects and investigates, one realises why the given
> construct is stated as the equivalent: because "names<-" has
> side-effects.
>   

... and one wonders why r man pages have to be read in O(e^n) time.

>   
>>> This is similar to the discussion what value i should have in the
>>> following C snippet:
>>> 	i = 0;
>>>  	i += i++;
>>>   
>>>       
>> nonsense, it's a *completely* different issue.  here you touch the
>> issue of the order of evaluation, and not of whether an object is
>> copied or modified;  above, the inverse is true.
>>     
>
> Sorry, there was a typo above.  The second statement should have been
> 	i = i++;
>   

it was fine, as i acknowledged in another mail, i got into deep trouble
and had to admit the specification does not guarantee the final value. 
but it was irrelevant.

> Then on some abstract level they are the same; an object appears on the
> left hand side of an assignment but is also modified in the expression
> assigned to it.  So what value should it end up with?
>   

on this abstract level it's fine, but we can go up to the most
philosophical issue this way.  let's not.

>    
>   
>>>> why?  you can still use the infix names<- with destructive
>>>> semantics to avoid copying. 
>>>>     
>>>>         
>>> I guess that would require a rewrite (or extension) of the parser.
>>> To me, Section 10.1.2 of the Language Definition manual suggests
>>> that once an expression is parsed, you cannot distinguish any more
>>> whether 'names<-' was called using infix syntax or prefix syntax.
>>>   
>>>       
>> but this must be nonsense, since:
>>
>>     x = 1
>>     'names<-'(x, 'foo')
>>     names(x)
>>     # NULL
>>
>>     x = 1
>>     names(x) <- 'foo'
>>     names(x)
>>     # "foo"
>>
>> clearly, there is not only syntactic difference here.  but it might be
>> that 10.1.2 does not suggest anything like what you say.
>>     
>
> Please tell me how this example contradicts my reading of 10.1.2 that
> the expressions 
> 	'names<-'(x, 'foo')
> and
> 	names(x) <- 'foo'
> once they are parsed, produce exactly the same parse tree and that it
> becomes impossible to tell from the parse tree whether originally the
> infix syntax or the prefix syntax was used.  

because if they produced the same parse tree, you would either have to
have the same result in both cases (because the same parse tree is
interpreted), or you'd have to magically interpret the same tree in two
different ways, depending on the (lost in translation) original
syntactic form.  or:  the interpreter interprets the parse tree while
simultaneously looking at the original expression to choose the right
way to interpret the tree.

please tell me how this example does *not* show that the two forms have
different semantics (and then, how do they have the same parse trees?).
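
for what it's worth, the two forms can be compared directly with
quote().  a sketch only;  infix and prefix are names i introduce here
for illustration:

```r
# capture both forms unevaluated
infix  = quote(names(x) <- 'foo')
prefix = quote('names<-'(x, 'foo'))

as.character(infix[[1]])     # "<-"
as.character(prefix[[1]])    # "names<-"
identical(infix, prefix)     # FALSE

# evaluating the infix form updates x in the calling environment
x = 1
eval(infix)
names(x)                     # "foo"
```

the infix form is a call to '<-' whose first argument is names(x), while
the prefix form is a plain call to 'names<-', so the two expressions do
not look the same once parsed.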

> In fact, the last sentence
> in section 10.1.2 strongly suggests to me that the parse tree stores
> all function calls as if prefix notation was used.  But it is probably
> my English again.....
>   

please see above.

>   
>>> Thus, I guess you want to start a discussion with R Core whether it
>>> is worthwhile to change the parser such that it keeps track on
>>> whether a function was used with infix notation or prefix notation
>>> and to provide for most (all?) assignment operators implementations
>>> that use destructive semantics if the infix version was used and
>>> always copy if the prefix notation is used. 
>>>   
>>>       
>> as i explained a few months ago, i study r to find examples of bad
>> design.  if anyone in the r core is interested in having the problems
>> i report fixed, 
>>     
>
> Well, whether something is bad design and/or is a problem is in the eye
> of the beholder.
>   

i have never claimed that what i call bad design is what r developers
call bad design.  more and more, it appears that we disagree.  i just
collect and point out what i think is bad design.  others obviously have
their take.

cheers,
vQ