[Rd] surprising behaviour of names<-

Fri Mar 13 05:03:09 CET 2009

On Thu, 12 Mar 2009 21:26:15 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

> > YMMV, but when I read a passage like this in R documentation, I
> > start to wonder why it is stated that 
> > 	names(x) <- c("a","b")
> > is equivalent to 
> > 	`*tmp*` <- x
> > 	x <- "names<-"(`*tmp*`, value=c("a","b"))
> > and the simpler construct
> > 	x <- "names<-"(x, value=c("a", "b"))
> > is not used.  There must be a reason, 
> 
> got an explanation:  because it probably is as drafty as the
> aforementioned document.

Your grasp of what "draft manual" means in the context of R
documentation seems to be as tenuous as the grasp of intelligent
design/creationist proponents on what it means in science to label a
body of knowledge a "(scientific) theory". :)

[...]
> but it is possible to send an argument to a function that makes an
> assignment to the argument, and yet the assignment is made to the
> original, not to a copy:
> 
>     foo = function(arg) arg$foo = foo
> 
>     e = new.env()
>     foo(e)
>     e$foo
>       
> are you sure this is pass by value?

But that is what environments are for, aren't they?  And it is
documented behaviour.  Read section 2.1.10 ("Environments") in the R
Language Definition, in particular the last paragraph:

  Unlike most other R objects, environments are not copied when 
  passed to functions or used in assignments.  Thus, if you assign the
  same environment to several symbols and change one, the others will
  change too.  In particular, assigning attributes to an environment can
  lead to surprises.
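
A minimal sketch of what that paragraph describes, in case it helps. The
helper names set_flag and set_flag_list are made up for illustration; they
are not part of R.

```r
# An environment passed to a function is not copied, so the assignment
# inside the function is visible to the caller.
set_flag <- function(env) {
  env$flag <- TRUE
  invisible(NULL)
}
e <- new.env()
set_flag(e)
e$flag            # TRUE

# An ordinary list behaves as if passed by value: the caller's object
# is left unchanged.
set_flag_list <- function(lst) {
  lst$flag <- TRUE
  invisible(NULL)
}
l <- list()
set_flag_list(l)
is.null(l$flag)   # TRUE

# Two symbols bound to the same environment see the same changes.
e2 <- e
e2$other <- 1
e$other           # 1
```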

[..]
> and actually, in the example we discuss, 'names<-' does *not* return
> an updated *tmp*, so there's even less to entertain.  

How do you know?  Are you sure?  Have you by now studied what goes on
under the hood?

> for fun and more guesswork, the example could have been:
> 
>     x = x
>     x = 'names<-'(x, value=c('a', 'b'))

But it is manifestly not written that way in the manual, and for good
reason: 'names<-' might have side effects, which would invoke undefined
behaviour in the last line.  Just as in the equivalent C snippet
that I mentioned.

> for your interest in well written documentation, ?names says that the
> argument x is 'an r object', and nowhere does it say that environment
> is not an r object.  it also says what the value of 'names<-' applied
> to pairlists is.  the following error message is doubly surprising:
> 
>     e = new.env()
>     'names<-'(e, 'foo')
>     # Error: names() applied to a non-vector

But names are implemented by assigning a "names" attribute to the
object, as you should know.  And the above documentation suggests that
it is not a good idea to assign attributes to environments.  So why
would you expect this to work?

> firstly, because it would seem that there's nothing wrong in applying
> names to an environment;  from ?'$':
> 
> "
>     x$name
> 
>     name: A literal character string or a name (possibly backtick
>           quoted).  For extraction, this is normally (see under
>           'Environments') partially matched to the 'names' of the
>           object.
> "

I fail to see the relevance of this.

> secondly, because, as ?names says, names can be applied to pairlists,

Yes, but it does not say that names can be applied to environments.
And it explicitly says that the "default methods get and set the
'"names"' attribute of..." and (other) documentation warns you about
setting attributes on environments.
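
A short sketch of the kind of surprise that warning is about, assuming
nothing beyond base R.  The attribute name "note" is arbitrary and chosen
only for illustration.

```r
# Two symbols bound to one environment
e1 <- new.env()
e2 <- e1

# Setting an attribute through one symbol ...
attr(e2, "note") <- "set via e2"

# ... is visible through the other, because the environment is not copied.
attr(e1, "note")    # "set via e2"

# With an ordinary vector the attribute stays with the modified copy.
v1 <- 1:3
v2 <- v1
attr(v2, "note") <- "set via v2"
is.null(attr(v1, "note"))   # TRUE
```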

> which are not vectors, and the following does not give an error as
> above:
> 
>     p = pairlist()
>     is.vector(p)
>     # FALSE
>     names(p)
>     # names successfully applied to a non-vector
>    
> assure me this is not a mess, but a well-documented design feature.

It is documented; whether it is well-documented depends on your
definition of "well-documented". :)

> ... and one wonders why r man pages have to be read in O(e^n) time.

I believe patches to documentation are also welcome; and perhaps more
readily accepted than patches to code. 

[...]  
> >>> I guess that would require a rewrite (or extension) of the parser.
> >>> To me, Section 10.1.2 of the Language Definition manual suggests
> >>> that once an expression is parsed, you cannot distinguish any more
> >>> whether 'names<-' was called using infix syntax or prefix syntax.
> >>>   
> >>>       
> >> but this must be nonsense, since:
> >>
> >>     x = 1
> >>     'names<-'(x, 'foo')
> >>     names(x)
> >>     # NULL
> >>
> >>     x = 1
> >>     names(x) <- 'foo'
> >>     names(x)
> >>     # "foo"
> >>
> >> clearly, there is not only syntactic difference here.  but it
> >> might be that 10.1.2 does not suggest anything like what you say.
> >>     
> >
> > Please tell me how this example contradicts my reading of 10.1.2
> > that the expressions 
> > 	'names<-'(x, 'foo')
> > and
> > 	names(x) <- 'foo'
> > once they are parsed, produce exactly the same parse tree and that
> > it becomes impossible to tell from the parse tree whether
> > originally the infix syntax or the prefix syntax was used.  
> 
> because if they produced the same parse tree, you would either have to
> have the same result in both cases (because the same parse tree is
> interpreted), [...]

Sorry, looks as if I was too fast (again). 

'names<-'(x,'foo') should create (more or less) a parse tree equivalent
to that expression and then return the value of the call to
'names<-' (as it does).  I said "more or less" because some temporary
variables might be created whose named field is set to 1 so that "some
primitive functions can be optimized to avoid a copy" ("R Internals",
pages 3/4).

names(x) <- 'foo' should create (more or less) a parse tree equivalent
to " '<-'(x, 'names<-'(x,'foo')) ".  I say "more or less" for similar
reasons as above.

My point is that when the evaluator works through the parse tree and
comes to the 'names<-'(x, 'foo') part, it cannot tell (without
analysing what was before in the parse tree and what comes after; and
this analysis might be difficult if temporary variables are created)
whether the user used the prefix syntax or the infix syntax.
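
To make the two cases concrete, here is a small sketch of the behaviour as
I understand it.  The variable names y, z and before are mine, and the
hand-written expansion follows the form given in the manual; I have not
checked how closely the evaluator's internal steps match it.

```r
x <- 1
y <- 'names<-'(x, 'foo')    # direct call: the value is a named copy
names(y)                    # "foo"
before <- names(x)          # NULL, since nothing was assigned back to x

names(x) <- 'foo'           # replacement form: the result is assigned back
names(x)                    # "foo"

# The expansion from the manual, written out by hand on a fresh object
z <- 1
`*tmp*` <- z
z <- 'names<-'(`*tmp*`, value = 'foo')
rm(`*tmp*`)
identical(z, x)             # TRUE
```

In both forms the same function 'names<-' is called; the difference lies
only in whether its value is assigned back to x.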

I have no idea whether this can easily be changed and whether it is
worthwhile to do such a change.  As I said, you will have to take this
up with R Core.

Cheers,

	Berwin