[Rd] surprising behaviour of names<-

Berwin A Turlach berwin at maths.uwa.edu.au
Sat Mar 14 05:20:34 CET 2009


On Fri, 13 Mar 2009 19:41:42 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

> > Glad to see that we agree on this.
> >   
> 
> owe you a beer.

O.k., if we ever meet it is first your shout and then mine.
 
> >> haven't objected to that.  i object to your 'r uses pass by value',
> >> which is only partially correct.
> >>     
> >
> > Well, I used qualifiers and did not state it categorically. 
> >   
> 
> indeed, you said "R supposedly uses call-by-value (though we know how
> to circumvent that, don't we?)".
> 
> in that vein, R supposedly can be used to do valid statistical
> computations (though we know how to circumvent it) ;)

Sure, use Excel? ;-)
 
> > Indeed, if you type these two commands on the command line, then it
> > is not surprising that a copy of tmp is returned since you create a
> > temporary object that ends up in the symbol table and persists after
> > the commands are finished.
> >   
> 
> what does command line have to do with it?

If you want to find out what goes on under the hood, it is not
necessarily sufficient to do the same calculations on the command line.
 
> > Obviously, assuming that R really executes 
> > 	`*tmp*` <- x
> > 	x <- "names<-"(`*tmp*`, value=c("a","b"))
> > under the hood, in the C code, then *tmp* does not end up in the
> > symbol table 
> 
> no?

Well, I don't see any new object created in my workspace after
	x <- 4
	names(x) <- "foo"
Do you?
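A quick way to check this at the prompt is to ask whether anything called `*tmp*` is visible once the replacement call has finished. This is a small sketch using only base R (`exists()` and backtick-quoted names); it only shows what is left behind afterwards, not what the C code does while the assignment is in progress.

```r
x <- 4
names(x) <- "foo"

# After the replacement call completes, no object named *tmp*
# is visible from the top level.
exists("*tmp*")   # FALSE
names(x)          # "foo"
```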

> i guess you have looked under the hood;  point me to the relevant
> code.

No, I did not, because I am not interested in knowing such intimate
details of R, but it seems you were interested.
 
> yes, *if* you are able to predict the refcount of the object passed to
> 'names<-' *then* you can predict what 'names<-' will do, [...] 

I think Simon already pointed out that you seem to have the wrong
picture of what is going on.  As far as I know, there is no refcount
for objects.  

The relevant documentation would be the R Internals manual, 1.1 SEXPs:

  What R users think of as variables or objects are symbols which are
  bound to a value. The value can be thought of as either a SEXP (a
  pointer), or the structure it points to, a SEXPREC (and there are
  alternative forms used for vectors, namely VECSXP pointing to
  VECTOR_SEXPREC structures).

and 1.1.2 Rest of header:

  The named field is set and accessed by the SET_NAMED
  and NAMED macros, and take values 0, 1 and 2. R has a `call by value'
  illusion, so an assignment like

      b <- a

  appears to make a copy of a and refer to it as b. However, if neither
  a nor b are subsequently altered there is no need to copy. What really
  happens is that a new symbol b is bound to the same value as a and the
  named field on the value object is set (in this case to 2). When an
  object is about to be altered, the named field is consulted. A value
  of 2 means that the object must be duplicated before being changed.
  (Note that this does not say that it is necessary to duplicate, only
  that it should be duplicated whether necessary or not.) A value of 0
  means that it is known that no other SEXP shares data with this
  object, and so it may safely be altered. A value of 1 is used for
  situations like

      dim(a) <- c(7, 2)

  where in principle two copies of a exist for the duration of the
  computation as (in principle)

      a <- `dim<-`(a, c(7, 2))

  but for no longer, and so some primitive functions can be optimized to
  avoid a copy in this case. 
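The "call by value" illusion described in that passage can be observed from R code, even though the sharing itself is not. The sketch below (base R only; the variable names are arbitrary) binds `b` to the same value as `a`, modifies `b` with a replacement call, and checks that `a` is unaffected. It also compares the result with the functional form `` `dim<-`(a, c(7, 2)) `` quoted above, applied to a fresh vector so that the comparison does not depend on whether a direct call copies its argument.

```r
a <- 1:14
b <- a             # b is bound to the same value as a; no copy yet

dim(b) <- c(7, 2)  # the shared value is duplicated before it is changed
dim(a)             # NULL: a still sees the original vector
dim(b)             # 7 2

# The same result via the functional form, applied to a fresh vector
a2 <- `dim<-`(1:14, c(7, 2))
identical(a2, b)   # TRUE
```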

> but in general you may not have the chance. [...]

Agreed.

> and in general, this should not matter because it should be
> unobservable, but it isn't.

That's your opinion (to which you are entitled).  Unfortunately (for
you), the designers of R decided on a design which allows them to
reduce the number of copies that have to be made.

> >> you suggested that "One reads the manual, (...) one reflects and
> >> investigates, ..."
> >>     
> >
> > Indeed, and I am not giving up hope that one day you will master
> > this art.
> >   
> 
> well, this time i meant you.
 
Rest assured, I have read and reflected on that part of the manual.  

And I guess it boils down to how you interpret what "is equivalent to"
means.

For me it means that those two commands are what is executed in the C
engine once the "names(x)<-c("a","b")" expression is parsed and the
parse list arrives at the interpreter.  To investigate whether that is
the case, one would have to look at the C code, and I have little
inclination to do so.  But that would be necessary to answer the
question whether *tmp* or a copy of *tmp* is returned, if one is really
interested in this question.  Or whether a *tmp* object is created at
all.

You seem to take "is equivalent to" to mean that issuing
"names(x)<-c("a","b")" on the command line has the same effect as
issuing those two other commands on the command line and addressing
whether *tmp* or a copy of *tmp* is returned in this case.  Fair
enough, but it addresses a different question.  And, as you said
yourself in another e-mail, on the command line these two versions are
not equivalent since one creates an additional object.
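The difference is easy to see when the two-step version is typed at the prompt. In this sketch (base R only), backticks are needed because `*tmp*` is not a syntactic name, and the `left_behind` variable is just an illustrative name for recording the check before the workspace is tidied up.

```r
x <- c(1, 2)

# Explicit two-step version, typed at the prompt
`*tmp*` <- x
x <- `names<-`(`*tmp*`, value = c("a", "b"))

left_behind <- exists("*tmp*")
left_behind       # TRUE: this version leaves an extra object in the workspace
names(x)          # "a" "b"

rm(`*tmp*`)       # tidy up
```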


I was under the impression that you were interested in understanding
what happens if you issue the commands
	names(x) <- "foo"
and
	"names<-"(x, "foo")
and I must agree with Simon: Peter's answer explained it very well to
someone familiar with the documentation of R.  The fact that you found
that answer unsatisfactory suggests that you could improve your
familiarity with the documentation.  Simon's answers already provided
more details, and I provided you with pointers to what I believe to be
the relevant documentation.  It's now up to you whether, and how, you
want to digest this information/documentation.  Some questions are not
answered by the documentation, and you will have to look into the code
to get the answers to those questions.  
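At the level of results, the two spellings can be compared directly. In this sketch (base R only), the result of the direct call to `` `names<-` `` is assigned back, so the comparison does not depend on whether the direct call happens to copy its argument, which is exactly the detail that is hard to predict from R code alone.

```r
# Replacement-function syntax
x1 <- c(1, 2)
names(x1) <- c("a", "b")

# Direct call of the replacement function, result assigned back
x2 <- c(1, 2)
x2 <- `names<-`(x2, value = c("a", "b"))

identical(x1, x2)  # TRUE
```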

The ultimate documentation is the source code which is freely available
(not sure whom I am paraphrasing here).

Best wishes,

	Berwin


