[Rd] modifying large R objects in place

Peter Dalgaard p.dalgaard at biostat.ku.dk
Fri Sep 28 00:39:30 CEST 2007


Petr Savicky wrote:
> Thank you very much for all the explanations. In particular for pointing
> out that nrow is not a .Primitive unlike dim, which is the
> reason for the difference in their behavior. (I raised the question
> of a possible bug because of this difference, not just because I was
> unsatisfied with nrow). Also, thanks for:
>
> On Thu, Sep 27, 2007 at 05:59:05PM +0100, Prof Brian Ripley wrote:
> [...]
>   
>> 2) I expected NAMED on 'a' to be incremented by nrow(a): here is my 
>> understanding.
>>
>> When you called nrow(a) you created another reference to 'a' in the 
>> evaluation frame of nrow.  (At a finer level you first created a promise 
>> to 'a' and then dim(x) evaluated that promise, which did SET_NAMED(<SEXP>) 
>> = 2.)  So NAMED(a) was correctly bumped to 2, and it is never reduced.
>>
>> More generally, any argument to a closure that actually gets used will 
>> get NAMED set to 2.
>>     
> [...]
>
> This explains a lot.
>
> I appreciate also the patch to matrix by Henrik Bengtsson, which saved
> me time formulating a further question just about this.
>
> I do not know whether there is a reason to keep nrow and ncol from
> being .Primitive, but if there is, the problem may be solved by
> rewriting them as follows:
>
> nrow <- function(...) dim(...)[1]
> ncol <- function(...) dim(...)[2]
>
> At least in my environment, the new versions preserved NAMED == 1.
>   
Yes, but changing the formal arguments is a bit messy, is it not?

Presumably, nrow <- function(x) eval.parent(substitute(dim(x)[1])) works 
too, but if the gain is important enough to warrant that sort of 
programming, you might as well make nrow a .Primitive.
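
A minimal sketch of that variant, for illustration only. The name nrow2
and the test objects are hypothetical, and the sketch only shows that
dim(x)[1] is evaluated in the caller's frame; it does not by itself
demonstrate what happens to NAMED.

```r
# Hypothetical name; not a proposed replacement for base::nrow.
# substitute() turns the call nrow2(a) into the expression dim(a)[1],
# and eval.parent() evaluates that expression in the caller's frame.
nrow2 <- function(x) eval.parent(substitute(dim(x)[1]))

a <- matrix(1:6, nrow = 2)
nrow2(a)

# The expression is resolved where nrow2 was called, so a variable
# local to the calling function is found as well.
g <- function() {
    m <- matrix(0, nrow = 3, ncol = 4)
    nrow2(m)
}
g()
```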

Longer-term, I still have some hope for better reference counting, but 
the semantics of environments make it really ugly -- an environment can 
contain an object that contains the environment, a simple example being 

f <- function()
    g <- function() 0
f()

At the end of f(), we should decide whether to destroy f's evaluation 
environment. In the present example, what we need to be able to see is 
that this would remove all references to g and that the reference from g 
to f's evaluation environment can therefore be ignored.  Complete logic 
for sorting this out is basically equivalent to a new garbage collector, 
and one can suspect that applying the logic upon every function return 
is going to be terribly inefficient. However, partial heuristics might 
apply.
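
The cycle can be made visible with a small sketch. Unlike the example
above, the result of f() is kept here (in h, a name introduced only for
this illustration) so that the evaluation environment can be inspected.

```r
f <- function()
    g <- function() 0

h <- f()                # value of the assignment: the closure g
e <- environment(h)     # f's evaluation environment, kept alive by h

ls(e)                                # the frame contains g
identical(get("g", envir = e), h)    # the frame refers to the closure
identical(environment(e$g), e)       # the closure refers back to the frame
```

When the value of f() is discarded instead, nothing outside the cycle
refers to either object, which is the case a reference counter would
have to recognize.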

> A side effect is that this unifies the error messages generated
> by passing too many arguments to nrow(x) and dim(x). Currently
>   a <- matrix(1:6,nrow=2)
>   nrow(a,a) # Error in nrow(a, a) : unused argument(s) (1:6)
>   dim(a,a) # Error: 2 arguments passed to 'dim' which requires 1
>
> Maybe other solutions exist as well.
>
> Petr Savicky.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>   


-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


