[R] replaceMethod time and memory for very large object.

Henrik Bengtsson hb at maths.lth.se
Fri May 23 18:06:54 CEST 2003


Hi Laurent, this is exactly the problem I had when I started to
work on microarray data. Your strategy works, and it does indeed improve
memory and time efficiency quite a bit. It is just a matter of what
granularity you want to emulate references at, i.e. a matrix, a column of a
matrix or a single cell. I have stayed with a matrix, and when I update
the matrix R (50000x20) in a quadruple (R,G,Rb,Gb) it does help, since
I do not have to pay the cost of having G, Rb and Gb copied along with it,
as they would be if all four were coupled in the same data structure.
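A minimal sketch of reference emulation at the granularity of a matrix, as described above. It assumes each member of the (R,G,Rb,Gb) quadruple is stored in its own environment; the helper `newQuadruple` and the field name `data` are hypothetical, and small matrices stand in for the real 50000x20 ones.

```r
# Hypothetical helper: wrap each matrix of the quadruple in its own
# environment, so that each matrix acts as a separately referenced object.
newQuadruple <- function(R, G, Rb, Gb) {
  wrap <- function(m) {
    e <- new.env()
    e$data <- m
    e
  }
  list(R = wrap(R), G = wrap(G), Rb = wrap(Rb), Gb = wrap(Gb))
}

m <- matrix(0, nrow = 5, ncol = 2)   # small stand-in for a 50000x20 matrix
q <- newQuadruple(m, m, m, m)

# Updating R assigns a new value inside the environment q$R only;
# G, Rb and Gb live in other environments and are not involved.
q$R$data[1, 1] <- 2

# A second handle to the same environment sees the update.
h <- q$R
h$data[1, 1]     # returns 2
q$G$data[1, 1]   # returns 0
```

Because the four environments are independent, an assignment into R never touches G, Rb or Gb; a finer granularity (one environment per column or per cell) would follow the same pattern with more, smaller environments.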

FYI: Since 2001, I have developed the R.oo package
(http://www.maths.lth.se/help/R/R.classes/) based on a similar idea to what
you are suggesting, i.e. using environments or similar functionality to
emulate pointers, and providing this in a reusable way. It also implements
some extra features, although they are not necessary in this context. Note
also that R.oo is more in the spirit of "a method belongs to a class" than
of "a method belongs to a generic function", which is the idea in R, but
this is not a restriction. At the moment R.oo is based on S3, but I intend
to upgrade to S4. My microarray package com.braju.sma makes use of R.oo
wherever microarray structures are defined.
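The general idea of packaging an environment as a reusable reference can be illustrated without R.oo. The sketch below is not R.oo's API; `makeRefVector` and its `get`, `set` and `size` members are hypothetical names. The object is a list of closures sharing one enclosing environment, so the methods belong to the object rather than to a generic function.

```r
# Hypothetical constructor: the vector 'v' lives in this function's
# environment, which all the returned closures share.
makeRefVector <- function(v = numeric()) {
  list(
    get  = function(i) if (missing(i)) v else v[i],
    set  = function(i, value) {
      v[i] <<- value          # replaces elements of 'v' in the shared environment
      invisible(NULL)
    },
    size = function() length(v)
  )
}

r  <- makeRefVector(rep(1, 10))
r2 <- r          # copies the list of closures, not the data
r$set(1, 2)
r2$get(1)        # returns 2: both handles share the same environment
```

The same aliasing that makes this efficient is what Laurent calls "dangerous" below: every copy of `r` is a handle to the same data.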

Best wishes

Henrik Bengtsson
Lund University

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of laurent buffat
> Sent: den 23 maj 2003 16:01
> To: r-help at stat.math.ethz.ch
> Subject: [R] replaceMethod time and memory for very large object.
> 
> 
> 
> Hi there,
> 
> First, I apologize: I'm not fluent in English.
> 
> I am trying to manipulate very large objects with R, and I have
> some problems with memory usage and access time because of the
> "by value" mechanism. I would like to "encapsulate" a large
> vector in a class and access the vector through methods and
> replacement methods, but there are a lot of "implicit copies",
> which cost a lot of memory and time.
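The "by value" behaviour described here can be seen in a few lines: a replacement made inside a function changes only the function's local copy, never the caller's object. The class `Bar` and the function `setFirst` are hypothetical names used only for this illustration.

```r
library(methods)

setClass("Bar", representation(v = "numeric"))

# Modifies its argument; since R arguments behave as values
# (copy-on-modify), this only changes the local 'obj'.
setFirst <- function(obj, value) {
  obj@v[1] <- value
  obj
}

b  <- new("Bar", v = rep(1, 5))
b2 <- setFirst(b, 2)

b@v[1]    # returns 1: the caller's object is unchanged
b2@v[1]   # returns 2: the modified copy is what comes back
```

To keep the change, the caller must reassign (`b <- setFirst(b, 2)`), which is what a replacement method does behind the scenes.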
> 
> The data are very large and come from microarray experiments
> (see http://Bioconductor.org for more details on what a
> microarray is), and a typical vector is 20000 genes * 20
> probes * 100 experiments * 2 (means and variances).
> 
> The best way, in terms of speed and memory, is to try to
> emulate a "by reference" mechanism, but it's not very "in
> the spirit of R" and a little "dangerous" (see the example).
> 
> Could you give me some recommendations?
> 
> Thanks for your help.
> 
> The code below is a little "long", sorry.
> 
> Laurent B.
> 
> ////////////////////////////
> 
> setClass("Foo", representation(v = "numeric"))
> 
> setMethod("initialize", signature("Foo"), function(.Object, v=numeric()) {
> 		.Object@v <- v
> 		.Object
> 	   })
> 
> 
> setGeneric("v", function(.Object) standardGeneric("v")) 
> setMethod("v", "Foo", function(.Object) .Object@v )
> 
> setGeneric("v<-",function(.Object,value) standardGeneric("v<-"))
> setReplaceMethod("v", "Foo", function(.Object, value) {
> 	.Object@v <- value
> 	return(.Object)
> 	})
> 
> setMethod("[","Foo", function(x,i,j=NA,...,drop=FALSE) x@v[i] )
> 
> setReplaceMethod("[","Foo",function(x,i,j=NA,...,value) {
> 	x@v[i] <- value
> 	x
> 	})
> 
> n <- 2000 * 20 * 100 * 2
> 
> # In fact I would like to have
> # 20000 genes * 20 measurements per gene (probes) * 100 experiments
> # * 2 (mean and variance), but that is too much memory for this
> # example, so just try with 2000 "genes".
> 
> x <- rep(1,n)
> # x: a non-encapsulated vector for the data
> y <- new("Foo",v=x)
> # y: an encapsulated version
> 
> 
> x[1] <- 2
> y@v[1] <- 2
> v(y)[1] <- 2
> y[1] <- 2
> 
> nt <- 10 # number of test
> 
> system.time(for(i in 1:nt) x[1] <- 2)
> system.time(for(i in 1:nt) y@v[1] <- 2)
> system.time(for(i in 1:nt) v(y)[1] <- 2)
> system.time(for(i in 1:nt) y[1] <- 2)
> 
> [1] 0 0 0 0 0
> [1]  7.80  3.17 10.97  0.00  0.00
> [1] 10.19  5.39 15.60  0.00  0.00
> [1]  9.00  4.54 13.55  0.00  0.00
> 
> x[1:2]
> y[1:2]
> v(y)[1:2]
> y@v[1:2]
> 
> system.time(for(i in 1:nt) x[1:2])
> system.time(for(i in 1:nt) y[1:2])
> system.time(for(i in 1:nt) v(y)[1:2])
> system.time(for(i in 1:nt) y@v[1:2])
> 
> 
> [1] 0 0 0 0 0
> [1] 0 0 0 0 0
> [1] 0 0 0 0 0
> [1] 0 0 0 0 0
> 
> # No problem for the access methods, only for the replacement methods.
> # Class FooPtr:
> # a way to try to bypass the "by value" mechanism of R ...
> 
> setClass("FooPtr", representation(p = "environment"))
> 
> setMethod("initialize", signature("FooPtr"), function(.Object, v=vector()) {
> 		.Object@p <- new("environment")
> 		assign("v",v,envir=.Object@p)
> 		.Object
> 	   })
> 
> setMethod("v", "FooPtr", function(.Object) get("v",envir=.Object@p) )
> 
> setReplaceMethod("v", "FooPtr",
>                    function(.Object, value) {
>                    assign("v",value,envir=.Object@p)
>                    return(.Object)
>                  })
> 
> setMethod("[","FooPtr", function(x,i,j=NA,...,drop=FALSE) get("v",envir=x@p)[i] )
> 
> # a first version of "[<-" for FooPtr :
> 
> setReplaceMethod("[","FooPtr",function(x,i,j=NA,...,value)
> 	{
> 	v <- get("v",envir=x@p)
> 	v[i] <- value
> 	assign("v",v,envir=x@p)
> 	x
> 	})
> 
> z <- new("FooPtr",v=x)
> 
> x[1] <- 2
> v(z)[1] <- 2
> z[1] <- 2
> 
> 
> system.time(for(i in 1:nt) x[1] <- 2)
> system.time(for(i in 1:nt) v(z)[1] <- 2)
> system.time(for(i in 1:nt) z[1] <- 2)
> 
> [1] 0.01 0.00 0.01 0.00 0.00
> [1] 0 0 0 0 0
> [1] 1.63 1.18 2.81 0.00 0.00
> 
> # v(z)[1] <- 2 is "good", but "[<-" is not.
> # A crazier way to try "by reference":
> 
> setReplaceMethod("[","FooPtr",function(x,i,j=NA,...,value)
> 	{
> 	assign("i",i,envir=x@p)
> 	assign("value",value,envir=x@p)
> 	eval(expression(v[i] <- value), envir=x@p)
> 	rm("i","value",envir=x@p)
> 	x
> 	})
> 
> system.time(for(i in 1:nt) x[1] <- 2)
> system.time(for(i in 1:nt) v(z)[1] <- 2)
> system.time(for(i in 1:nt) z[1] <- 2)
> 
> [1] 0 0 0 0 0
> [1] 0 0 0 0 0
> [1] 0.14 0.12 0.26 0.00 0.00
> 
> # "[<-" is better, but v(z)[] is the best ... (why ???)
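One way to see which calls `v(z)[1] <- 2` actually makes is to write out the expansion that the R Language Definition gives for complex (subset) assignments. The sketch below uses a plain list and hypothetical `w`/`w<-` functions instead of the FooPtr class; it shows the sequence of calls, not an explanation of the timings.

```r
# Hypothetical accessor and replacement function on a plain list.
w <- function(obj) obj$v
`w<-` <- function(obj, value) {
  obj$v <- value
  obj
}

z <- list(v = c(1, 1, 1))
w(z)[1] <- 2                 # the complex assignment

# Roughly equivalent expansion: fetch the whole vector with w(),
# replace one element with `[<-`, then pass the whole vector to `w<-`.
z2 <- list(v = c(1, 1, 1))
`*tmp*` <- z2
z2 <- `w<-`(`*tmp*`, value = `[<-`(w(`*tmp*`), 1, value = 2))
rm(`*tmp*`)

identical(z, z2)             # returns TRUE
```

For FooPtr this means `v<-` receives the complete modified vector and assigns it into the environment, whereas `z[1] <- 2` goes through the `[<-` method directly.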
> 
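A possible variant of the second `[<-` method, assuming the same FooPtr design: instead of assigning `i` and `value` into the object's environment and removing them afterwards (which would overwrite or delete any variables with those names stored there), the index and value can be spliced into the expression with `substitute()`. It is only meant to behave the same; I have not timed it. The class is redefined here so that the block runs on its own.

```r
library(methods)

setClass("FooPtr", representation(p = "environment"))

setMethod("initialize", "FooPtr", function(.Object, v = numeric()) {
  .Object@p <- new.env()
  assign("v", v, envir = .Object@p)
  .Object
})

setMethod("[", "FooPtr", function(x, i, j, ..., drop = TRUE) {
  get("v", envir = x@p)[i]
})

setReplaceMethod("[", "FooPtr", function(x, i, j, ..., value) {
  # Build the call v[<i>] <- <value> with the actual index and value
  # inserted, then evaluate it inside the object's environment.
  eval(substitute(v[i] <- value, list(i = i, value = value)), envir = x@p)
  x
})

z <- new("FooPtr", v = rep(1, 10))
z[c(2, 3)] <- c(5, 6)
z[1:4]   # returns c(1, 5, 6, 1)
```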
> 
> # OK, v(z)[i] is the "best" access, but you need to know what you are doing:
> 
> v(z)[1] <- 12345
> z1 <- z
> v(z1)[1]
> 
> # z and z1 work with the same environment ...
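Because `z1 <- z` copies only the S4 object and not the environment it points to, an explicit copy operation is needed when an independent object is wanted. A minimal sketch, assuming the FooPtr design above; `copyFooPtr` is a hypothetical helper, and the class is redefined so that the block runs on its own.

```r
library(methods)

setClass("FooPtr", representation(p = "environment"))

setMethod("initialize", "FooPtr", function(.Object, v = numeric()) {
  .Object@p <- new.env()
  assign("v", v, envir = .Object@p)
  .Object
})

# Hypothetical helper: build a new FooPtr with its own environment,
# holding the vector currently stored in 'obj'.
copyFooPtr <- function(obj) {
  new("FooPtr", v = get("v", envir = obj@p))
}

z     <- new("FooPtr", v = rep(1, 5))
z1    <- z               # shares z's environment
zCopy <- copyFooPtr(z)   # gets a fresh environment

z@p$v[1] <- 12345

z1@p$v[1]      # returns 12345: the alias sees the change
zCopy@p$v[1]   # returns 1: the copy does not
```

The copy costs one duplication of the vector at the moment it is modified, so it is worth doing only where independent objects are really needed.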
> 
> //////////////////////
> 
> Thanks for your help.
> 
> Laurent
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list 
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
>