[BioC] assigning phenoData via assign

Martin Morgan mtmorgan at fhcrc.org
Mon May 7 18:10:02 CEST 2007


Hi Benilton --

I know it looks like there's copying going on, but this is an
_illusion_ that R very effectively maintains.

Here's what happens with 'direct' access (things in <> are memory
addresses of the objects; tracemem reports each time a copy of a
traced object is made):

> tracemem(sample.ExpressionSet)
[1] "<0x178df98>"
> tracemem(sample.ExpressionSet@phenoData)
[1] "<0x20f3828>"
> phenoData(sample.ExpressionSet)=pd
tracemem[0x178df98 -> 0x2163840]: 
tracemem[0x2163840 -> 0x216a858]: phenoData<- phenoData<- 

and in the indirect route:

> tracemem(sample.ExpressionSet)
[1] "<0x216a858>"
> tracemem(sample.ExpressionSet@phenoData)
[1] "<0x1d86ed8>"
> tmp=get(objName)
> phenoData(tmp)=pd
tracemem[0x216a858 -> 0x1d86380]: 
tracemem[0x1d86380 -> 0x1d80af8]: phenoData<- phenoData<- 
> assign(objName, tmp)
> rm(tmp)

The surprising part is tmp=get(objName), which does NOT cause a memory
copy -- instead, the sample.ExpressionSet object has a flag associated
with it. Initially, the flag says 'there's only one reference to
me'. With 'get' or other non-modifying assignments, the flag is
incremented to say 'there's more than one reference to me'. The actual
modification (assigning to phenoData) triggers the copy, just as it did
in the direct way.
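
A minimal base-R sketch of the same get / modify / assign sequence,
with a plain numeric vector standing in for the ExpressionSet (the
name x and its values are illustrative assumptions, not from the
thread). It only shows the visible semantics: the original is
unchanged until assign() rebinds the name.

```r
# 'x' is a stand-in for sample.ExpressionSet; any R object behaves alike.
x <- c(1, 2, 3)
objName <- "x"

tmp <- get(objName)    # a second reference to the same data; no copy yet
tmp[1] <- 10           # the modification is what forces the copy

# The object bound to 'x' has not changed at this point.
stopifnot(identical(x, c(1, 2, 3)))

assign(objName, tmp)   # rebind the name to the modified object
rm(tmp)                # drop the extra reference

stopifnot(identical(x, c(10, 2, 3)))
```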

Using tracemem requires that your R is configured with
--enable-memory-profiling.
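
One way to check for this from inside R, as a sketch: base R's
capabilities("profmem") reports whether the running build has memory
profiling. The vectors v and w are illustrative names only.

```r
# capabilities("profmem") is TRUE when this R build has memory profiling.
if (capabilities("profmem")) {
    v <- c(1, 2, 3)
    tracemem(v)        # start reporting copies of 'v'
    w <- v             # nothing is reported: no copy is made yet
    w[1] <- 10         # tracemem prints a line here, when the copy happens
    untracemem(v)      # stop tracing the original
    untracemem(w)      # the copy carries the trace flag too
} else {
    message("tracemem is not available in this build of R")
}
```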

There is, actually, an inefficiency here, but it's in the original
version -- there's only one reference to sample.ExpressionSet, so
modifying it with phenoData(sample.ExpressionSet) = pd should not need
to make a copy of the whole instance, and assigning pd to the
phenoData slot of the new instance should not need to make a copy
either. tracemem reports two copies anyway. This is hard to get
around, though.

Benilton Carvalho <bcarvalh at jhsph.edu> writes:

> Hi everyone,
>
> I'm sorry if this was already discussed, as I did not succeed finding  
> anything relevant in the archives.
>
> I'm trying to set the phenoData slot using get() or something  
> similar, eg:
>
> library(Biobase)
> objName <- "sample.ExpressionSet"
> data(list=objName)
> pd <- phenoData(get(objName))
>
> get(objName)@phenoData <- pd    ## this will fail

As a matter of style, use phenoData(obj) <- pd rather than direct slot
access; you'll get whatever benefits are provided by phenoData<-, and
insulate your code from changes to underlying object structure.
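
To illustrate the point without Biobase, here is a self-contained
sketch using a toy S4 class. The class Toy and the accessor pair
pheno / pheno<- are hypothetical stand-ins for ExpressionSet and
phenoData / phenoData<-, and the validity call is only an example of
the kind of benefit an accessor might provide.

```r
library(methods)

# 'Toy', 'pheno' and 'pheno<-' are hypothetical names, not Biobase's.
setClass("Toy", representation(pheno = "data.frame"))

setGeneric("pheno", function(object) standardGeneric("pheno"))
setMethod("pheno", "Toy", function(object) object@pheno)

setGeneric("pheno<-", function(object, value) standardGeneric("pheno<-"))
setReplaceMethod("pheno", "Toy", function(object, value) {
    object@pheno <- value
    validObject(object)   # a check that plain slot assignment does not run
    object
})

obj <- new("Toy", pheno = data.frame(id = 1:2))
objName <- "obj"

# Same get / modify / assign pattern, but through the accessor.
tmp <- get(objName)
pheno(tmp) <- data.frame(id = 3:4)
assign(objName, tmp)
rm(tmp)
```

If the slot were later renamed, only the two methods would need to
change; code that calls pheno(tmp) <- value would keep working.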

> I could copy the object to a temporary one:
>
> tmp <- get(objName)
> tmp@phenoData <- pd
> assign(objName, tmp)
> rm(tmp)
>
> how can I avoid this copy?
>
> thanks a lot,
>
> b

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org


