[R] Do environments make copies?

Berton Gunter gunter.berton at gene.com
Thu Feb 24 22:24:47 CET 2005


I was hoping that one of the R gurus would reply to this, but as they haven't
(thus far) I'll try. Caveat emptor!

First of all, R passes function arguments by value, so as soon as you call
foo(val) you are, at least semantically, working with another copy of val
inside the call (R may defer the actual copy until val is modified).
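
A small sketch of the by-value semantics (the names x and bump_first are
made up for this illustration): a function that modifies its argument
leaves the caller's object untouched. As far as I understand it, R makes
the actual copy only when the argument is modified, not at the moment of
the call.

```r
# Hypothetical example: x and bump_first are illustrative names only.
x <- c(1, 2, 3)

bump_first <- function(v) {
  v[1] <- 99   # modifies the function's own copy of the argument
  v
}

y <- bump_first(x)
# x is unchanged in the caller; only the returned value y differs.
```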

Second, you seem to assume implicitly that assign(..., env=) stores a
pointer to the value in the environment. I do not know how
R handles environments and assignments like this internally, but your data
seems to indicate that it copies the value and does not merely point to it
(this is where R Core folks can shed more authoritative light). 
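
What can be checked from the prompt is that an environment itself is not
copied when it is assigned to another name or passed to a function. The
sketch below (e1, e2 and set_value are hypothetical names) shows only that;
it does not settle whether assign() copies the value it stores.

```r
# Hypothetical example: e1, e2 and set_value are illustrative names only.
e1 <- new.env()
e2 <- e1   # both names now refer to the same environment

set_value <- function(env, v) {
  assign("value", v, envir = env)   # changes the environment in place
  invisible(NULL)
}

set_value(e2, 42)

seen_via_e1 <- get("value", envir = e1)   # the change is visible via e1 too
same_env <- identical(e1, e2)
```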

Finally, it makes perfect sense to me that, as a data structure, the
environment itself may be small even if it effectively points to (one of
several copies of) large objects, so that object.size(an.environment) could
be small although the environment may "contain" huge arguments. Again, the
details depend on the precise implementation and need clarification by
someone who actually knows what's going on here, which ain't me.
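
This much can be reproduced without the original data. A minimal sketch,
assuming a one-million-element numeric vector as a stand-in for the large
object (e, size_env and size_value are made-up names): object.size() on the
environment reports only a small fixed overhead, not the objects bound
inside it.

```r
# Hypothetical stand-in for the large object: a vector of 1e6 doubles.
e <- new.env()
assign("value", numeric(1e6), envir = e)

# Size reported for the environment versus the object stored in it.
size_env   <- as.numeric(utils::object.size(e))
size_value <- as.numeric(utils::object.size(get("value", envir = e)))
```

So a small object.size() for a list holding an environment says nothing
about how much memory is reachable through it.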

I think the important message is that you shouldn't treat R as C, and you
shouldn't try to circumvent R's internal data structures and conventions. R
is a language designed to implement Chambers's S model of "Programming with
Data." Instead of trying to fool R to handle large data sets, maybe you
should consider whether you really **need** all the data in R at one time
and whether sensible partitioning or sampling, to analyze only a portion or
portions of the data, might be a more effective strategy.
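
As one illustration of partitioning, here is a rough sketch that computes a
mean by reading a file in chunks rather than loading it whole. The file
(a temporary file with one number per line), the chunk size and all
variable names are assumptions made for the example, not anything from the
original question.

```r
# Hypothetical data file: one numeric value per line, written to a temp file.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:1000), path)

chunk_size <- 100   # illustrative; choose to fit available memory
con <- file(path, open = "r")
total <- 0
n <- 0
repeat {
  lines <- readLines(con, n = chunk_size)   # read the next chunk only
  if (length(lines) == 0) break             # end of file reached
  values <- as.numeric(lines)
  total <- total + sum(values)
  n <- n + length(values)
}
close(con)

chunked_mean <- total / n
```

Only one chunk is held in memory at a time, which is the point of the
exercise; whether this suits a given analysis depends on the statistic
being computed.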


-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
 
 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Nawaaz Ahmed
> Sent: Thursday, February 24, 2005 10:36 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Do environments make copies?
> 
> I am using environments to avoid making copies (by keeping references).
> But it seems like there is a hidden copy going on somewhere - for
> example in the code fragment below, I am creating a reference to "y"
> (of size 500MB) and storing the reference in object "data". But when I
> save "data" and then restore it in another R session, gc() claims it is
> using twice the amount of memory. Where/How is this happening?
> 
> Thanks for any help in working around this - my datasets are just not 
> fitting into my 4GB, 32 bit linux machine (even though my actual data 
> size is around 800MB)
> 
> Nawaaz
> 
>  > new.ref <- function(value = NULL) {
> +     ref <- list(env = new.env())
> +     class(ref) <- "refObject"
> +     assign("value", value, env = ref$env)
> +     ref
> + }
>  > object.size(y)
> [1] 587941404
>  > y.ref = new.ref(y)
>  > object.size(y.ref)
> [1] 328
>  > data = list()
>  > data$y.ref = y.ref
>  > object.size(data)
> [1] 492
>  > save(data, file = "data.RData")
> 
> ...
> 
> run R again
> ===========
> 
>  > load("data.RData")
>  > gc()
>              used   (Mb) gc trigger   (Mb)
> Ncells    141051    3.8     350000    9.4
> Vcells 147037925 1121.9  147390241 1124.5
> 



