[Rd] saving objects with embedded environments

McGehee, Robert Robert.McGehee at geodecapital.com
Fri Jun 29 00:30:39 CEST 2007


Hello,
I have been running linear regressions on large data sets. Because 'lm'
stores a great deal of data I don't need, including the residuals,
fitted.values, model frame, etc., I generally drop these (using
model=FALSE or by setting them to NULL in the object) before saving the
model to a file.

In the example below, however, I have found that whether or not I run
'lm' inside another function determines whether the entire function
environment is saved to the file. So, even though object.size and
all.equal report that both 'lm' objects are equal and small, one saves
as a 24MB file and the other as 646 bytes. This seems to be because in
the first example the function environment is stored in
attr(x1$terms, ".Environment") and takes up all 24MB of space
(object.size does not count the contents of environments).
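A scaled-down sketch of the same comparison follows, assuming a smaller N so it runs quickly; fit_in_function() is a hypothetical stand-in for testEq(). It inspects the environment stored on each fit's terms and measures the serialized size with serialize(x, NULL), which returns the raw bytes rather than writing a file.

```r
## Sketch: compare an lm fit made inside a function with one made at top
## level. N is reduced from 900000; fit_in_function() is hypothetical.
N <- 10000
B <- data.frame(y = rnorm(N) + 1:N, x1 = rnorm(N) + 1:N,
                x2 = rnorm(N) + 1:N, x3 = rnorm(N) + 1:N)

fit_in_function <- function(B) {
    x <- lm(y ~ x1 + x2 + x3, data = B, model = FALSE)
    x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
    x
}

f1 <- fit_in_function(B)
f2 <- lm(y ~ x1 + x2 + x3, data = B, model = FALSE)
f2$residuals <- f2$effects <- f2$fitted.values <- f2$qr$qr <- NULL

## The terms object keeps the environment the formula was created in.
env1 <- attr(f1$terms, ".Environment")
env2 <- attr(f2$terms, ".Environment")
identical(env2, globalenv())   # TRUE when run at top level
ls(env1)                       # "B" "x": the function's local frame

## object.size() ignores environment contents; serialize() includes them.
c(object.size(f1), object.size(f2))
c(length(serialize(f1, NULL)), length(serialize(f2, NULL)))
```

The first serialized size should exceed the second by roughly the size of B, since the function frame (holding B) travels with f1.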

Anyway, I think this is a bug, or at least very undesirable behavior (an
object reported to be about 5KB takes up 24MB on disk). There also seems
to be some inconsistency in how environments are saved depending on
whether or not the environment is the global environment, though I'm not
familiar enough with environments to know whether this is intentional.
Comments are appreciated.
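The global-versus-local difference can be isolated with bare formulas. In this sketch (f_global, f_local and big are illustrative names), serialize(), whose format save() also uses, records the global environment as a reference but writes any other environment together with its contents.

```r
## Sketch: the global environment is serialized as a reference, while a
## local environment is serialized by value along with its variables.
big <- rnorm(1e5)                       # 1e5 doubles, about 800 KB

f_global <- y ~ x
environment(f_global) <- globalenv()    # stored as a reference only

f_local <- local({
    big <- big                          # copy into the new local env
    y ~ x                               # formula's environment is that env
})

c(global = length(serialize(f_global, NULL)),
  local  = length(serialize(f_local, NULL)))
```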

Thanks,
Robert

##################################################################
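testEq <- function(B) {
    ## fit inside a function: the formula's environment is this frame
    x <- lm(y ~ x1+x2+x3, data=B, model=FALSE)
    ## drop the large components before returning
    x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
    x
}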

N <- 900000
B <- data.frame(y=rnorm(N)+1:N, x1=rnorm(N)+1:N, x2=rnorm(N)+1:N,
                x3=rnorm(N)+1:N)
x1 <- testEq(B)                               ## fitted inside a function
x2 <- lm(y ~ x1+x2+x3, data=B, model=FALSE)   ## fitted at top level
x2$residuals <- x2$effects <- x2$fitted.values <- x2$qr$qr <- NULL

all.equal(x1, x2) ## TRUE
object.size(x1)  ## 5112
object.size(x2)  ## 5112
save(x1, file="x1.RData")
save(x2, file="x2.RData")
file.info("x1.RData")$size ## 24063852 bytes
file.info("x2.RData")$size ## 646 bytes
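One possible workaround, sketched here as an assumption rather than a fix endorsed on the list: repoint the terms' environment at the global environment before returning the fit, so save() stores only a reference. testEqSmall() is a hypothetical variant of testEq(), and N is reduced so the sketch runs quickly.

```r
## Sketch: detach the fitted model from the function frame before saving.
testEqSmall <- function(B) {
    x <- lm(y ~ x1 + x2 + x3, data = B, model = FALSE)
    x$residuals <- x$effects <- x$fitted.values <- x$qr$qr <- NULL
    ## global env is written as a reference, so B is not saved with x
    attr(x$terms, ".Environment") <- globalenv()
    x
}

N <- 10000   # reduced from 900000
B <- data.frame(y = rnorm(N) + 1:N, x1 = rnorm(N) + 1:N,
                x2 = rnorm(N) + 1:N, x3 = rnorm(N) + 1:N)
fit <- testEqSmall(B)

f <- tempfile(fileext = ".RData")
save(fit, file = f)
file.info(f)$size   # small: B is no longer reachable from fit
```

After this change, predict() looks up formula variables missing from newdata in the global environment instead of the function frame, which is worth checking for formulas that rely on local variables.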

> R.version
               _                           
platform       i686-pc-linux-gnu           
arch           i686                        
os             linux-gnu                   
system         i686, linux-gnu             
status                                     
major          2                           
minor          5.0                         
year           2007                        
month          04                          
day            23                          
svn rev        41293                       
language       R                           
version.string R version 2.5.0 (2007-04-23)


Robert McGehee, CFA
Quantitative Analyst
Geode Capital Management, LLC
One Post Office Square, 28th Floor | Boston, MA | 02109
Tel: 617/392-8396    Fax:617/476-6389
mailto:robert.mcgehee at geodecapital.com






More information about the R-devel mailing list