[Rd] Large discrepancies in the same object being saved to .RData

Tony Plate taplate at gmail.com
Sun Jul 11 17:08:44 CEST 2010


Another way of seeing the environments referenced in an object is using 
str(), e.g.:

 > f1 <- function() {
+ junk <- rnorm(10000000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
 > v1 <- f1()
 > object.size(f1)
1636 bytes
 > grep("Environment", capture.output(str(v1)), value=TRUE)
[1] "  .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
[2] "  .. .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
 >
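The same search can be automated. Below is a minimal sketch of a hypothetical helper, `find_envs()` (not a base R function). It walks an object's list elements and attributes and returns every environment it finds, labelled with the path used to reach it. It assumes the object is built from ordinary lists and attributes, as an `lm` fit is. It does not look inside the environments it finds, and it does not cover every place an environment can hide.

```r
## find_envs() is a hypothetical helper, not part of base R.
## It returns a named list of every environment reachable through
## list elements or attributes of `x`; names record the access path.
find_envs <- function(x, path = "x") {
  found <- list()
  if (is.environment(x)) {
    found[[path]] <- x
    return(found)          # stop here: do not descend into the environment
  }
  if (is.function(x) && !is.primitive(x)) {
    found[[paste0("environment(", path, ")")]] <- environment(x)
  }
  # Formulas and terms objects keep their environment in ".Environment"
  for (a in setdiff(names(attributes(x)), "names")) {
    sub <- sprintf('attr(%s, "%s")', path, a)
    found <- c(found, find_envs(attr(x, a, exact = TRUE), sub))
  }
  if (is.list(x)) {
    nms <- names(x)
    for (i in seq_along(x)) {
      sub <- if (!is.null(nms) && nzchar(nms[i])) {
        paste0(path, "$", nms[i])
      } else {
        sprintf("%s[[%d]]", path, i)
      }
      found <- c(found, find_envs(x[[i]], sub))
    }
  }
  found
}

## The fitting function from the thread, with a smaller `junk` vector
f1 <- function() {
  junk <- rnorm(1e5)
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()
envs <- find_envs(v1)
print(names(envs))
print(lapply(envs, ls))  # each should list "junk", "x" and "y"
```

For an `lm` fit this should report the frame of `f1()` twice: once through `v1$terms` and once through the `terms` attribute of `v1$model`.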

-- Tony Plate

On 7/10/2010 10:10 PM, Bill.Venables at csiro.au wrote:
> Well, I have answered one of my questions below.  The hidden
> environment is attached to the 'terms' component of v1.
>
> To see this
>
>> lapply(v1, environment)
> $coefficients
> NULL
>
> $residuals
> NULL
>
> $effects
> NULL
>
> $rank
> NULL
>
> $fitted.values
> NULL
>
> $assign
> NULL
>
> $qr
> NULL
>
> $df.residual
> NULL
>
> $xlevels
> NULL
>
> $call
> NULL
>
> $terms
> <environment: 0x021b9e18>
>
> $model
> NULL
>
>> rm(junk, envir = with(v1, environment(terms)))
>> usedVcells()
> [1] 96532
>
> This is still a bit of a trap for young (and old!) players...
>
> I think the main point in my mind is why is it that object.size()
> excludes enclosing environments in its reckonings?
>
> Bill Venables.
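If I read `?object.size` correctly, this is deliberate: associated space, such as the environment of a function, is not included in its calculation. A quick way to see roughly what `save()` will write (before compression) is to compare `object.size()` with the length of the serialized object, because `serialize()` writes out the contents of any environment other than the global, base and namespace environments. A minimal sketch, assuming a 1e6-element `junk` vector instead of the thread's 1e7:

```r
## object.size() versus serialized size for a formula built inside a function
f0 <- function() {
  junk <- rnorm(1e6)   # 1e6 doubles, about 8 MB
  y ~ x                # the formula keeps a reference to this call's frame
}
v0 <- f0()

in_memory  <- as.numeric(object.size(v0))              # the formula alone
serialized <- length(serialize(v0, connection = NULL)) # includes the frame

print(c(object.size = in_memory, serialized = serialized))
```

A serialized size several orders of magnitude above `object.size()` is a strong hint that a captured environment is being carried along.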
>
> -----Original Message-----
> From: Venables, Bill (CMIS, Cleveland)
> Sent: Sunday, 11 July 2010 11:40 AM
> To: 'Duncan Murdoch'; 'Paul Johnson'
> Cc: 'r-devel at r-project.org'; Taylor, Julian (CMIS, Waite Campus)
> Subject: RE: [Rd] Large discrepancies in the same object being saved to .RData
>
> I'm still a bit puzzled by the original question.  I don't think it
> has much to do with .RData files and their sizes.  For me the puzzle
> comes much earlier.  Here is an example of what I mean, using a little
> session:
>
>> usedVcells <- function() gc()["Vcells", "used"]
>> usedVcells()        ### the base load
> [1] 96345
>
> ### Now look at what happens when a function returns a formula as the
> ### value, with a big item floating around in the function closure:
>
>> f0 <- function() {
> + junk <- rnorm(10000000)
> + y ~ x
> + }
>> v0 <- f0()
>> usedVcells()   ### much bigger than base, why?
> [1] 10096355
>> v0             ### no obvious environment
> y ~ x
>> object.size(v0)  ### so far, no clue given where
>                     ### the extra Vcells are located.
> 372 bytes
>
> ### Does v0 have an enclosing environment?
>
>> environment(v0)             ### yep.
> <environment: 0x021cc538>
>> ls(envir = environment(v0)) ### as expected, there's the junk
> [1] "junk"
>> rm(junk, envir = environment(v0))  ### this does the trick.
>> usedVcells()
> [1] 96355
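Rather than removing `junk` after the fact, the function can avoid handing out a reference to its frame at all. A minimal sketch, assuming the formula's variables are meant to be found in the global environment (if they only exist inside the function, later evaluation of the formula would fail): reset the formula's environment before returning it. `f0_clean` is a hypothetical name.

```r
f0_clean <- function() {
  junk <- rnorm(1e6)               # large temporary, as in f0
  fml <- y ~ x
  environment(fml) <- globalenv()  # drop the reference to this call's frame
  fml
}
v0 <- f0_clean()
print(environment(v0))               # <environment: R_GlobalEnv>
print(length(serialize(v0, NULL)))   # small: junk is no longer reachable
```

Calling `rm(junk)` inside the function before returning the formula would also work, and keeps the formula's own environment for any local variables it needs.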
>
> ### Now consider a second example where the object
> ### is not a formula, but contains one.
>
>> f1 <- function() {
> + junk <- rnorm(10000000)
> + x <- 1:3
> + y <- rnorm(3)
> + lm(y ~ x)
> + }
>
>> v1 <- f1()
>> usedVcells()  ### as might have been expected.
> [1] 10096455
>
> ### in this case, though, there is no
> ### (obvious) enclosing environment
>
>> environment(v1)
> NULL
>> object.size(v1)  ### so where are the junk Vcells located?
> 7744 bytes
>> ls(envir = environment(v1))  ### clearly will not work
> Error in ls(envir = environment(v1)) : invalid 'envir' argument
>
>> rm(v1)     ### removing the object does clear out the junk.
>> usedVcells()
> [1] 96366
>
> And in this second case, as noted by Julian Taylor, if you save() the
> object the .RData file is also huge.  There is an environment attached
> to the object somewhere, but it appears to be occluded and entirely
> inaccessible.  (I have poked around the object components trying to
> find the thing but without success.)
>
> Have I missed something?
>
> Bill Venables.
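`environment(v1)` returns NULL because an `lm` fit is a plain list with no `.Environment` attribute of its own. The frame is reachable only through `v1$terms` and `attr(v1$model, "terms")`, as the follow-up message at the top of this thread found. One way to keep the saved object small is to drop large temporaries before calling `lm()`, so the captured frame holds only what the model needs. A minimal sketch; `f1_clean` and the `sum(junk)` step are hypothetical stand-ins for real work.

```r
f1_clean <- function() {
  junk <- rnorm(1e6)
  total <- sum(junk)   # hypothetical use of the large temporary
  rm(junk)             # drop it before lm() captures this frame
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1_clean()
print(ls(environment(v1$terms)))    # "total" "x" "y"
print(length(serialize(v1, NULL)))  # small, since junk is gone
```

The frame is still captured (and `predict()` and friends may rely on it), but it no longer carries the large vector.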
>
> -----Original Message-----
> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
> Sent: Sunday, 11 July 2010 10:36 AM
> To: Paul Johnson
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Large discrepancies in the same object being saved to .RData
>
> On 10/07/2010 2:33 PM, Paul Johnson wrote:
>> On Wed, Jul 7, 2010 at 7:12 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>
>>> On 06/07/2010 9:04 PM, Julian.Taylor at csiro.au wrote:
>>>
>>>> Hi developers,
>>>>
>>>> After some investigation I have found there can be large discrepancies in
>>>> the same object being saved as an external "xx.RData" file. The immediate
>>>> repercussion of this is the possible increased size of your .RData workspace
>>>> for no apparent reason.
>>>>
>>> I haven't worked through your example, but in general the way that local
>>> objects get captured is when part of the return value includes an
>>> environment.
>>>
>> Hi, can I ask a follow up question?
>>
>> Is there a tool to browse *.Rdata files without loading them into R?
>>
> I don't know of one.  You can load the whole file into an empty
> environment, but then you lose the information about "where did it come from?"
>
> Duncan Murdoch
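Duncan's suggestion can be turned into a rough inspection routine. A minimal sketch, assuming the file is small enough to load: `load()` into a fresh environment (so nothing in the workspace is overwritten), then tabulate `object.size()` against the serialized size of each object, which counts any captured environments. Objects whose serialized size dwarfs their `object.size()` are the ones dragging hidden environments along. It still reads everything into memory, so it is not an h5dump-style browser. The temporary file and the object names are made up for illustration.

```r
## Save two objects to a temporary file, as a stand-in for an unknown .RData
path <- tempfile(fileext = ".RData")
f0 <- function() { junk <- rnorm(1e6); y ~ x }
fml <- f0()
small <- 1:10
save(fml, small, file = path)
rm(fml, small)

## Load into a fresh environment; load() invisibly returns the names it created
e <- new.env()
loaded <- load(path, envir = e)
print(loaded)

## Compare object.size() with the serialized size of each object
report <- data.frame(
  name = loaded,
  object_size = vapply(loaded, function(n)
    as.numeric(object.size(get(n, envir = e))), numeric(1)),
  serialized = vapply(loaded, function(n)
    as.numeric(length(serialize(get(n, envir = e), NULL))), numeric(1)),
  row.names = NULL
)
print(report)
print(ls.str(e))
```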
>    
>> In HDF5 (a data storage format we use sometimes), there is a CLI
>> program "h5dump" that will spit out line-by-line all the contents of a
>> storage entity.  It will literally track through all the metadata, all
>> the vectors of scores, etc.  I've found that handy to "see what's
>> really in there" in cases like the one the OP asked about.
>> Sometimes, we find that there are things that are "in there" by
>> mistake, as Duncan describes, and then we can try to figure why they
>> are in there.
>>
>> pj
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>