[R] Large file size while persisting rpart model to disk

Duncan Murdoch murdoch at stats.uwo.ca
Wed Feb 4 21:23:34 CET 2009

On 2/4/2009 2:27 PM, Terry Therneau wrote:
> Lots of interesting comments while I was off in meetings.  (Some days I wonder 
> why they pay me - with so many meetings I certainly don't accomplish any work.)
>  Some responses:
>  1. To Brian: I think that there is another issue outside of save(). Use the 
> frailty.gamma function as a thought example.  It's about 3 pages long with lots 
> and lots of temporary variables and computations, at the end of which it returns 
> an X matrix of data and a stack of attributes.  One of these is a print 
> function.  Some of the temp objects can be really large, large enough that 
> memory recovery may be important.  Doesn't the reference to these in an 
> environment prevent R from reclaiming that memory during the session?
>  2. Duncan: You objected to my phrase
>   mfun <- function(x) { x+y}
> will look for 'y' in the function that called mfun, then in the function that
> called the function, .... on up and then through the search() list.  This makes 
> life easier for certain things such as minimizers.
>   I was writing for ordinary mortals, reading code.  The distinction you raise 
> between the code and the "current instance of memory objects when the code was 
> being executed" is opaque to many.  At least it's tricky for me.

I might be using too much jargon, but there's an important distinction 
between the caller of a function, and its creator.  It's the creator's 
variables that get caught.

For example:

   buildF <- function() {
      x <- "x in buildF"
      f <- function() print(x)
      f
   }

Here the function buildF() is the creator, and f will see the value 
"x in buildF" in it, even if I call it from somewhere else:

   useF <- function() {
      x <- "x in useF"
      f <- buildF()
      f()
   }

Here the caller of f is useF(), but you won't see "x in useF" being 
printed, you'll see:

 > useF()
[1] "x in buildF"
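To make the caller/creator distinction concrete, here is a small sketch in which the returned function looks up x twice: once by ordinary lexical scoping, which goes to its creator's environment, and once explicitly in its caller's frame via parent.frame(). The names buildG() and useG() are hypothetical, chosen only for this illustration.

```r
# buildG() and useG() are hypothetical illustrations, not from the thread.
buildG <- function() {
  x <- "x in buildG"
  function() {
    c(lexical = x,                                 # ordinary lookup: creator's environment
      caller  = get("x", envir = parent.frame()))  # explicit lookup: caller's frame
  }
}

useG <- function() {
  x <- "x in useG"
  g <- buildG()   # buildG() is the creator of g
  g()             # useG() is the caller of g
}

res <- useG()
print(res)
```

The "lexical" entry comes out as "x in buildG" and the "caller" entry as "x in useG": plain name lookup never consults the caller unless you ask for its frame explicitly.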

>  3. On removing variables: I don't like that idea, and think it is much much 
> clearer to explicitly refer to what you do want than to remove what you don't.  I
> never liked the m$x <- m$y <- m$whozit....... <- NULL construct for that reason, 
> which was once found in most of the modeling functions.

Then I'd suggest that you do what Luke said, except...

>  4. Luke: I've read your code suggestion thrice now, and I understand what you 
> are doing less on each pass.  

>  Now, two questions for the pros
>  a. I like Brian's suggestion of using asNamespace('survival'), other than the 
> help page that expliclty states that I should never ever call said function.  If 
> I don't use any non-exported-from-the-package functions, it seems that 
> globalenv() is the most clear construct, however.
>    How do I know what gets saved and what doesn't?  We don't want all the 
> survival functions to be saved on disk with my object, like local variables 
> would be.

Variables in local frames will be saved, things in the global 
environment, or in package namespaces, won't be.
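A minimal sketch of both point 1 and the answer above, assuming a hypothetical builder makePrinter() standing in for something like frailty.gamma. The returned print function carries the builder's whole local frame, including a large temporary, so that temporary stays reachable (and cannot be reclaimed) while the function is alive. Here serialize() is used as a stand-in for save() to measure the bytes that would be written. Giving the function a fresh environment that holds only what it needs, with globalenv() as parent, keeps the size small because the global environment is written as a reference rather than by value.

```r
# makePrinter() is a hypothetical stand-in for a long builder such as
# frailty.gamma: it creates a large temporary and returns a small closure.
makePrinter <- function(n = 1e6) {
  big  <- rnorm(n)          # large temporary in the local frame
  summ <- mean(big)         # the only value the print function needs
  function(x) cat("mean of temporary:", summ, "\n")
}

pf1 <- makePrinter()
print(ls(environment(pf1)))   # the whole local frame is captured, 'big' included

# Bytes needed to serialize the closure together with that frame
size_local <- length(serialize(pf1, NULL))

# Keep only what the function needs, explicitly, in a fresh environment
# whose parent is the global environment (serialized by reference).
pf2 <- pf1
keep <- new.env(parent = globalenv())
keep$summ <- environment(pf1)$summ
environment(pf2) <- keep
size_kept <- length(serialize(pf2, NULL))

print(c(local = size_local, kept = size_kept))
pf2()   # still works: 'summ' is found in 'keep'
```

This also follows the "refer explicitly to what you want" preference from point 3: nothing is removed, the new environment simply names the one value the function uses.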

>   b. Is there any difference or preference for
>   	environment(printfun) <- asNamespace('survival')
>   	environment(printfun) <- new.env(parent= asNamespace('survival'))

The first is slightly more efficient, but there's hardly any difference. 
  When printfun goes looking for something that's not local, in the 
second case it'll search an empty environment first, then go to the 
namespace.  In the first case it'll go direct to the namespace.
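The two choices in question b can be compared directly. This sketch uses the stats namespace in place of survival only so that it runs without extra packages (stats ships with every R installation), and printfun is a hypothetical placeholder. In both cases the namespace is on the lookup path; in the second there is one extra, empty environment in front of it.

```r
# 'printfun' is a hypothetical placeholder; 'stats' stands in for 'survival'.
printfun <- function(x) cat("sd =", sd(x), "\n")

# Option 1: the namespace itself is the function's environment
f1 <- printfun
environment(f1) <- asNamespace("stats")

# Option 2: a new, empty environment whose parent is the namespace
f2 <- printfun
environment(f2) <- new.env(parent = asNamespace("stats"))

e1 <- environment(f1)
e2 <- environment(f2)
print(identical(e1, asNamespace("stats")))             # lookup starts in the namespace
print(identical(parent.env(e2), asNamespace("stats"))) # one hop further for f2
print(length(ls(e2, all.names = TRUE)))                # the extra environment is empty

# Both resolve 'sd' to the same binding, and both serialize compactly
# because a namespace is written as a reference, not by value.
print(identical(get("sd", envir = e1), get("sd", envir = e2)))
print(c(f1 = length(serialize(f1, NULL)), f2 = length(serialize(f2, NULL))))
```

The empty environment in option 2 costs one extra step per non-local lookup, but it gives you a place to store function-specific values later without touching the namespace.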

Duncan Murdoch

>   Terry T.
