[R] Large file size while persisting rpart model to disk

Terry Therneau therneau at mayo.edu
Wed Feb 4 20:27:40 CET 2009


Lots of interesting comments while I was off in meetings.  (Some days I wonder 
why they pay me - with so many meetings I certainly don't accomplish any work.)

 Some responses:
 
 1. To Brian: I think that there is another issue outside of save(). Use the 
frailty.gamma function as a thought example.  It's about 3 pages long with lots 
and lots of temporary variables and computations, at the end of which it returns 
an X matrix of data and a stack of attributes.  One of these is a print 
function.  Some of the temp objects can be really large, large enough that 
memory recovery may be important.  Doesn't the reference to these objects from an 
environment prevent R from reclaiming that memory during the session?
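
A minimal sketch of the concern in point 1. The names make_printer and big_temp are hypothetical stand-ins, not code from the survival package. It suggests that a function returned from inside another function keeps the creating frame reachable, so a large temporary is neither freed nor left out when the function is serialized:

```r
# Hypothetical stand-in for a long function such as frailty.gamma:
# it builds a large temporary and returns a small print function.
make_printer <- function() {
  big_temp <- numeric(1e6)              # large temporary, about 8 MB of doubles
  function(x) cat("value:", format(x), "\n")
}

printfun <- make_printer()

# The temporary is still reachable through the closure's environment,
# so the garbage collector cannot reclaim it.
kept <- exists("big_temp", envir = environment(printfun), inherits = FALSE)

# serialize() uses the same format as save(), without compression,
# so its length is a rough guide to what would be written to disk.
size_before <- length(serialize(printfun, NULL))

# Re-pointing the closure at the global environment drops the reference.
environment(printfun) <- globalenv()
size_after <- length(serialize(printfun, NULL))
```

Comparing size_before with size_after gives a rough idea of how much of the saved object was the captured frame rather than the function itself.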
  
 2. Duncan: You objected to my phrase
  myfun <- function(x) { x+y}
will look for 'y' in the function that called myfun, then in the function that
called the function, .... on up and then through the search() list.  This makes 
life easier for certain things such as minimizers.

  I was writing for ordinary mortals, reading code.  The distinction you raise 
between the code and the "current instance of memory objects when the code was 
being executed" is opaque to many.   At least it's tricky for me.  
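
A small sketch of the distinction raised in point 2. All names here are illustrative only. R resolves a free variable such as 'y' in the environment where the function was defined, and then in that environment's parents, rather than in the frame of whichever function happens to call it:

```r
y <- "global y"

myfun <- function(x) paste(x, y)      # defined at top level

caller <- function() {
  y <- "caller's y"                   # local to caller(), not seen by myfun
  myfun("found:")
}

res_caller <- caller()                # "found: global y"

# A function defined inside another function sees the defining frame.
make_fun <- function() {
  y <- "defining function's y"
  function(x) paste(x, y)
}

res_closure <- make_fun()("found:")   # "found: defining function's y"
```

For functions defined at top level and called from top level the two readings give the same answer, which is probably why the difference is easy to miss when reading code.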
  
 3. On removing variables: I don't like that idea, and think it is much much 
clearer to explicitly refer to what you do want than to remove what you don't.  I 
never liked the m$x <- m$y <- m$whozit....... <- NULL construct for that reason, 
which was once found in most of the modeling functions.
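
One possible way to state what is wanted, rather than removing what is not, sketched under the assumption that the print function needs a single value. The names make_printer_explicit, big_temp and scale are hypothetical:

```r
make_printer_explicit <- function(x) {
  big_temp <- outer(x, x)               # temporary that should not be kept
  scale <- max(abs(big_temp))           # the one value the printer needs

  printfun <- function(z) cat("scaled:", format(z / scale), "\n")

  # Build an environment holding only what is wanted, instead of
  # calling rm() on every temporary in the current frame.
  environment(printfun) <- list2env(list(scale = scale), parent = globalenv())
  printfun
}

p <- make_printer_explicit(1:10)
kept_names <- ls(environment(p))        # "scale"
```

The parent here is globalenv(); a package function that calls non-exported helpers would need the package namespace as the parent instead.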
  
 4. Luke: I've read your code suggestion thrice now, and I understand what you 
are doing less on each pass.  
 
 Now, two questions for the pros
 
 a. I like Brian's suggestion of using asNamespace('survival'), other than the 
help page that explicitly states that I should never ever call said function.  If 
I don't use any non-exported-from-the-package functions, it seems that 
globalenv() is the most clear construct, however.
   How do I know what gets saved and what doesn't?  We don't want all the 
survival functions to be saved on disk with my object, like local variables 
would be.  
   
  b. Is there any difference or preference for
  	environment(printfun) <- asNamespace('survival')
  	environment(printfun) <- new.env(parent= asNamespace('survival'))
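
A hedged sketch touching on both questions. The stats namespace stands in for survival so that the code runs without that package, and the printfun body is invented for illustration. R serializes the global environment and package namespaces by reference (only the name is written), while an ordinary environment is written out with its contents:

```r
printfun <- function(x) cat("sd:", format(sd(x)), "\n")
ns <- asNamespace("stats")               # stand-in for asNamespace('survival')

# Variant 1: the namespace itself is the function's environment.
f1 <- printfun
environment(f1) <- ns

# Variant 2: a fresh, empty environment whose parent is the namespace.
f2 <- printfun
environment(f2) <- new.env(parent = ns)
assign("note", "private to f2", envir = environment(f2))

# Neither variant drags the namespace's functions into the serialized
# object; variant 2 additionally writes whatever was placed in the new
# environment (here just 'note').
size_ns    <- length(serialize(f1, NULL))
size_child <- length(serialize(f2, NULL))

# A loaded namespace is locked, so new variables cannot be added to it;
# the child environment is an ordinary writable one.
ns_locked    <- environmentIsLocked(ns)
child_locked <- environmentIsLocked(environment(f2))
```

Under these assumptions both forms find package-internal functions in the same way. The second mainly matters if the function needs a private place to store values, and anything stored there is saved along with it.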
 
  Terry T.



