[R] Large file size while persisting rpart model to disk

tan tanmaykm at gmail.com
Tue Feb 3 15:13:39 CET 2009


Dear Prof. Ripley,

Thanks for the quick reply.

I do notice an <environment...> in the print output. I assume it is
used to keep copies of the initial data used for the model.
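
A minimal sketch of how that assumption could be checked, using the
kyphosis example from the original post. The variable names (env,
fn_bytes) are illustrative only, and what the environment actually
contains may differ between rpart versions:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Each element of fit$functions is a closure. Look at the environment
# attached to one of them (summary is picked here as an example).
env <- environment(fit$functions$summary)
print(env)

# Names of the objects that environment holds (first 20 only).
print(head(ls(env), 20))

# Serialized size of the closure in bytes. Unless the environment is a
# namespace or the global environment, it is written out with the function.
fn_bytes <- length(serialize(fit$functions$summary, NULL))
cat("serialized size of fit$functions$summary:", fn_bytes, "bytes\n")
```

If the listed names include the response or weights passed to rpart, that
would explain why the saved size grows with the training data.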

- Is it safe to assume that setting fit$functions to NULL would not
affect any other functionality, apart from the usage of those
particular functions?

- Is there a better/recommended way of reducing the size?
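
To make both questions concrete, here is a rough sketch that compares
the serialized size before and after dropping fit$functions and checks
that predictions are unchanged. It only tests predict() on the kyphosis
data; fit_small, size_full and size_small are names made up for this
example, and it is not a claim that every rpart function keeps working:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Copy of the model with the stored closures dropped.
fit_small <- fit
fit_small$functions <- NULL

# serialize(x, NULL) returns the raw vector that would be written out,
# so its length is the uncompressed serialized size in bytes.
size_full  <- length(serialize(fit, NULL))
size_small <- length(serialize(fit_small, NULL))
cat("full:", size_full, "bytes; stripped:", size_small, "bytes\n")

# Compare predictions from the full and the stripped model.
same_pred <- all.equal(predict(fit, newdata = kyphosis),
                       predict(fit_small, newdata = kyphosis))
print(same_pred)
```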

Thanks,
Tan


On Feb 3, 4:56 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
> On Tue, 3 Feb 2009, tan wrote:
> > I am using rpart to build a model for later predictions. To save the
> > prediction across restarts and share the data across nodes I have been
> > using "save" to persist the result of rpart to a file and "load" it
> > later. But the saved size was becoming unusually large (even with
> > binary, compressed mode). The size was also proportional to the amount
> > of data that was used to create the model.
>
> > After tinkering a bit, I figured out that most of the size was because
> > of the rpart$functions attribute. If I set it to NULL, the size seems
> > to drop dramatically. It can be seen with the following lines of R
> > code, where there is a difference, though it is small. The difference
> > is more pronounced with large datasets.
>
> > library(rpart)
> > fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
> > save(fit, file="fit1.sav")
> > fit$functions <- NULL
> > save(fit, file="fit2.sav")
>
> > What is the reason behind it? The functions themselves seem small, so
> > where is all the bulk coming from?
>
> Their environments.
>
> --
> Brian D. Ripley,                  rip... at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
> ______________________________________________
> R-h... at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



