[R] Large file size while persisting rpart model to disk

tan tanmaykm at gmail.com
Tue Feb 3 11:24:29 CET 2009


I am using rpart to build a model for later predictions. To keep the
model across restarts and share it across nodes, I have been using
"save" to persist the result of rpart to a file and "load" to read it
back later. But the saved file was becoming unusually large (even in
binary, compressed mode). The size also appeared to grow in proportion
to the amount of data that was used to create the model.

After tinkering a bit, I figured out that most of the size comes from
the "functions" component of the rpart object (fit$functions in the
code below). If I set it to NULL, the size drops dramatically. This can
be seen with the following lines of R code: the two saved files differ
in size, though only slightly for this small dataset. The difference
is more pronounced with large datasets.

library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
# save the full model object
save(fit, file="fit1.sav")
# drop the functions component and save again for comparison
fit$functions <- NULL
save(fit, file="fit2.sav")
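
To narrow down where the bytes go, something like the following might help. It is only a rough diagnostic sketch, not a confirmed explanation: ser_size, component_sizes, slim, size_full and size_slim are names made up here, and it assumes the serialized length of an object is a fair proxy for what "save" writes before compression. The last loop checks whether each stored function carries its own enclosing environment, since R serializes a closure together with its environment unless that environment is the global environment or a namespace.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Hypothetical helper: number of bytes needed to serialize an object in memory.
ser_size <- function(x) as.numeric(length(serialize(x, NULL)))

# Serialized size of each top-level component of the fit, largest first.
component_sizes <- sort(vapply(fit, ser_size, numeric(1)), decreasing = TRUE)
print(component_sizes)

# Compare the whole object with and without the functions component.
slim <- fit
slim$functions <- NULL
size_full <- ser_size(fit)
size_slim <- ser_size(slim)
cat("full:", size_full, "bytes; without functions:", size_slim, "bytes\n")

# For each stored function, report whether it drags along its own environment.
for (nm in names(fit$functions)) {
  env <- environment(fit$functions[[nm]])
  if (is.null(env) || isNamespace(env) || identical(env, globalenv())) {
    # Global and namespace environments are written as references only.
    cat(nm, ": environment is serialized by reference\n")
  } else {
    # Any other environment is written out together with its contents.
    cat(nm, ": carries an environment containing:",
        paste(ls(env), collapse = ", "), "\n")
  }
}
```

If the loop lists large objects (for example copies of the response or the data) inside those environments, that would point to the environments rather than the function bodies as the source of the bulk.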

What is the reason behind this? The functions themselves seem small, so
where is all the bulk coming from?

Thanks,
Tan



