[R] Object and file sizes
murdoch@dunc@n @end|ng |rom gm@||@com
Fri Jun 28 15:26:47 CEST 2019
On 28/06/2019 7:35 a.m., Göran Broström wrote:
> I have two large data frames, 'liss' (170 million obs, 8 variables) and
> 'fobb' (52 million obs, 8 variables, same as for 'liss'), and checking
> their sizes I get
> > object.size(liss)
> 7477492552 bytes
> > object.size(fobb)
> 2494591736 bytes
> Fair enough, but when I save them to disk (saveRDS), the size relation
> is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB!
> I was puzzled by this and thought that I had made a mistake in creating
> them, but the only explanation I can find for this is that 'liss'
> contains a lot more missing values.
saveRDS() uses compression by default. Compression works best if there
are a lot of repetitive values; every NA is the same, so that would help
compression. Other values may also be repeated.
If you use saveRDS(..., compress = FALSE), you'll get much larger files,
with sizes probably roughly proportional to the object.size() results.
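
A small sketch of the effect, using made-up data rather than the 'liss'
and 'fobb' data frames (the names dense, sparse and rds_size below are
hypothetical, chosen only for this illustration). Two data frames occupy
the same amount of memory, but one is about 90% NA:

```r
# Illustrative data only: same shape and type, very different NA share.
set.seed(1)
n <- 1e5
dense  <- data.frame(x = rnorm(n))
sparse <- data.frame(x = ifelse(runif(n) < 0.9, NA_real_, rnorm(n)))

# Hypothetical helper: bytes on disk when 'obj' is written by saveRDS().
rds_size <- function(obj, compress = TRUE) {
  f <- tempfile(fileext = ".rds")
  on.exit(unlink(f))          # remove the temporary file afterwards
  saveRDS(obj, f, compress = compress)
  file.size(f)
}

# In memory the two objects are the same size.
mem <- c(dense  = as.numeric(object.size(dense)),
         sparse = as.numeric(object.size(sparse)))

# On disk, the default (compressed) sizes differ; uncompressed ones do not.
disk_compressed   <- c(dense = rds_size(dense),
                       sparse = rds_size(sparse))
disk_uncompressed <- c(dense = rds_size(dense, compress = FALSE),
                       sparse = rds_size(sparse, compress = FALSE))

print(rbind(mem, disk_compressed, disk_uncompressed))
```

With the default compression the mostly-NA data frame should come out
much smaller on disk, while the uncompressed files should both be close
to the object.size() values. The exact numbers depend on the data and
on the R version.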