[BioC] Curious file size issues

Adaikalavan Ramasamy a.ramasamy at imperial.ac.uk
Thu Mar 12 11:39:59 CET 2009

I am not an expert in R data representations. However, my experience 
suggests that if an object is stored as a matrix when it should be a 
data.frame (or vice versa), the object size may be bloated. Also, if it 
is a data.frame, check that each column is stored with the right type, 
e.g. via str(obj) or sapply(obj, class). Numeric columns stored as 
factors or characters are a common cause of bloat.
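As a rough illustration of the column-type check, here is a minimal 
sketch on a made-up data frame. The name df and its columns are 
hypothetical and are not taken from the GTF objects in the original 
post; the exact sizes will differ on your machine.

```r
# Hypothetical toy data; 'df', 'start' and 'score' are illustrative names only.
n <- 100000
df <- data.frame(
  start = as.character(seq_len(n)),   # integers mistakenly held as text
  score = as.character(runif(n)),     # doubles mistakenly held as text
  stringsAsFactors = FALSE
)

# Inspect how each column is actually stored.
col_classes <- sapply(df, class)
print(col_classes)

size_before <- object.size(df)

# Convert the columns that should be numeric.
df$start <- as.integer(df$start)
df$score <- as.numeric(df$score)

size_after <- object.size(df)

# Compare the in-memory footprint before and after the conversion.
print(size_before, units = "Mb")
print(size_after, units = "Mb")
```

For a list of data frames, the same check could be applied to each 
element, e.g. lapply(genomeBlocks, function(x) sapply(x, class)).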

Also use the compress=TRUE option in save().
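A small sketch of the effect of compression on the saved file, using a 
made-up object x and temporary files (both are assumptions for 
illustration; substitute your own objects and file name):

```r
# Hypothetical repetitive annotation-like data; 'x' is an illustrative name.
n <- 100000
x <- data.frame(
  chrom  = rep(c("chr1", "chr2"), n / 2),
  strand = rep(c("+", "-"), each = n / 2),
  pos    = rep(c(1.5, 2.5), n / 2),
  stringsAsFactors = FALSE
)

f_raw <- tempfile(fileext = ".Rdata")
f_gz  <- tempfile(fileext = ".Rdata")

# Save the same object without and with compression.
save(x, file = f_raw, compress = FALSE)
save(x, file = f_gz,  compress = TRUE)

# Compare the resulting file sizes in bytes.
sizes <- file.info(c(f_raw, f_gz))$size
names(sizes) <- c("uncompressed", "compressed")
print(sizes)
```

How much is gained depends on how repetitive the data are; annotation 
tables with many repeated strings tend to compress well.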

Regards, Adai

Daniel Brewer wrote:
> Hello,
> The GTF file from Ensembl for the human genome,
> Homo_sapiens.NCBI36.52.gtf, is 194M and is a tab-delimited text file.  I
> import it into R and process it so that there are two objects:
> genomeRanges & genomeBlocks.  genomeRanges is a list of IRanges objects,
> each of which is a particular chromosome and strand.  genomeBlocks is a
> list of dataframes with the associated annotation for each of the
> transcripts.
> When I save this to file
> (save(genomeBlocks,genomeRanges,file="Hsgenome.Rdata")) it comes out as
> 859M.  How is this possible? Especially as the Rdata file is a binary
> format.
>> object.size(genomeBlocks)
> [1] 2939935864
>> object.size(genomeRanges)
> [1] 8769208
> Anyone got any ideas what is going on?
> Thanks
> Dan
