[BioC] Curious file size issues

Daniel Brewer daniel.brewer at icr.ac.uk
Thu Mar 12 16:56:47 CET 2009

That's great, thank you so much.  One particular variable holding long
strings was being treated as a factor, which caused the problem.  The
saved file is now down to 13M without compression.  That's more like it.
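
A minimal sketch of this kind of check and fix, on made-up toy data rather than the real Ensembl annotation. The column names and the factor_to_char() helper are hypothetical. The sketch assumes the bloat comes from each data frame in the list carrying the full set of factor levels after the split, which may or may not be exactly what happened here.

```r
# Toy annotation table (hypothetical stand-in for the parsed GTF).
n <- 2000
annot <- data.frame(
  chr   = rep(paste0("chr", 1:20), each = n / 20),
  start = seq_len(n),
  # long, unique strings deliberately stored as a factor
  attribute = factor(sprintf("gene_id ENSG%011d; transcript_id ENST%011d;",
                             seq_len(n), seq_len(n))),
  stringsAsFactors = FALSE
)

# One data frame per chromosome, mimicking a list of data frames.
blocks_factor <- split(annot, annot$chr)

# Inspect how each column is stored.
col_classes <- sapply(blocks_factor[[1]], class)

# Every piece keeps all n levels, although it only has n / 20 rows.
levels_per_piece <- nlevels(blocks_factor[[1]]$attribute)

# Hypothetical helper: turn every factor column into character.
factor_to_char <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  if (any(is_fac)) {
    df[is_fac] <- lapply(df[is_fac], as.character)
  }
  df
}
blocks_char <- lapply(blocks_factor, factor_to_char)

# Compare the reported sizes before and after the conversion.
size_factor <- as.numeric(object.size(blocks_factor))
size_char   <- as.numeric(object.size(blocks_char))
```

Reading the file with stringsAsFactors=FALSE in the first place should avoid the conversion step altogether.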



Adaikalavan Ramasamy wrote:
> I am not an expert in R data representations. However, my experience
> suggests that if an object is stored incorrectly as matrix instead of
> data.frame, then the object sizes may be bloated. Also, if it is a
> data.frame, check that each column is stored correctly, e.g. via
> str(obj). Storing numeric columns as factors or characters can
> inflate the size. Also use the compress=TRUE option in save().
> Regards, Adai
> Daniel Brewer wrote:
>> Hello,
>> The GTF file from Ensembl for the human genome,
>> Homo_sapiens.NCBI36.52.gtf, is 194M and is a tab-delimited text file.  I
>> import it into R and process it so that there are two objects:
>> genomeRanges & genomeBlocks.  genomeRanges is a list of IRanges objects,
>> each of which is a particular chromosome and strand.  genomeBlocks is a
>> list of dataframes with the associated annotation for each of the
>> transcripts.
>> When I save this to file
>> (save(genomeBlocks,genomeRanges,file="Hsgenome.Rdata")) it comes out as
>> 859M.  How is this possible? Especially as the Rdata file is a binary
>> format.
>>> object.size(genomeBlocks)
>> [1] 2939935864
>>> object.size(genomeRanges)
>> [1] 8769208
>> Anyone got any ideas what is going on?
>> Thanks
>> Dan
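
On the save() side of the question, a small sketch for comparing the in-memory size with the on-disk size, with and without compression. The object x and the temporary file names are made up for illustration. How much compression helps depends heavily on the data.

```r
# Hypothetical object with highly repetitive content.
x <- list(
  label = rep("some long repeated annotation string", 1e4),
  pos   = seq_len(1e4)
)

f_raw <- tempfile(fileext = ".RData")
f_gz  <- tempfile(fileext = ".RData")

# Same object, saved uncompressed and gzip-compressed.
save(x, file = f_raw, compress = FALSE)
save(x, file = f_gz,  compress = TRUE)

size_mem <- as.numeric(object.size(x))  # size reported in memory
size_raw <- file.info(f_raw)$size       # bytes on disk, uncompressed
size_gz  <- file.info(f_gz)$size        # bytes on disk, compressed

# Check that the compressed file restores the same object.
env <- new.env()
load(f_gz, envir = env)
same_after_reload <- identical(env$x, x)
```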

Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk

