[BioC] memory exhausted for readAligned

Martin Morgan mtmorgan at fhcrc.org
Fri Feb 13 01:33:49 CET 2009


Hi Lana --

"Lana Schaffer" <schaffer at scripps.edu> writes:

> Hi,
> I am trying to read the alignment in a lane of Solexa data and ran out
> of memory.  
> I have 3.2G memory on my desktop computer.
> Is there a setting I can use to have enough memory for the readAligned
> command?
> How much memory do i need?

It depends on the number of reads, their length, their ids, which data
file you're reading the reads from, etc. As one data point, 7.5M 35mers
take up ~985MB; the reads themselves are about 300MB. The implementation
of the short read representation means that these data won't get
duplicated, so once in memory you should be ok.
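
As a rough sketch of the arithmetic behind that data point (base R only):
the code below assumes one byte per base for the raw sequence, takes 1MB
as 2^20 bytes, and assumes total memory scales linearly with the number
of reads. None of these assumptions comes from ShortRead itself, and the
name estimate_mb is made up for illustration.

```r
# Raw sequence bytes for 7.5M reads of 35 bases, assuming 1 byte per base.
n_reads <- 7.5e6
width   <- 35
seq_mb  <- n_reads * width / 2^20   # roughly 250MB, before any overhead

# Hypothetical helper: scale the observed ~985MB for 7.5M 35mers
# linearly to another read count (an assumption, not a measurement).
estimate_mb <- function(n, mb_per_read = 985 / 7.5e6) {
  n * mb_per_read
}

print(round(seq_mb, 1))
print(estimate_mb(15e6))   # twice the reads, about twice the memory
```

The raw bases come out somewhat below the ~300MB quoted above, which is
consistent with some per-object overhead on top of the sequence itself;
ids and qualities account for much of the rest.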

If your desktop is a Windows box, I think you're probably severely
handicapped by memory constraints and will be frustrated during the
first steps of the analysis (usually the data collapse quite quickly,
e.g., after using 'coverage'). You can visit the R Windows FAQ

http://cran.r-project.org/bin/windows/base/rw-FAQ.html#There-seems-to-be-a-limit-on-the-memory-it-uses_0021

and the 'Memory' help page for hints.

Depending on your data source and what you intend to do, you might be
able to read only some records (MAQ binary input), read just the
sequence and / or quality scores (e.g., readFastq, readXStringColumns),
or read just the alignment information (e.g., read.table with
colClasses taking on NULL values to skip unwanted columns). Also you
might want to make sure that you're reading just the files you think
you are, e.g., a single lane, and not all files in a directory; the
paradigm for readAligned and other ShortRead functions is to read in
files that are the equivalent of list.files(dirPath, pattern).
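
A minimal sketch of the last two suggestions, using base R only. The
file names, the four-column layout (id, sequence, chromosome, position)
and the pattern are all made up for illustration; real aligner output
will have different columns, so adjust colClasses accordingly.

```r
# Hypothetical demo data: two small tab-delimited "lane" files.
dirPath <- tempfile("lane_demo")
dir.create(dirPath)
writeLines(c("r1\tACGT\tchr1\t100",
             "r2\tTTGA\tchr2\t250"),
           file.path(dirPath, "s_1_export.txt"))
writeLines("r3\tGGCC\tchr1\t300",
           file.path(dirPath, "s_2_export.txt"))

# Check which files a pattern actually selects before reading anything;
# here the pattern is meant to match lane 1 only.
files <- list.files(dirPath, pattern = "^s_1_.*\\.txt$", full.names = TRUE)
print(basename(files))

# The string "NULL" in colClasses tells read.table to skip that column,
# so the ids and sequences are never stored in the result.
aln <- read.table(files[1], sep = "\t",
                  colClasses = c("NULL", "NULL", "character", "integer"),
                  col.names = c("id", "seq", "chr", "pos"))
str(aln)
```

Running list.files with the same dirPath and pattern you intend to pass
to readAligned is a cheap way to confirm that only one lane is selected.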

Martin

> Lana Schaffer
> Biostatistics/Informatics
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> schaffer at scripps.edu 
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


