[BioC] Problem running summarizeOverlaps()

Martin Morgan mtmorgan at fhcrc.org
Tue May 20 23:25:23 CEST 2014


On 05/20/2014 01:37 PM, Jessica Perry Hekman wrote:
> On 05/20/2014 02:20 PM, Jessica Perry Hekman wrote:
>>>> Error: C stack usage is too close to the limit
>
>>> You might then try adding a 'yieldSize' argument to the following line,
>>> starting small (e.g., 100000) and moving toward the default (1000000) if
>>> the small size works when calling summarizeOverlaps, or perhaps smaller
>>> if it fails.
>>>
>>>    bamfls <- BamFileList(fls, yieldSize=100000)
>
> So, this is perplexing. Is 1000000 really the default? Because I can set
> yieldSize to much larger OR smaller than that and the command will succeed (or
> at least complete without errors). But when I do not specify yieldSize at all,
> there is an error!

OK, I guess I did not remember correctly. If the function is passed a BamFile or
BamFileList, it respects the yieldSize set on that file or list. If yieldSize was
not specified, it'll try to read the entire file into memory, and hilarity
ensues. If passed a character vector of file paths (I think this is supported in
your version), summarizeOverlaps will set the default yieldSize to 1000000.
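A minimal sketch of the difference, using the small example BAM file shipped with Rsamtools (ex1.bam) as a stand-in for your own files: a BamFile created without yieldSize carries NA, meaning "read everything at once", while an explicit yieldSize caps the number of records read per call.

```r
library(Rsamtools)

## Example BAM shipped with Rsamtools; substitute your own paths
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## No yieldSize given: the stored value is NA, so the whole file is read in one go
bf_default <- BamFile(fl)
yieldSize(bf_default)

## Explicit yieldSize: at most 100000 records are read at a time
bf_chunked <- BamFile(fl, yieldSize = 100000)
yieldSize(bf_chunked)
```

The same applies to BamFileList(fls), which is why the call without yieldSize behaved differently from every call that set one.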

So yes, create the BamFileList with an appropriate yieldSize. To answer the
question from your earlier email: yieldSize is the number of reads read in at
one time.
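To illustrate what "number of reads read in at one time" means, here is a sketch of the usual open-and-iterate idiom for a BamFile with a yieldSize. summarizeOverlaps does something along these lines internally, but this loop is only an illustration, not its actual code; the yieldSize of 1000 and the Rsamtools example file are assumptions chosen for demonstration.

```r
library(Rsamtools)
library(GenomicAlignments)

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Deliberately small yieldSize so the example file is read in several chunks
bf <- BamFile(fl, yieldSize = 1000)
open(bf)
n_chunks <- 0L
n_reads <- 0L
repeat {
    chunk <- readGAlignments(bf)   # next batch of at most 1000 alignments
    if (length(chunk) == 0L) break # an empty result means the file is exhausted
    n_chunks <- n_chunks + 1L
    n_reads <- n_reads + length(chunk)
}
close(bf)
c(chunks = n_chunks, reads = n_reads)
```

Only one chunk is held in memory at a time, so peak memory scales with yieldSize rather than with the size of the BAM file.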

In terms of an appropriate yieldSize: summarizeOverlaps iterates through each
individual bam file yieldSize reads at a time, and simultaneously uses parallel
evaluation to process several bam files at once. In your version this is
controlled by the mc.cores option, which is just 2 by default but can be set
with options(mc.cores=8) or whatever; more recent versions use BiocParallel and
register(MulticoreParam()). So for optimal performance you want to choose a
yieldSize such that all cores (or as many as being neighbourly dictates) are in
use, without too much memory being consumed.
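A sketch of how the two knobs combine, assuming a recent Bioconductor where summarizeOverlaps lives in GenomicAlignments and the single-end argument is spelled singleEnd. The two "samples" (the Rsamtools example BAM listed twice), the gene ranges, and the worker count are all hypothetical stand-ins for your fls, exByGn and core count.

```r
library(Rsamtools)
library(GenomicAlignments)   # summarizeOverlaps lives here in recent releases
library(BiocParallel)

## Hypothetical inputs: the same example BAM twice, standing in for two samples
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")
fls <- c(fl, fl)
## Hypothetical exon-by-gene features on the example file's sequences
features <- GRangesList(
    geneA = GRanges("seq1", IRanges(c(100, 500), width = 200)),
    geneB = GRanges("seq2", IRanges(1000, width = 300))
)

## Older releases: parallel evaluation reads the mc.cores option
options(mc.cores = 2)
## Recent releases: register a BiocParallel back-end instead
register(MulticoreParam(workers = 2))

## Each file is read 100000 records at a time; files are spread over workers
bamfls <- BamFileList(fls, yieldSize = 100000)
names(bamfls) <- c("sampleA", "sampleB")
se <- summarizeOverlaps(features, bamfls, mode = "Union",
                        ignore.strand = TRUE, singleEnd = TRUE)
assay(se)
```

Peak memory is roughly workers times the memory needed for one chunk of yieldSize reads, so raising the worker count may call for a smaller yieldSize.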

If you do decide to update your R, note that summarizeOverlaps has moved to the
GenomicAlignments package.

Martin

>
>  > bamfls <- BamFileList(fls, yieldSize=100000)
>  > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
> +          ignore.strand=TRUE, single.end=TRUE, param=param)
>  > bamfls <- BamFileList(fls, yieldSize=500000)
>  > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
> +          ignore.strand=TRUE, single.end=TRUE, param=param)
>  > bamfls <- BamFileList(fls, yieldSize=1000000)
>  > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
> +          ignore.strand=TRUE, single.end=TRUE, param=param)
>  > bamfls <- BamFileList(fls, yieldSize=10000000)
>  > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
> +          ignore.strand=TRUE, single.end=TRUE, param=param)
>  > bamfls <- BamFileList(fls, yieldSize=1000000000)
>  > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
> +          ignore.strand=TRUE, single.end=TRUE, param=param)
>
>
> BUT:
>
>  > bamfls <- BamFileList(fls)
>  > gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
> +          ignore.strand=TRUE, single.end=TRUE, param=param)
> Error: C stack usage is too close to the limit
>
> ?!
>
> Jessica


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


