[BioC] readGAlignmentPairsFromBam with yieldSize & which
mtmorgan at fhcrc.org
Tue Jun 17 18:31:08 CEST 2014
On 06/17/2014 05:39 AM, Stefano Calza wrote:
> I am experiencing this weird (to me...) behaviour:
> bfl = BamFile(some/bam/file,yieldSize=100000L)
> ## 1
> ## 2
> Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at least a
> couple of minutes till I killed it...
> Am I doing something wrong or missing something here?
Hi Stefano --
Does the description on ?BamFile help?
yieldSize: Number of records to yield each time the file is read
from using 'scanBam' or, when 'length(bamWhich()) != 0', a
threshold which yields records in complete ranges whose sum
first exceeds 'yieldSize'.
you're getting the complete range chr22=IRanges(1,51304566), because that will
be the first complete range for which yieldSize is satsified.
The current implementation allows you to iterate through an entire file, or
iterate through many small ranges, but it does not really work well if the goal
is to subset the bam file (e.g., by chromsome) and iterate through the subset;
you would then filterBam and iterate through that.
It could also be that there is a bug in the code; it would then help to have the
output of sessionInfo() and a reproducible example, e.g., a subset of
some/bam/file or use of the files in the experiment data package
> Bioconductor mailing list
> Bioconductor at r-project.org
> Search the archives:
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
More information about the Bioconductor