[BioC] readGAlignmentPairsFromBam with yieldSize & which

Martin Morgan mtmorgan at fhcrc.org
Tue Jun 17 18:31:08 CEST 2014


On 06/17/2014 05:39 AM, Stefano Calza wrote:
> Hi!
>
> I am experiencing this weird (to me...) behaviour:
>
> bfl = BamFile(some/bam/file,yieldSize=100000L)
>
> param=ScanBamParam(which=RangesList(chr22=IRanges(1,51304566)))
>
> ## 1
> readGAlignmentsFromBam(bfl)
>
> ## 2
>
> readGAlignmentsFromBam(bfl,param=param)
>
>
> Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at least a
> couple of minutes till I killed it...
>
> Am I doing something wrong or missing something here?

Hi Stefano --

Does the description on ?BamFile help?

    yieldSize: Number of records to yield each time the file is read
           from using 'scanBam' or, when 'length(bamWhich()) != 0', a
           threshold which yields records in complete ranges whose sum
           first exceeds 'yieldSize'.

you're getting the complete range chr22=IRanges(1,51304566), because that will 
be the first complete range for which yieldSize is satsified.

The current implementation allows you to iterate through an entire file, or 
iterate through many small ranges, but it does not really work well if the goal 
is to subset the bam file (e.g., by chromsome) and iterate through the subset; 
you would then filterBam and iterate through that.

It could also be that there is a bug in the code; it would then help to have the 
output of sessionInfo() and a reproducible example, e.g., a subset of 
some/bam/file or use of the files in the experiment data package 
RNAseqData.HNRNPC.bam.chr14

Martin


>
> Stefano
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list