[BioC] readGAlignmentPairsFromBam with yieldSize & which

Stefano Calza stecalza at gmail.com
Wed Jun 18 11:13:00 CEST 2014


Thanks Martin,

I didn't quite understand that sentence: now it clear.

My hope was indeed to be able to iterate by yieldSize within a 
prespecified range, without  writing a new file to the disk (like 
filterBAM does...right?)

Thanks

Stefano


On 06/17/2014 06:31 PM, Martin Morgan wrote:
> On 06/17/2014 05:39 AM, Stefano Calza wrote:
>> Hi!
>>
>> I am experiencing this weird (to me...) behaviour:
>>
>> bfl = BamFile(some/bam/file,yieldSize=100000L)
>>
>> param=ScanBamParam(which=RangesList(chr22=IRanges(1,51304566)))
>>
>> ## 1
>> readGAlignmentsFromBam(bfl)
>>
>> ## 2
>>
>> readGAlignmentsFromBam(bfl,param=param)
>>
>>
>> Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at 
>> least a
>> couple of minutes till I killed it...
>>
>> Am I doing something wrong or missing something here?
>
> Hi Stefano --
>
> Does the description on ?BamFile help?
>
>    yieldSize: Number of records to yield each time the file is read
>           from using 'scanBam' or, when 'length(bamWhich()) != 0', a
>           threshold which yields records in complete ranges whose sum
>           first exceeds 'yieldSize'.
>
> you're getting the complete range chr22=IRanges(1,51304566), because 
> that will be the first complete range for which yieldSize is satsified.
>
> The current implementation allows you to iterate through an entire 
> file, or iterate through many small ranges, but it does not really 
> work well if the goal is to subset the bam file (e.g., by chromsome) 
> and iterate through the subset; you would then filterBam and iterate 
> through that.
>
> It could also be that there is a bug in the code; it would then help 
> to have the output of sessionInfo() and a reproducible example, e.g., 
> a subset of some/bam/file or use of the files in the experiment data 
> package RNAseqData.HNRNPC.bam.chr14
>
> Martin
>
>
>>
>> Stefano
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>



More information about the Bioconductor mailing list