[BioC] readGAlignmentPairsFromBam with yieldSize & which
Stefano Calza
stecalza at gmail.com
Wed Jun 18 11:13:00 CEST 2014
Thanks Martin,
I didn't quite understand that sentence: now it clear.
My hope was indeed to be able to iterate by yieldSize within a
prespecified range, without writing a new file to the disk (like
filterBAM does...right?)
Thanks
Stefano
On 06/17/2014 06:31 PM, Martin Morgan wrote:
> On 06/17/2014 05:39 AM, Stefano Calza wrote:
>> Hi!
>>
>> I am experiencing this weird (to me...) behaviour:
>>
>> bfl = BamFile(some/bam/file,yieldSize=100000L)
>>
>> param=ScanBamParam(which=RangesList(chr22=IRanges(1,51304566)))
>>
>> ## 1
>> readGAlignmentsFromBam(bfl)
>>
>> ## 2
>>
>> readGAlignmentsFromBam(bfl,param=param)
>>
>>
>> Now 1) is much faster than 2). Indeed 1) takes ~ 6 secs while 2) at
>> least a
>> couple of minutes till I killed it...
>>
>> Am I doing something wrong or missing something here?
>
> Hi Stefano --
>
> Does the description on ?BamFile help?
>
> yieldSize: Number of records to yield each time the file is read
> from using 'scanBam' or, when 'length(bamWhich()) != 0', a
> threshold which yields records in complete ranges whose sum
> first exceeds 'yieldSize'.
>
> you're getting the complete range chr22=IRanges(1,51304566), because
> that will be the first complete range for which yieldSize is satsified.
>
> The current implementation allows you to iterate through an entire
> file, or iterate through many small ranges, but it does not really
> work well if the goal is to subset the bam file (e.g., by chromsome)
> and iterate through the subset; you would then filterBam and iterate
> through that.
>
> It could also be that there is a bug in the code; it would then help
> to have the output of sessionInfo() and a reproducible example, e.g.,
> a subset of some/bam/file or use of the files in the experiment data
> package RNAseqData.HNRNPC.bam.chr14
>
> Martin
>
>
>>
>> Stefano
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
More information about the Bioconductor
mailing list