[BioC] GenomicAlignments and QNAME collision

Stefano Calza stecalza at gmail.com
Thu May 8 18:03:29 CEST 2014


Thanks Valerie

I got these BAM files from different sources, but they cannot be 
distributed.

Up to now I have seen two different 'patterns' in QNAME. One is the 
trailing value we discussed (/1, /2). The other is a leading string, e.g. 
(made-up QNAMEs):

SRR1122.12345HTR
SRR1123.12345HTR

(So SRR1122 and SRR1123 have to be removed.)

My little program actually uses a regex substitution, so the user can 
decide which pattern to edit. This second pattern seems quite unusual, 
though.
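
To illustrate the idea, here is a minimal sketch of that kind of regex 
substitution in Python. It is not my actual program: the function name 
normalize_qname and the two pattern constants are made up for this 
example, and it only works on QNAME strings (reading and writing the BAM 
records is left out).

```python
import re

# Hypothetical patterns for the two cases described above (assumptions,
# chosen only to match the example QNAMEs in this thread).
TRAILING_MATE_SUFFIX = r"/[12]$"  # ".../1" or ".../2" at the end
LEADING_RUN_PREFIX = r"^SRR\d+"   # "SRR1122" in "SRR1122.12345HTR"


def normalize_qname(qname, pattern, replacement=""):
    """Return qname with the first match of pattern replaced."""
    return re.sub(pattern, replacement, qname, count=1)


# After normalization both mates of a pair share one QNAME.
trailing_pair = [
    "UNC15-SN850:240:D148CACXX:3:1308:19719:99367/1",
    "UNC15-SN850:240:D148CACXX:3:1308:19719:99367/2",
]
leading_pair = ["SRR1122.12345HTR", "SRR1123.12345HTR"]

trailing_fixed = {normalize_qname(q, TRAILING_MATE_SUFFIX) for q in trailing_pair}
leading_fixed = {normalize_qname(q, LEADING_RUN_PREFIX) for q in leading_pair}
```

Each set ends up with a single element, i.e. the two mates now carry the 
same name. Passing the pattern as an argument is what lets the user decide 
which part of the QNAME to strip.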

Those with trailing values were downloaded from TCGA (if I recall 
correctly, they use a pipeline called MapSplice).


Regards

Stefano

On 05/08/2014 05:54 PM, Valerie Obenchain wrote:
> Hi Stefano,
>
> No, the current mate-pairing doesn't handle the trailing values. I 
> will implement this and post back when it's done.
>
> For reference, where did you download your bam files or what 
> application outputs QNAMEs in this format? I'd like to have some for 
> test data.
>
>
> Thanks.
> Valerie
>
>
> On 05/08/14 08:14, Stefano Calza wrote:
>> Hi everybody
>>
>>
>> I am using the GenomicAlignments package to read paired-end RNA-seq
>> data. The problem is that readGAlignmentPairsFromBam, after setting
>> asMates=TRUE in BamFile, returns 0 mates.
>>
>> The reason is that mates have different QNAMEs. Eg:
>>
>> UNC15-SN850:240:D148CACXX:3:1308:19719:99367/1
>> UNC15-SN850:240:D148CACXX:3:1308:19719:99367/2
>>
>> that is the two mates have /1 or /2 at the end.
>>
>> I wrote a Python (and a cpp) program to fix it...but this still takes
>> quite a substantial amount of time on big files.
>>
>> Does the mating algorithm allow for this? If so how?
>>
>> Regards
>>
>> Stefano
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>


