[BioC] bug in Biostrings mismatchTable?

Janet Young jayoung at fhcrc.org
Thu Oct 11 02:13:11 CEST 2012


Hi there,

I think I've found a bug in mismatchTable (Biostrings).  It's reporting a mismatch after the end of the reported alignment.  I think the code below shows the problem.

thanks, as usual!

Janet

#####

library(Biostrings)

### Two sequences: the middle portion aligns, but the last few bases don't. I'm not interested in those last few bases, so I do a local alignment.
seq1 <- DNAString("GCTGAAGTAGTTCTCCAGAA")
seq2 <-       DNAString("GTAGTTCTCCAAAGT")
aln1 <- pairwiseAlignment(seq1, seq2, type="local")
aln1
# Local PairwiseAlignmentsSingleSubject (1 of 1)
# pattern: [7] GTAGTTCTCCA 
# subject: [1] GTAGTTCTCCA 
# score: 21.79932 

end(pattern(aln1))
# [1] 17

mismatchTable(aln1)
#  PatternId PatternStart PatternEnd PatternSubstring PatternQuality
#1         1           18         18                G              7
#  SubjectStart SubjectEnd SubjectSubstring SubjectQuality
#1           12         12                A              7
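#### The one mismatch that's reported (pattern position 18, subject position 12) lies after the end of the alignment reported above (the pattern ends at 17).
#### There's another mismatch after the end of the alignment (pattern position 20 vs. subject position 14) that wasn't reported.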
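
As a quick cross-check of the comment above, here is a minimal sketch in base R (no Biostrings calls) that compares the bases by hand. The coordinates 7-17 and 1-11 are copied from the alignment printed above; the variable names s1, s2, pat_tail, sub_tail and tail_cmp are made up for this illustration and are not part of Biostrings.

```r
# Plain character copies of the two sequences (base R only)
s1 <- "GCTGAAGTAGTTCTCCAGAA"
s2 <- "GTAGTTCTCCAAAGT"

# Aligned ranges as printed by pairwiseAlignment above:
# pattern 7-17, subject 1-11 (assumed here, not recomputed)
pat_aligned <- substring(s1, 7, 17)
sub_aligned <- substring(s2, 1, 11)
identical(pat_aligned, sub_aligned)  # TRUE if the aligned region has no mismatches

# Bases immediately after the aligned region, compared position by position
pat_tail <- strsplit(substring(s1, 18, 20), "")[[1]]
sub_tail <- strsplit(substring(s2, 12, 14), "")[[1]]
tail_cmp <- data.frame(PatternPos = 18:20, Pattern = pat_tail,
                       SubjectPos = 12:14, Subject = sub_tail,
                       Mismatch   = pat_tail != sub_tail,
                       stringsAsFactors = FALSE)
tail_cmp
```

The Mismatch column flags pattern positions 18 and 20 (subject positions 12 and 14), both outside the aligned range, and only the first of those shows up in the mismatchTable output.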

sessionInfo()

R Under development (unstable) (2012-10-03 r60868)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.27.2  IRanges_1.17.0     BiocGenerics_0.5.0

loaded via a namespace (and not attached):
[1] parallel_2.16.0 stats4_2.16.0  


