[BioC] mismatch & replacement

Harris A. Jaffee hj at jhu.edu
Fri Nov 5 23:46:03 CET 2010


This example illustrates another approach to the first question. To delete
or select the barcoded reads, you'll need to post-process using the width
of the returned value.

 > trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"),
 +                max.Lmismatch=1)
[1] "AA"     "TTTTGG"

 > trimLRPatterns(Lpattern="ACGT", subject=c("ACTTAA", "TTTTGG"),
 +                max.Lmismatch=1, ranges=TRUE)
IRanges of length 2
     start end width
[1]     5   6     2
[2]     1   6     6
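
As a rough sketch of that post-processing (the names barcode, reads, rng and
has_barcode are mine, purely for illustration), a read can be treated as
barcoded when trimming removed exactly as many letters as the barcode has:

```r
library(Biostrings)  # provides trimLRPatterns(); width() is loaded with it

barcode <- "ACGT"                  # illustrative barcode
reads   <- c("ACTTAA", "TTTTGG")   # same toy reads as in the calls above

# Ranges that would remain after trimming the barcode (at most 1 mismatch)
rng <- trimLRPatterns(Lpattern = barcode, subject = reads,
                      max.Lmismatch = 1, ranges = TRUE)

# A read carried the barcode if exactly nchar(barcode) letters were trimmed
has_barcode <- (nchar(reads) - width(rng)) == nchar(barcode)

barcoded   <- reads[has_barcode]   # select the barcoded reads ...
unbarcoded <- reads[!has_barcode]  # ... or keep only the others
```

With the two toy reads, only the first is flagged, consistent with the widths
of 2 and 6 shown above. For a real fastq file the same logical vector could
presumably be used to subset the ShortRead object, but I have not shown that
here.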

You can also use agrep with max.distance=1, but you will need to narrow to
the barcode region of each read first (you can't employ "^" as a
metacharacter).
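
A minimal sketch of the agrep route, using base R only. The variable names
are hypothetical, and I am assuming the barcode occupies the first
nchar(barcode) letters of every read:

```r
barcode <- "ACGT"                  # illustrative barcode
reads   <- c("ACTTAA", "TTTTGG")   # toy reads, as plain character strings

# Narrow each read to its barcode region (the first nchar(barcode) letters)
prefix <- substr(reads, 1, nchar(barcode))

# Indices of the reads whose prefix approximately matches the barcode
hits <- agrep(barcode, prefix, max.distance = 1)

barcoded <- reads[hits]            # select the barcoded reads
```

Note that agrep measures edit distance, so a distance of 1 also admits a
single insertion or deletion, not just a single wrong nucleotide. If only
substitutions should count, the trimLRPatterns approach is probably the safer
choice.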

-Harris

On Nov 5, 2010, at 4:54 PM, Daniel.Berner at unibas.ch wrote:

> Hi list
> 1. I have a large fastq file containing solexa reads that start  
> with a barcode (identifier to separate individuals). I now want to  
> filter that large data set according to the barcodes using  
> ShortRead. I understand that this is easily done with grep() when  
> one wants a perfect barcode match. However, I want to allow ONE  
> single wrong nucleotide within the barcode, at any position. Is  
> there an efficient way to filter by barcode while allowing a mismatch?
>
> 2. Is there a way to modify nucleotides in ShortRead objects? E.g.,  
> to replace a G by an A at position 3 for ALL sequences in the object?
>
> Thanks!
> Daniel


