[BioC] Trimming of partial adaptor sequences

James W. MacDonald jmacdon at uw.edu
Mon Jul 22 22:19:24 CEST 2013


Hi Sean,

On 7/22/2013 4:02 PM, Taylor, Sean D wrote:
> We have been experimenting with an NGS protocol in which we insert sheared genomic fragments into a custom plasmid for sequencing on an Illumina MiSeq instrument. The insertion site of this plasmid is flanked by our own custom barcodes (N7) and ~80 nt of Illumina-based adaptor sequence. We then PCR out the insert with barcodes and adaptors for sequencing. Our adaptor sequence is similar to the Illumina adaptor, but we use custom primer binding sites. We are not sure whether the Illumina software will be able to recognize and trim our custom adaptors, so we are trying to figure out the best way to trim read-through into the 3' adaptor ourselves. We have roughly three scenarios:
>
> (1) The insert is long enough that we have no read-through.
> (2) The vector is empty, in which case the entire adaptor sequence is present.
> (3) The insert is long enough to have useful data, but we get read-through into the 3' adaptor sequence that must be trimmed.
>
> The solution we are currently working on is to identify the minimal sequence that is recognizable as the adaptor sequence and trim that using trimLRPatterns() in the Biostrings package. Ideally we would like to give trimLRPatterns() the entire adaptor sequence and have it recognize the adaptor on our reads even if it is only partially present. However, in my experimenting it did not seem to be able to do this. I thought I would ask the Bioconductor community if there are any better solutions for recognizing and trimming partial adaptor sequences.
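
The three scenarios above can be illustrated with a minimal Python sketch. It is not the Biostrings or Trimmomatic algorithm, just a plain left-to-right scan that treats a read suffix as adaptor when it matches a prefix of the adaptor. The function name, the parameters min_overlap and max_mismatch_frac, and the 13 nt adaptor string are all assumptions made for illustration; the real custom adaptor is not given in this thread.

```python
def trim_3prime_adaptor(read, adaptor, min_overlap=5, max_mismatch_frac=0.0):
    """Trim a full or partial 3' adaptor from the end of a read.

    Hypothetical helper: at each position, compare the rest of the read
    with the start of the adaptor and cut at the first (leftmost) hit.
    """
    for start in range(len(read)):
        # Number of bases that can be compared at this position.
        overlap = min(len(read) - start, len(adaptor))
        if overlap < min_overlap:
            break  # remaining suffix is too short to call adaptor
        window = read[start:start + overlap]
        mismatches = sum(a != b for a, b in zip(window, adaptor))
        if mismatches <= int(max_mismatch_frac * overlap):
            return read[:start]
    return read  # scenario (1): no read-through detected


# Placeholder adaptor, an assumption for this example only.
ADAPTOR = "AGATCGGAAGAGC"

no_readthrough = trim_3prime_adaptor("TTGACCATGCATGCTTGACC", ADAPTOR)
empty_vector = trim_3prime_adaptor(ADAPTOR, ADAPTOR)
partial = trim_3prime_adaptor("TTGACCATGCATGC" + ADAPTOR[:7], ADAPTOR)
```

Here no_readthrough comes back unchanged, empty_vector is the empty string, and partial is reduced to "TTGACCATGCATGC". Raising max_mismatch_frac tolerates sequencing errors in the adaptor, and lowering min_overlap trims shorter adaptor stubs at the cost of more false positives.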

I will leave it to more experienced users to comment on Bioconductor 
tools that can do this. However, if there isn't an easy solution, I 
have used Trimmomatic for an experiment with a similar issue of 
read-through into the 3' adaptor (and vice versa). It was a miRNA 
experiment with 100 bp paired-end reads and, given that miRNAs are 
21-23 bp long, the reads almost always ran through into the far 
adaptor.

The upside is that Trimmomatic appears to be able to handle partial 
adaptor sequences, and you can pass in multiple adaptors in a FASTA 
file. The downside is that you have to reverse-complement the adaptors 
manually if you are doing paired-end sequencing. It is also written in 
Java, so it is fairly slow.
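
The manual reverse-complement step can be scripted. The following is a minimal Python sketch that reads adaptor records from FASTA text and writes each one back together with its reverse complement. The function names, the "_rc" header suffix, and the example record name are assumptions for illustration, not a Trimmomatic naming convention, so check the Trimmomatic documentation for how it expects adaptor records to be named.

```python
# Translation table for DNA bases (N is kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_reverse_complements(fasta_text, suffix="_rc"):
    """Return FASTA text holding each record plus its reverse complement.

    Hypothetical helper; the suffix appended to the header is arbitrary.
    """
    records = []
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))

    out = []
    for name, seq in records:
        out.append(f">{name}\n{seq}")
        out.append(f">{name}{suffix}\n{reverse_complement(seq)}")
    return "\n".join(out) + "\n" if out else ""


# Placeholder adaptor record, an assumption for this example only.
example = add_reverse_complements(">custom3p\nAGATCGG\nAAGAGC\n")
```

For the placeholder input, example contains the record custom3p with sequence AGATCGGAAGAGC followed by custom3p_rc with sequence GCTCTTCCGATCT.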

http://www.usadellab.org/cms/?page=trimmomatic

Best,

Jim


>
> Thanks in advance for any input.
>
> Sean Taylor
> Post-doctoral Fellow
> Fred Hutchinson Cancer Research Center
> 206-667-5544

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


