[BioC] Trimming of partial adaptor sequences

Devon Ryan dpryan at dpryan.com
Mon Jul 22 22:16:27 CEST 2013


Hi Sean,

Have you tried just using a read trimmer, such as trim_galore or trimmomatic? That would seem much easier than rolling your own solution in R.
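For readers who want to see what such a trimmer has to do in this case, here is a minimal Python sketch of error-tolerant 3' adaptor trimming. At every position it compares the rest of the read with the start of the adaptor, so it catches both a full adaptor and one that is cut off by the end of the read. This is not trim_galore's or trimmomatic's actual algorithm. It handles substitutions only, not indels. The adaptor sequence and the `max_error_rate` and `min_overlap` defaults are illustrative assumptions.

```python
def trim_3prime_adaptor(read: str, adaptor: str,
                        max_error_rate: float = 0.1,
                        min_overlap: int = 5) -> str:
    """Remove a full or partial 3' adaptor from a read (substitutions only).

    For each start position i, read[i:] is aligned with the beginning of the
    adaptor over their overlap. The leftmost i whose overlap is at least
    min_overlap and whose mismatch count is within max_error_rate * overlap
    is taken as the adaptor start, and everything from i onwards is dropped.
    """
    for i in range(len(read)):
        overlap = min(len(read) - i, len(adaptor))
        if overlap < min_overlap:
            break  # overlaps only get shorter as i increases
        mismatches = sum(a != b for a, b in zip(read[i:i + overlap], adaptor))
        if mismatches <= int(max_error_rate * overlap):
            return read[:i]
    return read  # no adaptor found: scenario (1), nothing to trim


# Hypothetical stand-in adaptor (the common Illumina adapter start),
# not the custom adaptor described in this thread.
ADAPTOR = "AGATCGGAAGAGCACACGTC"
read = "C" * 20 + "AGATCGTAAGAG"  # 12 nt of adaptor containing one mismatch
print(trim_3prime_adaptor(read, ADAPTOR))  # -> "C" * 20
```

The `min_overlap` parameter plays the role of the "minimal recognizable sequence." Assume uniform, independent base composition. Then an exact k-nt match at the read end arises by chance with probability about 4^-k. Each extra required base therefore cuts spurious trims roughly fourfold, at the cost of leaving shorter adaptor remnants in place.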

Cheers,
Devon

____________________________________________
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn, Germany

On Jul 22, 2013, at 10:02 PM, Taylor, Sean D wrote:

> We have been experimenting with an NGS protocol in which we insert sheared genomic fragments into a custom plasmid for sequencing on an Illumina MiSeq instrument. The insertion site of this plasmid is flanked by our own custom barcodes (N7) and ~80 nt of Illumina-based adaptor sequence. We then PCR-amplify the insert together with the barcodes and adaptors for sequencing. Our adaptor sequence is similar to the Illumina adaptor, but we use custom primer binding sites. We are not sure whether the Illumina software will be able to recognize and trim our custom adaptors. We are trying to figure out the best way to trim read-through into the 3' adaptor ourselves. We have roughly three scenarios:
> 
> (1) The insert is long enough that there is no read-through.
> (2) The vector is empty, in which case the entire adaptor sequence is present.
> (3) The insert is long enough to contain useful data, but short enough that the read runs into the 3' adaptor sequence, which must be trimmed.
> 
> The solution we are currently working on is to identify the minimal sequence that is recognizable as the adaptor and trim it using trimLRPatterns() in the Biostrings package. Ideally, we would like to give trimLRPatterns() the entire adaptor sequence and have it recognize the adaptor in our reads even when only part of it is present. However, in my experiments it did not seem to be able to do this. I thought I would ask the Bioconductor community whether there are better solutions for recognizing and trimming partial adaptor sequences.
> 
> Thanks in advance for any input.
> 
> Sean Taylor
> Post-doctoral Fellow
> Fred Hutchinson Cancer Research Center
> 206-667-5544
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
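The three scenarios in Sean's message can be told apart by where, if anywhere, the adaptor begins in a read. Below is a minimal Python sketch that assumes exact matching and a hypothetical stand-in adaptor sequence. It also uses two hypothetical parameters. `min_overlap` is the shortest 3' adaptor fragment that is accepted. `empty_max_pos` is how far into the read the adaptor may start for the vector to count as empty. Its default of 0 assumes the read begins directly at the insertion site, which depends on where the sequencing primer binds.

```python
def classify_read(read: str, adaptor: str, min_overlap: int = 10,
                  empty_max_pos: int = 0) -> tuple[str, int]:
    """Assign a read to one of the three scenarios in the quoted message.

    Returns (scenario, adaptor_start); adaptor_start == len(read) when no
    adaptor is found. Exact matching only (no mismatches or indels).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # A complete copy of the adaptor anywhere in the read.
    start = read.find(adaptor)
    if start == -1:
        # Otherwise, the longest read suffix equal to an adaptor prefix,
        # i.e. read-through that stops partway into the adaptor.
        start = len(read)
        for k in range(min(len(read), len(adaptor)), min_overlap - 1, -1):
            if read[-k:] == adaptor[:k]:
                start = len(read) - k
                break
    if start == len(read):
        return "no_read_through", start   # scenario (1)
    if start <= empty_max_pos:
        return "empty_vector", start      # scenario (2)
    return "read_through", start          # scenario (3): keep read[:start]


ADAPTOR = "AGATCGGAAGAGCACACGTC"  # hypothetical stand-in, not the custom adaptor
for r in ["C" * 30, ADAPTOR + "GGGG", "TTGCCATGACCTGAAGTCCA" + ADAPTOR[:12]]:
    print(classify_read(r, ADAPTOR))
```

Exact matching keeps the sketch short. Real MiSeq reads would also need some tolerance for sequencing errors within the adaptor.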


