[BioC] Stretch of N's in between reads

Mon Jan 23 13:58:43 CET 2012

On Mon, Jan 23, 2012 at 1:49 AM, MLSC MANIPAL <mlscmahe at gmail.com> wrote:
> Hello,
>
> I am working on NGS data generated by an Illumina HiSeq 2000 and using
> CASAVA to convert the .bcl files to FASTQ. Since CASAVA replaces all
> adaptor sequences with N, I have a question. I find many reads with a
> stretch of N's in the middle of the read. I assume these are adaptors
> that CASAVA has masked with N, but I am not sure. Can somebody give me
> some suggestions on this? This kind of masking also makes it hard to
> trim N's at the read ends, as I cannot tell whether they are adaptor or
> actual sequence. I would appreciate your help.
>
> An example is given here:
>
> @HWI-ST846:39:C00DDACXX:8:1101:10834:2386 1:N:0:
> TAATGAGGATCTCATACAGACTCAACAAAAGGCGAAGAAANNNNNNNNNNNNAGGAGGTGTGGGCTGCACCACTGCTTTTCAAAAGGAACTCCATTGTTGA
> +
> @@@DDDD8CDFFFIIE?FH?FGGDAFFIE;@CFED)??FG############--;?CA;6?BBCCB@BBBB?<:(:ABBB:A:@4?<?BBBBBBBEABB?@
>
> Regards,
> mlsc

You should probably contact the sequence provider, as I think this run
might have had some issues.  In any case, you'll need to do some
quality control to figure out what happened.  This is probably not the
best forum to try to go into details.  If you cannot find an answer
with the sequence provider, you may need to have them contact
Illumina.
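
As one possible starting point for that quality control, here is a
minimal Python sketch that finds runs of N inside reads and counts the
cycle at which each run starts. It is not part of CASAVA or any
Bioconductor package. The function names (n_runs, internal_n_runs,
read_fastq, summarize) are made up for illustration, and the parser
assumes the plain four-lines-per-record FASTQ layout.

```python
import re
from collections import Counter

N_RUN = re.compile(r"N+")


def n_runs(seq):
    """Return (start, length) for every run of N in seq (0-based start)."""
    return [(m.start(), m.end() - m.start()) for m in N_RUN.finditer(seq)]


def internal_n_runs(seq):
    """Keep only the runs of N that touch neither end of the read."""
    return [(s, n) for s, n in n_runs(seq) if s > 0 and s + n < len(seq)]


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of FASTQ lines.

    Assumption: each record is exactly four lines. A trailing
    incomplete record is silently ignored.
    """
    it = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups the lines in fours
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def summarize(lines):
    """Count internal N runs by the 1-based cycle at which they start."""
    starts = Counter()
    for _header, seq, _qual in read_fastq(lines):
        for start, _length in internal_n_runs(seq):
            starts[start + 1] += 1
    return starts
```

Applied to the example read above, internal_n_runs reports a single run
of 12 N's at 0-based offset 40, i.e. cycles 41 to 52, the same positions
that carry the '#' quality character. Passing an open file handle to
summarize gives the distribution of start cycles over a whole file,
which might help show whether the runs pile up at the same cycles in
many reads.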

Good luck.
Sean