[BioC] Error mapping RNA-seq data to genome

Paul Geeleher paulgeeleher at gmail.com
Mon Aug 23 15:22:28 CEST 2010


Ah of course, that never occured to me, thanks!

On Mon, Aug 23, 2010 at 2:03 PM, Joern Toedling <Joern.Toedling at curie.fr> wrote:
> Hello,
>
> as the mtDNA is circular, such an overhanging alignment covering end and start
> of the mtDNA is not worrying at all.
>
> As such an alignment can also happen with bacterial genomes, one could start
> thinking whether such alignments should be treated in a special way. However,
> I am not sure whether this should be addressed by BioC or rather upstream in
> the alignment program output formats such as SAM/BAM.
>
> Regards,
> Joern
>
> On Mon, 23 Aug 2010 12:51:03 +0100, Paul Geeleher wrote
>> Hi Martin,
>>
>> Your advice seems to have been pretty successful, there was a read
>> aligned beyond the end of the chrM. No idea why this might have
>> happened though, which is a bit worrying.
>>
>> For the record removing the offending read worked ok. I used the
>> following code:
>>
>> # find offending read:
>> > refLen <- seqlengths(hgTxDb)[as.character(seqnames(alnRanges))]
>> > print(alnRanges[refLen < end(alnRanges)])
>>
>> GRanges with 1 range and 1 elementMetadata value
>>     seqnames         ranges strand | pData.alignData.from...notNA...
>>        <Rle>      <IRanges>  <Rle> |                       <integer>
>> [1]     chrM [16541, 16577]      - |                             147
>>
>> # remove offending read:
>> alnRanges <- alnRanges[-which(refLen < end(alnRanges))]
>>
>> Cheers,
>>
>> Paul
>>
>> On Wed, Aug 18, 2010 at 2:30 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> > Hi Paul --
>> >
>> > On 08/17/2010 08:09 AM, Paul Geeleher wrote:
>> >> I'm attempting to read some aligned RNA-seq paired end reads in
>> >> Bioconductor. The reads were aligned with maq (.fastq to .bqf to .map)
>> >> and converted to SAM then BAM format by samtools as I couldn't get the
>> >> paired end reads into biocondcutor in MAQ format.
>> >>
>> >> I'm basically following this tutorial:
>> >>
> http://bioconductor.org/help/course-materials/2010/CSAMA10/Lab-8-RNAseqUseCase.pdf
>> >>
>> >> The following step causes me an error:
>> >>
>> >>> seqlengths(alnRanges) <- seqlengths(hgTxDb)[names(seqlengths(alnRanges))]
>> >> Error in validObject(.Object) :
>> >>   invalid class "GRanges" object: slot 'ranges' contains values
>> >> outside of sequence bounds
>> >>
>> >> I think it means that there is something aligned to outside the range
>> >> of my reference genome but I don't understand why this would be as I
>> >> used HG18 for everything.
>> >
>> > probably one or more of your reads are 'hanging over' one of the
>> > chromosome bounds. You might
>> >
>> >  tapply(alnRanges, seqnames(alnRanges), f
>> >         unction(x) c(min(start(x)), max(end(x))))
>> >
>> > to get a sense of whether you're on the right track, and maybe something
>> > like
>> >
>> >  refLen <- seqlengths(hgTxDb)[as.character(seqnames(alnRanges)]
>> >  alnRanges[refLen < end(alnRanges)
>> >
>> > to get the actual ranges? You could trim or discard these
>> >
>> > Martin
>> >
>
>



-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com



More information about the Bioconductor mailing list