[BioC] Error mapping RNA-seq data to genome

Paul Geeleher paulgeeleher at gmail.com
Tue Aug 17 17:09:34 CEST 2010


I'm attempting to read some aligned RNA-seq paired end reads in
Bioconductor. The reads were aligned with maq (.fastq to .bqf to .map)
and converted to SAM then BAM format by samtools as I couldn't get the
paired end reads into biocondcutor in MAQ format.

I'm basically following this tutorial:
http://bioconductor.org/help/course-materials/2010/CSAMA10/Lab-8-RNAseqUseCase.pdf

The following step causes me an error:

> seqlengths(alnRanges) <- seqlengths(hgTxDb)[names(seqlengths(alnRanges))]
Error in validObject(.Object) :
  invalid class "GRanges" object: slot 'ranges' contains values
outside of sequence bounds

I think it means that there is something aligned to outside the range
of my reference genome but I don't understand why this would be as I
used HG18 for everything.

Any suggestions as to how I can possibly figure out what went wrong
here would be greatly appreciated. I can provide more information if
necessary. Summaries of my GenomicRanges and GenomicFeatures objects
are below.


> alnRanges
GRanges with 11634719 ranges and 1 elementMetadata value
           seqnames         ranges strand   | pData.alignData.from...notNA...
              <Rle>      <IRanges>  <Rle>   |                       <integer>
       [1]     chr1   [3345, 3381]      +   |                             163
       [2]     chr1   [3391, 3427]      -   |                             147
       [3]     chr1   [4223, 4259]      +   |                             163
       [4]     chr1   [4247, 4283]      -   |                             147
       [5]     chr1   [4275, 4311]      +   |                             163
       [6]     chr1   [4304, 4340]      +   |                             163
       [7]     chr1   [4320, 4356]      +   |                             163
       [8]     chr1   [4343, 4379]      -   |                             147
       [9]     chr1   [4343, 4379]      -   |                             147
       ...      ...            ...    ... ...                             ...
[11634711]     chrM [16531, 16567]      -   |                             147
[11634712]     chrM [16532, 16568]      +   |                             161
[11634713]     chrM [16532, 16568]      -   |                             147
[11634714]     chrM [16532, 16568]      -   |                             147
[11634715]     chrM [16532, 16568]      -   |                             147
[11634716]     chrM [16534, 16570]      -   |                             147
[11634717]     chrM [16534, 16570]      -   |                             147
[11634718]     chrM [16534, 16570]      -   |                             147
[11634719]     chrM [16541, 16577]      -   |                             147

seqlengths
  chr1  chr2  chr3  chr4  chr5  chr6 ... chr20 chr21 chr22  chrX  chrY  chrM
    NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

> hgTxDb <- makeTranscriptDbFromUCSC(genome="hg18", tablename="ensGene")
> hgTxDb
TranscriptDb object:
| Db type: TranscriptDb
| Data source: UCSC
| Genome: hg18
| UCSC Table: ensGene
| Type of Gene ID: Ensembl gene ID
| Full dataset: yes
| transcript_nrow: 63280
| exon_nrow: 276075
| cds_nrow: 225373
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-08-17 15:28:19 +0100 (Tue, 17 Aug 2010)
| GenomicFeatures version at creation time: 1.0.6
| RSQLite version at creation time: 0.9-1
>





-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com



More information about the Bioconductor mailing list