[BioC] possible bug in extendReads() in package chipseq

Nora Rieber n.rieber at dkfz-heidelberg.de
Thu Nov 19 15:32:39 CET 2009


Dear all,

I'm using the ShortRead package to read in my MAQ alignment output files
with readAligned.  Then I extract a subset of reads mapping to one
chromosome:

aln_Pat_chr<-aln_Pat[chromosome(aln_Pat)==chromosomeString]

and extend the reads with extendReads:

aln_Pat_chr_extended<-extendReads(aln_Pat_chr, seqLen=350)

However, extendReads seems to misplace the reads. Here's the output of
aln_Pat_chr:

>aln_Pat_chr
class: AlignedRead
length: 1025425 reads; width: 36 cycles
chromosome: gi|51511732|ref|NC_000016.8|NC_000016
gi|51511732|ref|NC_000016.8|NC_000016 ...
gi|51511732|ref|NC_000016.8|NC_000016 gi|51511732|ref|NC_000016.8|NC_000016
position: 553 553 ... 88698835 88762373
strand: - - ... + -
alignQuality: IntegerQuality
alignData varLabels: nMismatchBestHit mismatchQuality nExactMatch24
nOneMismatch24

and the output of aln_Pat_chr_extended:

> aln_Pat_chr_extended
$`gi|51511732|ref|NC_000016.8|NC_000016`
IRanges of length 1025426
          start end width
[1]         553 902   350
[2]         239 588   350
[3]         239 588   350
[4]         239 588   350
[5]         239 588   350
[6]         239 588   350
[7]         239 588   350
[8]         239 588   350
[9]         239 588   350
...         ... ...   ...
[1025418]   239 588   350
[1025419]   239 588   350
[1025420]   239 588   350
[1025421]   239 588   350
[1025422]   239 588   350
[1025423]   239 588   350
[1025424]   239 588   350
[1025425]   239 588   350
[1025426]   239 588   350

While the highest mapped position is 88762373 before extension, it is
902 after extension. This happens with any dataset I use the function
with. Does anybody observe the same thing?

I tried to recreate this using the maq alignment data coming with the
ShortRead package, but using extendReads on it throws an error:

>testaln<-readAligned("~/R/2.10/ShortRead/extdata/maq","out.aln.2.txt",type="MAQMapview")
>readsextended<-extendReads(testaln,seqLen=40)
Error in s1[[ineg]] : attempt to select less than one element.

Am I doing something wrong?

This is my session info:

sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=de_DE.UTF-8
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] chipseq_0.2.0     ShortRead_1.4.0   lattice_0.17-26   BSgenome_1.14.1
[5] Biostrings_2.14.5 IRanges_1.4.6

loaded via a namespace (and not attached):
[1] Biobase_2.6.0 grid_2.10.0   hwriter_1.1   tools_2.10.0

Best,
Nora



More information about the Bioconductor mailing list