[BioC] filterBam

rcaloger raffaele.calogero at gmail.com
Tue Jul 16 08:19:27 CEST 2013


Hi,
I am trying to filter a BAM file using a set of coordinates.
I sorted the BAM file with both sortBam and Picard SortSam, but I
always get the same error:

filterBam("accepted_hits.sorted.bam", "accepted_hits.tp_exons", 
param=ScanBamParam(which=exons.gr))

Error in FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], :
   failed to build index
   file: /data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons
In addition: Warning messages:
1: In FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], :
   [bam_index_core] the alignment is not sorted (D44TDFP1_1:1:1102:6940:117318): 85079683 > 84952092 in 1-th chr
2: In FUN("/data03/calogero/Documents/singapore/transcripts/semi_synthetic_data_set/for_tm_creation/m1/accepted_hits.tp_exons"[[1L]], :
   [bam_index_build2] fail to index the BAM file.
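Note that the file named in the error is the filterBam destination, accepted_hits.tp_exons, not the sorted input. Warning 1 says the indexer met a record on the first reference that starts before the record preceding it (84952092 after 85079683). The base-R sketch below reproduces that particular check so it can be run on the record positions of a file. `firstUnsorted()` is a hypothetical helper written for illustration. It is not part of Rsamtools, and it only compares consecutive records on the same reference name. It does not detect a reference that reappears after another one.

```r
# Hypothetical helper (not an Rsamtools function): return the index of the
# first record whose start position is smaller than that of the previous
# record on the same reference sequence, or NA if no such record exists.
# This mirrors the "[bam_index_core] the alignment is not sorted" condition.
firstUnsorted <- function(rname, pos) {
  stopifnot(length(rname) == length(pos))
  n <- length(pos)
  if (n < 2L) return(NA_integer_)
  # compare each record with its predecessor; as.character() avoids
  # factor-level issues when rname comes from scanBam()
  sameRef <- as.character(rname[-1L]) == as.character(rname[-n])
  backwards <- sameRef & pos[-1L] < pos[-n]
  hit <- which(backwards)  # which() drops NA (e.g. unmapped reads)
  if (length(hit) == 0L) NA_integer_ else hit[1L] + 1L
}

# Toy positions modelled on the warning above:
firstUnsorted(c("chr1", "chr1", "chr1", "chr2"),
              c(100L, 85079683L, 84952092L, 10L))   # 3
firstUnsorted(c("chr1", "chr2"), c(500L, 10L))      # NA (new reference)
```

On a real file, assuming Rsamtools is attached, the inputs could be taken from scanBam(file, param = ScanBamParam(what = c("rname", "pos")))[[1]], whose rname and pos elements hold each record's reference and start position in file order.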


Any suggestions on how to handle this problem?
cheers
Raf

exons.gr
GRanges with 3399 ranges and 1 metadata column:
         seqnames                 ranges strand   | transcriptID
            <Rle>              <IRanges>  <Rle>   |  <character>
     [1]     chr1 [  4764598,   4766882]      *   |   uc007aff.2
     [2]     chr1 [  7110275,   7110696]      *   |   uc007agb.1
     [3]     chr1 [  9858548,   9858769]      *   |   uc007agw.1
     [4]     chr1 [ 10029961,  10029968]      *   |   uc007ahk.1
     ...      ...                    ...    ... ...          ...
  [3397]     chrX [147525139, 147525147]      *   |   uc012hqq.1
  [3398]     chrX [148034128, 148034929]      *   |   uc009upe.1
  [3399]     chrX [156357850, 156359017]      *   |   uc009uss.2
  ---
  seqlengths:
   chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6  chr7  chr8  chr9  chrX
     NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA

sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

other attached packages:
[1] Rsamtools_1.10.2     Biostrings_2.26.3    GenomicRanges_1.10.7
[4] IRanges_1.16.6       BiocGenerics_0.4.0

loaded via a namespace (and not attached):
[1] bitops_1.0-5    parallel_2.15.3 stats4_2.15.3   zlibbioc_1.4.0

-- ----------------------------------------
Prof. Raffaele A. Calogero
Bioinformatics and Genomics Unit
MBC Centro di Biotecnologie Molecolari
Via Nizza 52, Torino 10126
tel. ++39 0116706457
Fax ++39 0112366457
Mobile ++39 3333827080
email: raffaele.calogero at unito.it
raffaele[dot]calogero[at]gmail[dot]com
www: http://www.bioinformatica.unito.it
