[BioC] MEDIPS, GRanges, IRanges error processing BAM file

Jonathan Ellis Jonathan.Ellis at qimr.edu.au
Tue Aug 20 08:50:53 CEST 2013


Hi Lukas,

Thanks for your reply.

> Do they contain non-mapped reads?

I hadn't realised, but yes they do.  Each of the problematic BAMs
contains a small number of unmapped reads (between 1 and 3).

> MEDIPS 1.10.0 reports an error, if there are none mapped reads in the
> bam files,

Running MEDIPS 1.10.0, I do not see any errors or warnings about
unmapped reads only the error I previously posted.

I tried the code you suggested and ended up with a data frame that did
not contain any NA values; however, if I try the code without the
scanFlag parameter I get a data frame that does contain NA values.  I've
created a set of BAMs with unmapped reads filtered out, and I've run
MEDIPS again.  This time MEDIPS successfully runs on the data set.

Thanks again, your reply has saved me a great deal of time,

Jonathan


On Fri, Aug 16, 2013 at 12:26:06PM -0700, Lukas Chavez wrote:
> Hi Jonathan,
>
> I have not encountered this problem previously and do not have an
> immediate solution.
>
> You state that 16 out of 20 bam files can be processed without a
> problem by applying
> mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam", BSgenome =
> "Hsapiens", sample_name = "0139202").
> This surprises me a little bit, because you have to state the whole
> BSgenome name, for example BSgenome ="BSgenome.Hsapiens.UCSC.hg19".
> [It was pointed out few days ago that I can make the package much more
> flexible by allowing other data types- Hervé, thank you for the hint,
> we will definitely consider this]. However, this does not seem to be
> your problem as you encounter problems for only 4 out of 20 bam files.
>
> Therefore, I assume that there is something strange with these
> specific bam files. Do they contain non-mapped reads? MEDIPS 1.10.0
> reports an error, if there are none mapped reads in the bam files, but
> the latest version of MEDIPS available in the development branch can
> deal with this. However, based on the example read you've sent this
> might not cause the error. The last regular status report you get is
> 'Creating GRange Object...'. This happens in the getGRange() function
> called by MEDIPS.createSet() after the bam files has been imported.
> The command that probably causes the error is
>
> regions_GRange = GRanges(seqnames=regions$chr,
> ranges=IRanges(start=regions$start, end=regions$stop),
> strand=regions$strand)
>
> but I do not understand why you have missing values. In order to
> understand this, you might want to try to import your bam file by
> yourself using
>
> library(Rsamtools)
> scanFlag = scanBamFlag(isUnmappedQuery = F) #this will ignore unmapped
> reads in the bam file and is implemented in the latest dev version
> (1.11.16).
> scanParam=ScanBamParam(flag=scanFlag, what = c("rname", "pos", "strand",
> "qwidth"))
> regions = scanBam(file="0139202.fq.sam.noDUP.bam.qf.bam", param=scanParam)
> regions = do.call(rbind,lapply(regions, as.data.frame, stringsAsFactors=F))
> regions = data.frame(chr=as.character(as.vector(regions$rname)),
> start=as.numeric(as.vector(regions$pos)),
> stop=as.numeric(as.vector(regions$pos)+as.vector(regions$qwidth)-1),
> strand=as.character(as.vector(regions$strand)), stringsAsFactors=F)
>
> Afterwards, it will be necessary to investigate where your regions
> object has NA values.
>
> Another thing: do you have bam index files in the same directory where
> you store the bam file? MEDIPS will make use of these, if they are
> available (using slightly different code as stated above). In case
> there is a discrepancy between the index and bam files (having the
> same prefix file name), this might cause a problem as well.
>
> Finally, you can convert your bam files into bed files (by yourself)
> and try uploading simple txt based bed files into MEDIPS.
>
> I hope we will identify the root of your error.
> Lukas
>
> On Thu, Aug 15, 2013 at 9:28 PM, Jonathan Ellis
> <Jonathan.Ellis at qimr.edu.au>wrote:
>
> > Dear Lukas and list,
> >
> > I'm trying to process a set of BAM files using the latest version of
> > MEDIPS (1.10.0), but have run into problems creating MEDIPSset
> > objects for some BAM file.  The following is an example, but I'm
> > getting the same error for 4 out of 20 BAM files.
> >
> > > mset <- MEDIPS.createSet("0139202.fq.sam.noDUP.bam.qf.bam",
> > +     BSgenome = "Hsapiens", sample_name = "0139202")
> > Reading bam alignment 0139202.fq.sam.noDUP.bam.qf.bam
> > Total number of imported short reads: 13813686
> > Creating GRange Object...
> > Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges")
> > :
> >   solving row 11345799: range cannot be determined from the supplied
> > arguments
> >   (too many NAs)
> >
> > The data are single-end reads that were aligned with bwa version
> > 0.5.9-r16 (the alignments were done some time ago hence the older
> > version of bwa), and the corresponding line from the BAM (11,345,799)
> > is:
> >
> > 19441177        16      chr17   38161416        18      49M     *       0
> >       0       TAGAGTCCGGCGTTCAGGGGCAGGAAGCATCCAGCACGGGAGAAAGATG
> > BBBBBBBBBBBBBBBBBd_IIX[ffgb[[YJb^d^[[ggee^^cc^^^_      X0:i:1  X1:i:3
> >  XA:Z:chr1,+56337006,49M,3;chr12,-55896993,49M,3;chr4,-22778322,49M,3;
> > MD:Z:8A7A32     XG:i:0  NM:i:2  XM:i:2  XO:i:0XT:A:U
> >
> > I'm unsure whether this is a problem with the MEDIPS package or
> > something from GRanges/IRanges.  As far as I understand the .Call2
> > function is part of IRanges, but I assume it's failing due to
> > something passed by MEDIPS.  Any advice or pointers would be greatly
> > appreciated.
> >
> > > sessionInfo()
> > R version 3.0.1 (2013-05-16)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> >  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
> >  [7] LC_PAPER=C                 LC_NAME=C
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> >  [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 MEDIPS_1.10.0
> >  [3] DNAcopy_1.34.0                     BSgenome_1.28.0
> >  [5] FDb.InfiniumMethylation.hg19_1.0.1 rtracklayer_1.20.4
> >  [7] Biostrings_2.28.0                  GenomicFeatures_1.12.3
> >  [9] AnnotationDbi_1.22.6               Biobase_2.20.1
> > [11] GenomicRanges_1.12.4               IRanges_1.18.2
> > [13] BiocGenerics_0.6.0                 BiocInstaller_1.10.3
> >
> > loaded via a namespace (and not attached):
> >  [1] biomaRt_2.16.0   bitops_1.0-5     DBI_0.2-7        gtools_3.0.0
> >  [5] RCurl_1.95-4.1   Rsamtools_1.12.3 RSQLite_0.11.4   stats4_3.0.1
> >  [9] tools_3.0.1      XML_3.98-1.1     zlibbioc_1.6.0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >



More information about the Bioconductor mailing list