[BioC] Rsamtools BAM File Opening Takes Long Time

Martin Morgan mtmorgan at fhcrc.org
Tue Jan 17 03:48:13 CET 2012


On 01/16/2012 06:00 PM, Dario Strbenac wrote:
> Hello,
>
> I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.
>
> Here is a small example :
>
> library(Rsamtools)
> fName<- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
>> system.time(file<- open(BamFile(fName)))
>     user  system elapsed
>     0.09    0.02  989.95
>
> There is a pulmonary.bam.bai file in the same server directory.
>
> Does anyone else have web-accessible BAM files to test this out on ?

The opposite of what you asked for, but maybe a useful data point anyway: 
the same open() call takes about 0.3 seconds for me.

 > system.time(file <- open(BamFile(fName)))
    user  system elapsed
   0.024   0.016   0.294
Warning message:
In open.BamFile(BamFile(fName)) :
   [knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.

and counting the reads in the first 1 Mb of chr18 is similarly quick:

 > system.time(countBam(file, param=ScanBamParam(which=GRanges("chr18", IRanges(1, 1000000)))))
    user  system elapsed
   0.040   0.008   0.682

As Paul alludes to, using the remote BAM might be a false economy, if 
over the course of your analysis you download a substantial amount of 
the file anyway.
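
If the analysis will touch much of the file, one option is to copy the BAM 
and its index to local disk once and query the local copy. The following is 
only a minimal sketch: local_bam_paths() and fetch_bam() are hypothetical 
helper names (not part of Rsamtools), and it assumes the index sits next to 
the BAM as <name>.bam.bai, as it does for pulmonary.bam.

```r
# Hypothetical helpers, not Rsamtools functions: work out where the local
# copies of a remote BAM and its index would live.
local_bam_paths <- function(url, dir = tempdir()) {
    bam <- file.path(dir, basename(url))
    c(bam = bam, bai = paste0(bam, ".bai"))
}

# Download the BAM and its index (assumed to be at <url>.bai).
fetch_bam <- function(url, dir = tempdir()) {
    paths <- local_bam_paths(url, dir)
    # mode = "wb" keeps binary files intact on Windows
    download.file(url, paths[["bam"]], mode = "wb")
    download.file(paste0(url, ".bai"), paths[["bai"]], mode = "wb")
    paths
}

# Usage (not run here: needs network access and the Rsamtools package):
# library(Rsamtools)
# fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
# paths <- fetch_bam(fName)
# file <- open(BamFile(paths[["bam"]]))   # index is found next to the BAM
# countBam(file, param = ScanBamParam(which = GRanges("chr18", IRanges(1, 1000000))))
```

Whether this is worthwhile depends on how much of the file the analysis 
reads; for a handful of small regions the remote connection may still be 
the cheaper choice.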

Martin

>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
> [5] LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Rsamtools_1.6.3     Biostrings_2.22.0   GenomicRanges_1.6.4 IRanges_1.12.5
> [5] RCurl_1.6-10.1      bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.22.0    rtracklayer_1.14.0 tools_2.14.0       XML_3.4-2.2
> [5] zlibbioc_1.0.0
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


