[BioC] Rsamtools BAM File Opening Takes Long Time

Paul Leo p.leo at uq.edu.au
Tue Jan 17 03:28:34 CET 2012


It was a while back that I tried this, but at the time I used

ftpBase <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/"

which was faster (at the time) than

ftpBase <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/"

There are sub-directories in those folders that contain BAM and BAI
files you can test on, for example:

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/NA06984/alignment/

I'm not aware of a 1000 Genomes mirror in Australia with public access.

My experience with this was that it took several minutes per region to
get the data back, and I had to do a lot of extra error checking because
of drop-outs.
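
A minimal sketch of that kind of error checking, assuming the drop-outs
surface as ordinary R errors. The names with_retries and flaky_fetch are
hypothetical (they are not part of Rsamtools), and the fetch here is
simulated so the example runs without a network connection:

```r
# Hypothetical helper: call fetch() up to max_tries times, returning the
# first successful result and failing only after every attempt has errored.
with_retries <- function(fetch, max_tries = 3, wait_secs = 0) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    # Wrap the value in a list so a successful result is never mistaken
    # for a caught error condition.
    outcome <- tryCatch(list(value = fetch()), error = function(e) e)
    if (!inherits(outcome, "error")) {
      return(outcome$value)
    }
    if (attempt < max_tries) {
      Sys.sleep(wait_secs)  # pause before the next attempt
    }
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(outcome))
}

# Simulated remote read that drops out twice before succeeding.
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated drop-out")
  "region data"
}

result <- with_retries(flaky_fetch, max_tries = 5)
```

In real use the function passed in would wrap the remote read itself,
for instance a call to scanBam() with a ScanBamParam(which = ...) region,
and wait_secs would be set to a few seconds.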

For larger datasets I just use the VCF files.


Cheers
Paul

Dr Paul Leo
Senior Bioinformatician
UQ Diamantina Institute for Cancer, Immunology and Metabolic Medicine 


-----Original Message-----
From: Dario Strbenac <D.Strbenac at garvan.org.au>
Reply-to: "D.Strbenac at garvan.org.au" <D.Strbenac at garvan.org.au>
To: bioconductor at r-project.org <bioconductor at r-project.org>
Subject: [BioC] Rsamtools BAM File Opening Takes Long Time
Date: Tue, 17 Jan 2012 12:00:10 +1000

Hello,

I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.

Here is a small example:

library(Rsamtools)
fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
> system.time(file <- open(BamFile(fName)))
   user  system elapsed 
   0.09    0.02  989.95

There is a pulmonary.bam.bai file in the same server directory.

Does anyone else have web-accessible BAM files to test this out on?

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rsamtools_1.6.3     Biostrings_2.22.0   GenomicRanges_1.6.4 IRanges_1.12.5     
[5] RCurl_1.6-10.1      bitops_1.0-4.1     

loaded via a namespace (and not attached):
[1] BSgenome_1.22.0    rtracklayer_1.14.0 tools_2.14.0       XML_3.4-2.2       
[5] zlibbioc_1.0.0    

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia



