[BioC] Accessing next gen sequence data remotely via Bioconductor

Martin Morgan mtmorgan at fhcrc.org
Fri Dec 17 16:40:00 CET 2010


On 12/17/2010 07:23 AM, Ruppert Valentino wrote:
> 
> Hello,
> 
> I am trying to access next gen sequencing data remotely via
> R/Bioconductor, but I can't seem to send queries to it the way I can
> with biomaRt. I tried Rsamtools, but even with that there is no way to
> query the sequence file directly.
> 
> What I am trying to do is get sequence data for specific regions,
> e.g. chrom5 150100000 to 150101000, for 1000 Genomes
> (http://www.1000genomes.org/) samples such as NA19240. However, there
> doesn't seem to be any tool to do this easily.
> 
> The Rsamtools documentation mentions that the data were initially
> downloaded using samtools view bamfile.
> 
> Does anyone know of a way to access next gen sequence data remotely
> without having to download it locally? If so, I would appreciate it
> if you could email me the R script for that.

Passing the bam URL as the 'file' argument to scanBam causes it to first
download the index and then perform the query. It is better to download
the index ('.bai') file yourself and then call scanBam(remoteUrl,
localIndex). It also makes sense to do the arithmetic on the volume of
data to be downloaded: if you're going to download most of the data
anyway, then it is far better to use the 'aspera' plugin provided by
1000genomes to quickly pull the bam files down and access them locally.
The basic workflow is sketched in the Rsamtools vignette; look for
na19240url.
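The "arithmetic about volume of data" can be sketched roughly as below. This is a back-of-the-envelope Python estimate, not part of Rsamtools or any 1000 Genomes tool. It assumes reads are spread uniformly across the genome, so download volume scales with the total width of the queried regions. It also adds a hypothetical fixed per-query overhead of 64 KiB, because a BAM query fetches whole BGZF blocks, each at most 64 KiB. The names Region, estimated_bytes and should_download_whole_file, the 0.5 "most of the data" threshold, and the 100 GB file size in the example are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical per-query overhead (assumption): a query fetches whole BGZF
# blocks, each at most 64 KiB, so even a tiny region costs about one block.
DEFAULT_QUERY_OVERHEAD = 64 * 1024


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def width(self) -> int:
        return self.end - self.start + 1


def estimated_bytes(regions: Iterable[Region], bam_bytes: int,
                    genome_length: int,
                    per_query_overhead: int = DEFAULT_QUERY_OVERHEAD) -> float:
    """Rough download size, assuming reads are spread uniformly over the genome."""
    bytes_per_base = bam_bytes / genome_length
    return sum(r.width() * bytes_per_base + per_query_overhead for r in regions)


def should_download_whole_file(regions: Iterable[Region], bam_bytes: int,
                               genome_length: int, threshold: float = 0.5,
                               per_query_overhead: int = DEFAULT_QUERY_OVERHEAD) -> bool:
    """True when the queries would pull at least `threshold` of the file anyway."""
    needed = estimated_bytes(regions, bam_bytes, genome_length, per_query_overhead)
    return needed / bam_bytes >= threshold


if __name__ == "__main__":
    # Region from the question; file size and genome length are assumptions.
    query = [Region("5", 150_100_000, 150_101_000)]
    bam_size = 100 * 10**9        # hypothetical 100 GB BAM file
    genome = 3_100_000_000        # approximate human genome length
    print(f"{estimated_bytes(query, bam_size, genome):,.0f} bytes")
    print(should_download_whole_file(query, bam_size, genome))
```

For the single 1,001-base region in the question and the hypothetical 100 GB file, the estimate comes to roughly 100 KB, so a remote scanBam query is the obvious choice. With many wide regions the fraction climbs toward the whole file, and a bulk download followed by local access wins.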

Martin

> 
> Thanks
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


