[BioC] How to get sequences corresponding to a GRanges?

Martin Morgan mtmorgan at fhcrc.org
Mon Oct 29 00:49:00 CET 2012


On 10/28/2012 11:28 PM, Cei Abreu-Goodger wrote:
> Hello all,
>
> I was wondering if there was a simple way to get the sequences corresponding to
> the ranges stored in a GRanges object. If you have the original sequences in a
> BSgenome object, you can use 'getSeq'. But what if you just have the fasta file,
> imported as a DNAStringSet object?
>
> I want to avoid having to forge a new BSgenome object each time, since I'm
> dealing with unfinished assemblies, with thousands of sequences that I don't
> want to split into individual fasta files, etc.

Rsamtools provides the FaFile and FaFileList classes to represent fasta files
(indexed via indexFa), and a getSeq method that takes an FaFile and a GRanges
(or similar) object. This is built on top of scanFa. See

   library(Rsamtools)
   method?"getSeq,FaFile"
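
As a rough sketch of the workflow described above (not taken from the original thread): the contig names, sequences, and ranges below are made up for illustration, and the fasta file is a temporary file written only so the example is self-contained. With a real assembly you would point FaFile at your existing fasta file instead.

```r
library(Rsamtools)      # FaFile, indexFa, and the getSeq method for FaFile
library(GenomicRanges)  # GRanges
library(Biostrings)     # DNAStringSet, writeXStringSet

# Hypothetical toy "assembly": two short contigs written to a temporary fasta file.
seqs <- DNAStringSet(c(contig1 = "ACGTACGTAC", contig2 = "TTTTGGGGCCCC"))
fa_path <- tempfile(fileext = ".fa")
writeXStringSet(seqs, fa_path)

# Index the fasta file once (creates fa_path with a ".fai" suffix),
# then refer to the indexed file through an FaFile object.
indexFa(fa_path)
fa <- FaFile(fa_path)

# Ranges of interest; seqnames must match the fasta record names.
gr <- GRanges(seqnames = c("contig1", "contig2"),
              ranges = IRanges(start = c(2, 5), end = c(5, 8)),
              strand = c("+", "*"))

# Extract the sequences for the ranges directly from the indexed file.
region_seqs <- getSeq(fa, gr)
region_seqs
```

Only the requested regions are read from disk, so this should scale to assemblies with thousands of contigs without splitting the fasta file or forging a BSgenome package.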

Martin

>
> Many thanks,
>
> Cei
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793