[BioC] Fetch scaffold sequences from BSgenome package

Hervé Pagès hpages at fhcrc.org
Tue Sep 18 20:49:56 CEST 2012

Hi Julie,

On 09/18/2012 09:05 AM, Zhu, Lihua (Julie) wrote:
> Herve,
> It seems that the getSeq function does not work for scaffold sequence
> such as Zv9_scaffold3564:93,507-93,556 and Zv9_NA384:3,507-3,556 for
> BSgenome.Drerio.UCSC.danRer7_1.3.17. What is the best way to obtain such
> sequences using BSgenome package? Many thanks!


   > library(BSgenome.Drerio.UCSC.danRer7)
   > getSeq(Drerio, "Zv9_scaffold3564")
   Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i], 
start[i],  :
     sequence Zv9_scaffold3564 found more than once, please use a 
non-ambiguous name

The problem is this:

   > grep("Zv9_scaffold3564", names(Drerio$Zv9_scaffold), value=TRUE)
   [1] "Zv9_scaffold3564"
   > grep("Zv9_scaffold3564", names(Drerio$upstream1000), value=TRUE)
   [1] "ENSDART00000099775_up_1000_Zv9_scaffold3564_137113_r 
   [2] "ENSDART00000062791_up_1000_Zv9_scaffold3564_129501_r 

and also that getSeq() here is looking for a sequence that *contains*
"Zv9_scaffold3564" in its name.

Note that when used with a GRanges object, getSeq() is looking for a
sequence with the exact specified name so that should address the

   > getSeq(Drerio, GRanges("Zv9_scaffold3564", IRanges(3507, 3556)))
     A DNAStringSet instance of length 1
       width seq

Note that our plan for BioC 2.12 is to get rid of the upstream sequences
in the BSgenome packages (they should never have been included here in
the first place, people will be able to use Paul's new getPromoterSeq()
from the GenomicFeatures package instead) so that will entirely solve
the ambiguous name problem.

Let me know if you have any questions about this.


> Best regards,
> Julie

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

More information about the Bioconductor mailing list