[BioC] Fetch scaffold sequences from BSgenome package
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Tue Sep 18 21:02:01 CEST 2012
Thanks so much for the quick response and the alternative solutions!
On 9/18/12 2:49 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
> Hi Julie,
> On 09/18/2012 09:05 AM, Zhu, Lihua (Julie) wrote:
>> It seems that the getSeq function does not work for scaffold sequence
>> such as Zv9_scaffold3564:93,507-93,556 and Zv9_NA384:3,507-3,556 for
>> BSgenome.Drerio.UCSC.danRer7_1.3.17. What is the best way to obtain such
>> sequences using BSgenome package? Many thanks!
>> getSeq(Drerio, "Zv9_scaffold3564")
> Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i],
> start[i], :
> sequence Zv9_scaffold3564 found more than once, please use a
> non-ambiguous name
> The problem is this:
>> grep("Zv9_scaffold3564", names(Drerio$Zv9_scaffold), value=TRUE)
>  "Zv9_scaffold3564"
>> grep("Zv9_scaffold3564", names(Drerio$upstream1000), value=TRUE)
>  "ENSDART00000099775_up_1000_Zv9_scaffold3564_137113_r
>  "ENSDART00000062791_up_1000_Zv9_scaffold3564_129501_r
> and also that getSeq() here is looking for a sequence that *contains*
> "Zv9_scaffold3564" in its name.
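
For illustration, here is a minimal base-R sketch of the difference between a "contains" lookup and an exact-name lookup. It does not call BSgenome at all. The two upstream-style names are made up, and pooling all names into one vector is a simplification of how the package stores them.

```r
# Toy names standing in for sequence names in the genome object.
# The two "up_1000" names are invented for this illustration.
seq_names <- c("Zv9_scaffold3564",
               "geneA_up_1000_Zv9_scaffold3564_100_r",
               "geneB_up_1000_Zv9_scaffold3564_200_r")
query <- "Zv9_scaffold3564"

# "Contains" matching: every name that includes the query is a hit,
# so the lookup is ambiguous.
partial_hits <- grep(query, seq_names, fixed = TRUE, value = TRUE)

# Exact matching: only the name identical to the query is a hit.
exact_hits <- seq_names[seq_names == query]

length(partial_hits)  # 3
length(exact_hits)    # 1
```

With more than one partial hit, a lookup by character name cannot decide which sequence is meant, which is consistent with the "found more than once" error above.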
> Note that when used with a GRanges object, getSeq() is looking for a
> sequence with the exact specified name so that should address the
> problem:
>> getSeq(Drerio, GRanges("Zv9_scaffold3564", IRanges(3507, 3556)))
> A DNAStringSet instance of length 1
> width seq
>  50 CCTAAGTATCCACTTTAGTATCCATAACACAATAATCAGATGCTATTGTT
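
As a rough sketch of how the two regions from the original question might be fetched the same way: the helper `parse_region()` below is hypothetical (not part of any package) and only strips the thousands separators and splits the "name:start-end" strings. The getSeq() call is attempted only if the Bioconductor packages are installed, and assumes the package exposes the genome as `Drerio`, as in this thread.

```r
# Hypothetical helper: split "name:start-end" into its parts and
# drop thousands separators from the coordinates.
parse_region <- function(x) {
  m <- regmatches(x, regexec("^(.+):([0-9,]+)-([0-9,]+)$", x))
  data.frame(
    seqname = vapply(m, `[`, character(1), 2),
    start   = as.integer(gsub(",", "", vapply(m, `[`, character(1), 3))),
    end     = as.integer(gsub(",", "", vapply(m, `[`, character(1), 4))),
    stringsAsFactors = FALSE
  )
}

regions <- parse_region(c("Zv9_scaffold3564:93,507-93,556",
                          "Zv9_NA384:3,507-3,556"))

# Only attempted when the Bioconductor packages are available.
if (requireNamespace("BSgenome.Drerio.UCSC.danRer7", quietly = TRUE) &&
    requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)
  library(BSgenome.Drerio.UCSC.danRer7)
  # A GRanges query is matched on the exact sequence name.
  gr <- GRanges(regions$seqname, IRanges(regions$start, regions$end))
  seqs <- getSeq(Drerio, gr)
}
```

Building a single GRanges from the parsed table keeps all regions in one getSeq() call rather than one call per region.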
> Note that our plan for BioC 2.12 is to get rid of the upstream sequences
> in the BSgenome packages (they should never have been included here in
> the first place, people will be able to use Paul's new getPromoterSeq()
> from the GenomicFeatures package instead) so that will entirely solve
> the ambiguous name problem.
> Let me know if you have any questions about this.
>> Best regards,