[BioC] Fetch scaffold sequences from BSgenome package

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Tue Sep 18 21:02:01 CEST 2012


Herve,

Thanks so much for the quick response and the alternative solutions!

Best regards,

Julie


On 9/18/12 2:49 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:

> Hi Julie,
> 
> On 09/18/2012 09:05 AM, Zhu, Lihua (Julie) wrote:
>> Herve,
>> 
>> It seems that the getSeq function does not work for scaffold sequences
>> such as Zv9_scaffold3564:93,507-93,556 and Zv9_NA384:3,507-3,556 in
>> BSgenome.Drerio.UCSC.danRer7_1.3.17. What is the best way to obtain such
>> sequences using the BSgenome package? Many thanks!
> 
> Indeed:
> 
>> library(BSgenome.Drerio.UCSC.danRer7)
>> getSeq(Drerio, "Zv9_scaffold3564")
>    Error in .getOneSeqFromBSgenomeMultipleSequences(x, names[i], start[i],  :
>      sequence Zv9_scaffold3564 found more than once, please use a non-ambiguous name
> 
> The problem is this:
> 
>> grep("Zv9_scaffold3564", names(Drerio$Zv9_scaffold), value=TRUE)
>    [1] "Zv9_scaffold3564"
>> grep("Zv9_scaffold3564", names(Drerio$upstream1000), value=TRUE)
>    [1] "ENSDART00000099775_up_1000_Zv9_scaffold3564_137113_r Zv9_scaffold3564:137113-138112"
>    [2] "ENSDART00000062791_up_1000_Zv9_scaffold3564_129501_r Zv9_scaffold3564:129501-130500"
> 
> and also that getSeq() here is looking for a sequence that *contains*
> "Zv9_scaffold3564" in its name.
> 
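To illustrate why a "contains" lookup is ambiguous here, the following is a minimal base-R sketch. It is not getSeq()'s actual implementation; the name vector is a shortened stand-in built from the names shown above, and `seq_names`, `query`, `substring_hits` and `exact_hits` are names made up for the illustration.

```r
# Stand-in for the sequence names in the package (assumption: shortened
# to the three names that appear in the grep() output above).
seq_names <- c(
  "Zv9_scaffold3564",
  "ENSDART00000099775_up_1000_Zv9_scaffold3564_137113_r Zv9_scaffold3564:137113-138112",
  "ENSDART00000062791_up_1000_Zv9_scaffold3564_129501_r Zv9_scaffold3564:129501-130500"
)
query <- "Zv9_scaffold3564"

# Substring lookup: every name that contains the query counts as a hit,
# so the scaffold and both upstream sequences are returned.
substring_hits <- grep(query, seq_names, fixed = TRUE, value = TRUE)

# Exact lookup: only a name identical to the query counts as a hit.
exact_hits <- seq_names[seq_names == query]

length(substring_hits)  # more than one hit, hence "found more than once"
length(exact_hits)      # a single hit
```

With a substring match the scaffold name collides with the upstream sequence names that embed it; an exact match does not.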
> Note that when used with a GRanges object, getSeq() looks for a
> sequence with exactly the specified name, so that should address the
> problem:
> 
>> getSeq(Drerio, GRanges("Zv9_scaffold3564", IRanges(3507, 3556)))
>      A DNAStringSet instance of length 1
>        width seq
>    [1]    50 CCTAAGTATCCACTTTAGTATCCATAACACAATAATCAGATGCTATTGTT
> 
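For the two regions in the original question, a sketch along the same lines might look like this. `parse_region()` is a hypothetical helper (not part of BSgenome) that strips the thousands separators from the coordinates. The getSeq() call is only attempted if the danRer7 package is installed, and it assumes both scaffolds are at least as long as the requested end coordinates.

```r
# Hypothetical helper: split "name:start-end" strings into their parts,
# tolerating thousands separators such as "93,507".
parse_region <- function(x) {
  m <- regmatches(x, regexec("^(.+):([0-9,]+)-([0-9,]+)$", x))
  ok <- lengths(m) == 4L
  if (!all(ok)) {
    stop("cannot parse region(s): ", paste(x[!ok], collapse = ", "))
  }
  to_int <- function(s) as.integer(gsub(",", "", s, fixed = TRUE))
  data.frame(
    seqname = vapply(m, `[`, character(1), 2L),
    start   = to_int(vapply(m, `[`, character(1), 3L)),
    end     = to_int(vapply(m, `[`, character(1), 4L)),
    stringsAsFactors = FALSE
  )
}

regions <- parse_region(c("Zv9_scaffold3564:93,507-93,556",
                          "Zv9_NA384:3,507-3,556"))

# Only run the lookup when the genome package is available.
if (requireNamespace("BSgenome.Drerio.UCSC.danRer7", quietly = TRUE)) {
  library(BSgenome.Drerio.UCSC.danRer7)
  # A GRanges query is matched on the exact sequence name.
  gr <- GRanges(regions$seqname, IRanges(regions$start, regions$end))
  seqs <- getSeq(Drerio, gr)
  print(seqs)
}
```

Building one GRanges for all regions keeps the lookup to a single getSeq() call and avoids the name-as-string path that triggers the ambiguity error.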
> Note that our plan for BioC 2.12 is to get rid of the upstream sequences
> in the BSgenome packages. They should never have been included there in
> the first place; people will be able to use Paul's new getPromoterSeq()
> from the GenomicFeatures package instead. That will entirely solve
> the ambiguous name problem.
> 
> Let me know if you have any questions about this.
> 
> Thanks,
> H.
> 
>> 
>> Best regards,
>> 
>> Julie


