[BioC] Question about using Biostrings & BSgenome

Joern Toedling toedling at ebi.ac.uk
Wed Sep 17 14:41:53 CEST 2008


Hello,

Biostrings and BSgenome can certainly be used to retrieve genomic
sequences. For instance, here is a very basic function I have used many
times to retrieve the sequence of a short segment on either strand
of the budding yeast genome.

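getYeastSeq <- function(chr, start, end, strand="+"){
  ## works on one segment at a time
  stopifnot(length(chr)==1, length(start)==1, length(end)==1)
  ## library() stops with an error if the genome package is missing,
  ## whereas require() only returns FALSE
  library("BSgenome.Scerevisiae.UCSC.sacCer1")
  strand <- match.arg(strand, c("+","-"))
  ## chromosome 17 stands for the mitochondrial genome, "chrM"
  chrName <- sub("^chr17$", "chrM", paste("chr", chr, sep=""))
  thisSeq <- gsub("[[:space:]]", "",
                  as.character(getSeq(Scerevisiae, chrName,
                                      start=start, end=end)))
  ## the minus strand is the reverse complement of the plus strand
  if (strand=="-")
    thisSeq <- as.character(reverseComplement(DNAString(thisSeq)))
  return(thisSeq)
}#getYeastSeq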

getYeastSeq(chr=2, start=200000, end=200020) ## test
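
For the question about sequences around every TSS, one possible first
step is to turn each TSS position into a strand-aware window and then
fetch each window with a function like the one above. The following is
only a rough sketch in base R: tssWindow, its upstream/downstream
defaults and the coordinates are made up for illustration and are not
part of Biostrings or BSgenome.

```r
## Hypothetical helper (not from any package): compute a window around
## each TSS, with "upstream" interpreted relative to the strand.
tssWindow <- function(tss, strand, upstream=500, downstream=100){
  stopifnot(length(tss)==length(strand), all(strand %in% c("+","-")))
  plus  <- strand=="+"
  ## on the minus strand, upstream lies at higher coordinates
  start <- ifelse(plus, tss-upstream, tss-downstream)
  end   <- ifelse(plus, tss+downstream, tss+upstream)
  ## clip at position 1; the chromosome end is NOT checked here
  data.frame(start=pmax(start, 1), end=end, strand=strand,
             stringsAsFactors=FALSE)
}

## made-up TSS coordinates, for illustration only
win <- tssWindow(tss=c(200000, 350000), strand=c("+","-"))
win
```

Each row of win could then be passed on, one at a time, as the start,
end and strand arguments of getYeastSeq, or of a similar function
written for another genome package.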

Biostrings offers many utility functions for working with DNA sequences,
and you can always convert the sequences into character vectors and use
basic R operations on those. I am not sure what else you have in mind
when you say "play", but a more precise question, such as whether you
can do XYZ with Biostrings or another Bioconductor package, will
probably get a more informative answer.
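
As a small illustration of the character-vector route, here is a sketch
that uses only base R. The sequence is made up and simply stands in for
whatever as.character() returns for a retrieved segment; revComp is a
name chosen for this example and is not a Biostrings function.

```r
## A short made-up sequence, standing in for a retrieved segment
s <- "ATGCGCGTTAAC"

bases <- strsplit(s, "")[[1]]        # split into single letters
table(bases)                         # base composition
gc <- mean(bases %in% c("G","C"))    # GC fraction

## reverse complement with base R only (A<->T, C<->G);
## assumes upper-case A, C, G, T without ambiguity codes
revComp <- function(x)
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse="")
rc <- revComp(s)

## start positions of a motif (gregexpr finds non-overlapping matches)
pos <- gregexpr("CG", s, fixed=TRUE)[[1]]
```

For longer sequences or many motifs, the dedicated Biostrings classes
are likely to be more convenient and faster than plain character
vectors.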

Regards,
Joern


J.delasHeras at ed.ac.uk wrote:
>
> I haven't yet used either of these packages, but it looks like
> something I may want to look at.
>
> I was wondering if I can use these packages together with something
> like 'BSgenome.Hsapiens.UCSC.hg18' to extract sequences around every
> TSS, for instance.
> I have a couple of different oligo array designs, both in human and
> mouse, and I would like to subset probes according to a number of
> criteria, such as "promoter", "intergenic", etc...
> I'm not yet familiar with these packages but I suspect they will
> provide all the tools I need to extract and "play" with genomic
> sequences.
>
> Am I right?
>
> Does anybody have some examples to help me get a better overview,
> beyond those in the vignettes?
>
> Thanks.
>
> Jose
>

-- 
Joern Toedling
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
Phone  +44(0)1223 492566
Email  toedling at ebi.ac.uk


