[BioC] extracting sequence from a genome

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Mar 15 19:40:27 CET 2012


Hi Iain,

On Thu, Mar 15, 2012 at 2:16 PM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
>
>
> Hi Gentlemen
>
> Thanks for taking the time. Here's where I am just now:
>
> library(BSgenome.Rnorvegicus.UCSC.rn4)# get genome
> library(GenomicRanges)
>
> fl <- "ftp://mirbase.org/pub/mirbase/CURRENT/genomes/rno.gff" # get miR coords
> gff <- read.table(fl) # as dataframe
>
> names <- gff[,10]
> nms <- gsub(";", "", gsub("\"", "", gsub("ID=\"", "", names))) # a set of nested gsub with regex to leave only the text in the double quotes
>
> gr <- GRanges(seqnames = Rle(paste('chr', gff[,1], sep='')), ranges = IRanges(gff[,4], end = gff[,5], names = nms), strand = Rle(gff[,7]))
>
> seqs <- getSeq(Rnorvegicus, flank(gr, 200))
> names(seqs) <- nms
>
> It's much lighter on it's feet than a loop and a nice intro to the GenomicRanges package for me.
>
> As a follow-up question I'm going to write out the seqs object as fasta and use it in clover for TFBS analysis.
>
>
> I was wondering whether the strand is taken into account when I get the flanking sequence i.e. is the sequence returned from the + or - strand as defined in the GRanges object?

A call to `flank` on a GRanges object is "strand aware" -- is that
what you're asking?

Consider:

R> gr <- GRanges('chr1', IRanges(c(20, 20), width=5), c('+', '-'))
R> flank(gr, 10)
GRanges with 2 ranges and 0 elementMetadata values:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [10, 19]      +
  [2]     chr1  [25, 34]      -

R> flank(gr, 10, start=FALSE)
GRanges with 2 ranges and 0 elementMetadata values:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1  [25, 34]      +
  [2]     chr1  [10, 19]      -

getSeq is also strand aware -- it will return the reverse complement
of the sequence in your bounds if its on the `-` strand.

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list