[BioC] A code kata: evening-out *Ranges ends.

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Aug 25 03:49:38 CEST 2010


In case anyone else is trying to sharpen their IRanges-fu, here's a
possible solution to my own question -- works much faster.

Instead of:

>  reads <- GRanges(seqnames='chr1', strand='+',
>                ranges=IRanges(start=c(sample(18:22, 10, replace=TRUE),
>                                 sample(100:108, 10, replace=TRUE)),
>                  width=sample(18:20, 20, replace=TRUE)))
>
> You can see there are two "islands" of reads there, at around
> positions 20-40, and 100-120.
> What I want to do is to "fix" the reads in each island to have the
> same start/end position.
>
> Here's code I have to do that (forget any "strand" issues here --
> those are already taken care of):
>
>  fence.posts <- reduce(reads)
>  o <- findOverlaps(fence.posts, reads)
>  subjects <- subjectHits(o)
>  query <- queryHits(o)
>
>  for (idx in unique(query)) {
>    adjust <- subjects[query == idx]
>    start(reads[adjust]) <- start(fence.posts[idx])
>    end(reads[adjust]) <- end(fence.posts[idx])
>  }

I'm doing:

covr <- coverage(reads)[[1]]
fences <- slice(covr, 1)
f.repeat <- viewMaxs(fences)
repacked <- IRanges(start=rep(start(fences), f.repeat),
                    end=rep(end(fences), f.repeat))
if (length(repacked) != length(reads)) {
  stop("Length of repacked reads is not the same as the original")
reads <- GRanges(seqnames=seqnames(reads), ranges=repacked,
                 strand=strand(reads))

I'm assuming that all the reads come from the same chromosome (so I
unlist the result from coverage), and I'm only sending in reads of the
same strand to this part of the function.

Anyway, running this newer version on 1000 tags takes 0.181 seconds,
where the older version took 20 seconds.

Thought some might find it interesting, otherwise sorry to spam.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



More information about the Bioconductor mailing list