[BioC] IRanges: Need help speeding up sliding window analysis

Michael Dondrup Michael.Dondrup at uni.no
Fri Sep 24 12:11:08 CEST 2010


Hi,

I need some help with speeding up a sliding window analysis on an Rle object of length > 1 million.
I am using functions 'successiveViews' with negative gap width and 'viewApply' vs. viewMeans.
My goal is to apply a discrete differential operator that computes the  difference between the 'left'
half of the window and the 'right', aka. a cheap discrete numeric first order differentiation.

What I found is: viewMeans(x) << viewApply(x, mean) << viewApply(x, diff.op) in terms of time, example below.
Is there a way to pimp this code to make it work on the genome scale? I appreciate your input, I am confident there 
is a better way to do it.

Thank you very much 
Michael

Code example:

diff.op <- function(x, lrprop=1/2) { 
  len = length(x)
  i = ceiling(len*lrprop)  
  (sum(x[i:len]) - sum(x[1:i])) / len
}

sliding.window.apply <-  function(object, width, fun, ...) {
  x <- trim(successiveViews(subject=object, width=rep(width,  ceiling(length(object))  ), gap=-width+1))
  return (Rle(viewApply(x, fun, ...)))
}

sliding.window.mean <-  function(object, width) {
  x <- trim(successiveViews(subject=object, width=rep(width,  ceiling(length(object))  ), gap=-width+1))
  return (viewMeans(x))
}

rle <- Rle(1:10000)

> system.time(sliding.window.mean(rle, 30))
   user  system elapsed 
  0.036   0.004   0.098 
> system.time(sliding.window.apply(rle,  30, mean))
   user  system elapsed 
  4.380   0.065   6.010 
> system.time(sliding.window.apply(rle,  30, diff.op))
   user  system elapsed 
 38.857   0.204  39.127 

> sessionInfo()
R version 2.11.1 (2010-05-31) 
x86_64-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] IRanges_1.6.6

loaded via a namespace (and not attached):
 [1] annotate_1.26.1      AnnotationDbi_1.10.2 Biobase_2.8.0        DBI_0.2-5            DESeq_1.0.4         
 [6] genefilter_1.30.0    geneplotter_1.26.0   grid_2.11.1          RColorBrewer_1.0-2   RSQLite_0.9-1       
[11] splines_2.11.1       survival_2.35-8      xtable_1.5-6        



More information about the Bioconductor mailing list