[BioC] speed of runmed on SimpleRleList

Janet Young jayoung at fhcrc.org
Fri Feb 15 21:27:05 CET 2013


Hi there,

I've been using runmean on some coverage objects, which works nicely. runmed also looks useful, but it seems very slow (oddly so), and I'm wondering whether some easy improvement could be made there. I see the same issue with the devel version and with an older release. Everything should be explained in the code below (I hope).

thanks very much,

Janet

------------------------------------------------------------------- 

Dr. Janet Young 

Malik lab

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168, 
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung  ...at...  fhcrc.org

------------------------------------------------------------------- 



library(GenomicRanges)

### a small GRanges object (example from ?GRanges)
seqinfo <- Seqinfo(paste0("chr", 1:3), c(1000, 2000, 1500), NA, "mock1")
gr2 <- GRanges(seqnames =
          Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
          ranges = IRanges(1:10, width = 10:1, names = head(letters, 10)),
          strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
          score = 1:10,
          GC = seq(1, 0, length.out = 10),
          seqinfo = seqinfo)
gr2

cov <- coverage(gr2)

### runmed is slow! (for you Hutchies: this is on a rhino machine)
### I actually want to run this on much bigger objects (whole-genome coverage), where the slowness matters far more.
system.time(runmed(cov, 51))
#   user  system elapsed 
#  1.120   0.016   1.518 

### runmean is fast
system.time(runmean(cov, 51))
#    user  system elapsed 
#   0.008   0.000   0.005 

sessionInfo()


R Under development (unstable) (2012-10-03 r60868)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] GenomicRanges_1.11.29 IRanges_1.17.32       BiocGenerics_0.5.6   

loaded via a namespace (and not attached):
[1] stats4_2.16.0


## I see a similar issue on my Mac, using an older version of R:

R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GenomicRanges_1.10.1 IRanges_1.16.2       BiocGenerics_0.4.0  

loaded via a namespace (and not attached):
[1] parallel_2.15.1 stats4_2.15.1  
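For comparison, one way to narrow down where the time goes is to bypass the Rle method entirely: expand each chromosome's coverage to a plain numeric vector, call stats::runmed() on it, and re-compress the result. This is only a sketch under stated assumptions. The helper runmedViaNumeric() and the small two-chromosome object are hypothetical, written just for illustration. The end handling of stats::runmed() is not guaranteed to match the Rle method, so the sketch compares the two results rather than assuming they agree. Expanding to numeric also holds each chromosome in memory at full length, so on whole-genome coverage memory could become the limiting factor.

```r
library(GenomicRanges)

## Hypothetical helper (not part of IRanges/GenomicRanges): expand each
## coverage Rle to a plain numeric vector, smooth it with stats::runmed(),
## then re-compress the result as an Rle.
## Note: each chromosome is held as a full-length numeric vector in memory.
runmedViaNumeric <- function(covList, k) {
    smoothed <- lapply(covList, function(x) {
        Rle(stats::runmed(as.numeric(x), k))
    })
    RleList(smoothed, compress = FALSE)
}

## Small illustrative coverage object (two mock chromosomes)
gr <- GRanges(seqnames = c("chr1", "chr2", "chr2"),
              ranges = IRanges(c(1, 100, 300), width = c(200, 150, 400)),
              seqinfo = Seqinfo(c("chr1", "chr2"), c(1000, 2000)))
cov <- coverage(gr)

## Time the Rle method against the expand-then-recompress route
system.time(viaRle <- runmed(cov, 51))
system.time(viaNumeric <- runmedViaNumeric(cov, 51))

## Compare the values before trusting either; end handling may differ
all.equal(lapply(viaRle, as.numeric), lapply(viaNumeric, as.numeric))
```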


