[BioC] Getting the length of every element from a large CompressedIRangesList is slow

Nicolas Delhomme delhomme at embl.de
Mon Jul 2 19:18:56 CEST 2012


Hi,

Just to follow up on my previous message:

Doing this instead is fast:

> system.time(sizes <- sapply(width(aln.ranges),length))

   user  system elapsed 
  1.109   0.144   1.254 
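
For reference, here is a small self-contained sketch comparing the ways of counting ranges per element, on toy data rather than on aln.ranges. It assumes a recent IRanges/S4Vectors installation, where the vectorised accessor is called elementNROWS(). The names ir, grp and toy.ranges are made up for the illustration, and the sketch makes no timing claims.

```r
library(IRanges)

# Toy data (hypothetical): six ranges split into three groups of sizes 3, 1 and 2.
ir  <- IRanges(start = c(1, 5, 9, 20, 30, 40), width = 3)
grp <- factor(c("a", "a", "a", "b", "c", "c"))
toy.ranges <- split(ir, grp)  # splitting an IRanges gives a compressed list of ranges

# Approach from the original message: one length() call per extracted list element.
sizes.sapply <- sapply(toy.ranges, length)

# Workaround from this message: take the widths first, then count them per element.
sizes.width <- sapply(width(toy.ranges), length)

# Vectorised accessor (assumes a recent release): no per-element loop at the R level.
sizes.direct <- elementNROWS(toy.ranges)

# All three approaches should agree on the per-element counts.
stopifnot(all(sizes.sapply == sizes.direct),
          all(sizes.width == sizes.direct))
```

In the IRanges 1.15 series listed in the sessionInfo below, the equivalent accessor was, as far as I know, named elementLengths(); that is worth checking against the installed version before relying on it.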

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Jul 2, 2012, at 7:02 PM, Nicolas Delhomme wrote:

> Hej!
> 
> I've a rather large CompressedIRangesList
> 
>> print(object.size(aln.ranges),unit="Mb")
> 390.4 Mb
> 
> It has 2518 elements holding 51M ranges in total. A few elements contain up to 6M ranges, but the vast majority are small: the median is 2 and the 3rd quartile is 47, while the mean is ~20,000.
> 
> Retrieving the element length is slow:
> 
>> system.time(sizes <- sapply(aln.ranges,length))
> 
>    user  system elapsed 
> 265.777 169.222 443.498 
> 
> This is slow compared with the performance of the IRanges package in general, which surprised me. Is there a faster way to get this information than the sapply I'm using? Note that the machine I'm using is not a limiting factor in terms of CPU/RAM/load.
> 
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> 
> locale:
> [1] C/UTF-8/C/C/C/C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] IRanges_1.15.15    BiocGenerics_0.3.0
> 
> loaded via a namespace (and not attached):
> [1] stats4_2.15.1
> 
> Nico
> 
> P.S. If you need, I can send my aln.ranges object off-list.