[BioC] Why is *ply-ing over a GRangesList much slower than *ply-ing over an IRangesList?

Wed Aug 25 04:31:43 CEST 2010

Hi,

Looping with any of the *ply functions (lapply, sapply, seqapply, etc.)
seems to be significantly slower when iterating over a GRangesList
than over an IRangesList:

R> library(GenomicFeatures)
R> txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
      package="GenomicFeatures"))
R> xcripts <- transcriptsBy(txdb, 'gene')
R> system.time(l1 <- sapply(xcripts, length))
   user  system elapsed
  2.298   0.003   2.302

R> irl <- IRangesList(lapply(xcripts, ranges))
R> system.time(l2 <- sapply(irl, length))
   user  system elapsed
  0.047   0.001   0.049

R> identical(l1, l2)
[1] TRUE
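
For comparison, a vectorized accessor might avoid the per-element loop altogether. The following is only a minimal sketch: it assumes the installed IRanges/GenomicRanges exports elementLengths() (later Bioconductor releases renamed it to elementNROWS()), and the name l3 is introduced here purely for illustration. No timings are shown because they have not been measured.

```r
# Sketch only: assumes elementLengths() is available in the installed
# IRanges/GenomicRanges (renamed elementNROWS() in later releases).
library(GenomicFeatures)
txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
                                 package="GenomicFeatures"))
xcripts <- transcriptsBy(txdb, 'gene')

# Per-element loop, as in the timing above
system.time(l1 <- sapply(xcripts, length))

# Vectorized accessor: computes all element lengths in one call,
# without extracting each list element (l3 is a hypothetical name)
system.time(l3 <- elementLengths(xcripts))

# Compare the values only, ignoring attributes such as names
all.equal(as.vector(l1), as.vector(l3))
```

If the two results agree, the vectorized call could stand in for the sapply() loop whenever only the element lengths are needed.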

I was curious whether this is known/expected behavior and unavoidable, or .. ?

Thanks,
-steve

R> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-08-21 r52791)
Platform: i386-apple-darwin10.4.0/i386 (32-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] org.Hs.eg.db_2.4.1     RSQLite_0.9-2          DBI_0.2-5              AnnotationDbi_1.11.4
[5] Biobase_2.9.0          GenomicFeatures_1.1.11 GenomicRanges_1.1.20   IRanges_1.7.21

loaded via a namespace (and not attached):
[1] BSgenome_1.17.6    Biostrings_2.17.29 RCurl_1.4-3        XML_3.1-1          biomaRt_2.5.1
[6] rtracklayer_1.9.7  tools_2.12.0

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
