[BioC] Why is *ply-ing over a GRangesList much slower than *ply-ing over an IRangesList?

Martin Morgan mtmorgan at fhcrc.org
Thu Oct 14 23:55:17 CEST 2010


On 08/24/2010 07:31 PM, Steve Lianoglou wrote:
> Hi,
> 
> Looping using any of the *ply (lapply, sapply, seqapply, etc.) seems
> to be significantly slower when you are iterating over a GRangesList
> vs. an IRangesList:
> 
> R> library(GenomicFeatures)
> R> txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
>       package="GenomicFeatures"))
> R> xcripts <- transcriptsBy(txdb, 'gene')
> R> system.time(l1 <- sapply(xcripts, length))
>    user  system elapsed
>   2.298   0.003   2.302
> 
> R> irl <- IRangesList(lapply(xcripts, ranges))
> R> system.time(l2 <- sapply(irl, length))
>    user  system elapsed
>   0.047   0.001   0.049

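As an update, Patrick has improved performance about 10x in IRanges
1.7.40. In the timings below, lapply over the GRangesList is still
roughly 10x slower than over the IRangesList, so there is more to go: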

> replicate(5, system.time(lapply(xcripts, length)))
           [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.31 0.317 0.318 0.313 0.328
sys.self   0.00 0.002 0.000 0.002 0.000
elapsed    0.31 0.325 0.319 0.317 0.329
user.child 0.00 0.000 0.000 0.000 0.000
sys.child  0.00 0.000 0.000 0.000 0.000

> irl <- IRangesList(lapply(xcripts, ranges))

> replicate(5, system.time(lapply(irl, length)))
            [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.032 0.031 0.032 0.031 0.030
sys.self   0.000 0.000 0.000 0.001 0.001
elapsed    0.032 0.031 0.032 0.032 0.031
user.child 0.000 0.000 0.000 0.000 0.000
sys.child  0.000 0.000 0.000 0.000 0.000
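
If only the per-element lengths are needed, the loop can probably be avoided
altogether. The following is a minimal sketch, not something from the timings
above: the toy GRangesList and the names n_loop and n_vec are made up for
illustration, and it assumes the installed IRanges/S4Vectors provides either
elementLengths() (older releases) or elementNROWS() (more recent ones).

```r
# Sketch only: the toy data and variable names are illustrative assumptions.
library(GenomicRanges)

gr <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
              ranges = IRanges(start = c(1, 10, 20), width = 5))
grl <- split(gr, factor(c("a", "a", "b")))  # GRangesList with elements "a", "b"

# Per-element loop: every iteration extracts one list element as a GRanges
n_loop <- sapply(grl, length)

# Vectorized accessor: elementLengths() in older releases, elementNROWS()
# in more recent ones (assumption: one of the two is available)
n_vec <- if (exists("elementNROWS")) elementNROWS(grl) else elementLengths(grl)

# Both approaches should report the same number of ranges per element
stopifnot(all(n_loop == n_vec))
```

This does not help when the function applied to each element is more involved
than length, in which case the per-element extraction cost remains.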

Martin

> 
> R> identical(l1, l2)
> [1] TRUE
> 
> I was curious if this is known/expected behavior and it's unavoidable, or .. ?
> 
> Thanks,
> -steve
> 
> R> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-08-21 r52791)
> Platform: i386-apple-darwin10.4.0/i386 (32-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] org.Hs.eg.db_2.4.1     RSQLite_0.9-2          DBI_0.2-5              AnnotationDbi_1.11.4
> [5] Biobase_2.9.0          GenomicFeatures_1.1.11 GenomicRanges_1.1.20   IRanges_1.7.21
> 
> loaded via a namespace (and not attached):
> [1] BSgenome_1.17.6    Biostrings_2.17.29 RCurl_1.4-3        XML_3.1-1          biomaRt_2.5.1
> [6] rtracklayer_1.9.7  tools_2.12.0
> 
> 


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


