[BioC] transcriptsBy via TxDb.Hsapiens.UCSC.hg19.knownGene painfully slow

Martin Morgan mtmorgan at fhcrc.org
Tue Jan 1 23:11:17 CET 2013


On 01/01/2013 02:05 PM, Martin Morgan wrote:
> On 01/01/2013 01:32 PM, Murat Tasan wrote:
>> hi all - does anyone have any performance tips for using
>> transcriptsBy(TXDB, by = "gene") with the UCSC transcript database?
>> in particular, is the SQLite backing database file indexed (along columns
>> holding the internal IDs)?
>> i'd provide some timing results for the command execution, but i ran out of
>> patience after about 10 minutes with no results...
>
> It is 'slow', but only in the couple-of-seconds sense of slow. Something
> else is going on, so a reproducible example, including sessionInfo(), would be
> helpful.

Just to follow my own advice...


   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   system.time(res <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))
   length(res)
   sessionInfo()

gives me

 >   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
 >   system.time(res <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))
    user  system elapsed
   3.020   0.012   3.042
 >   length(res)
[1] 22932
 >   sessionInfo()
R version 2.15.2 Patched (2012-12-23 r61401)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.8.0
[2] GenomicFeatures_1.10.1
[3] AnnotationDbi_1.20.3
[4] Biobase_2.18.0
[5] GenomicRanges_1.10.5
[6] IRanges_1.16.4
[7] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
  [1] biomaRt_2.14.0     Biostrings_2.26.2  bitops_1.0-5       BSgenome_1.26.1
  [5] DBI_0.2-5          parallel_2.15.2    RCurl_1.95-3       Rsamtools_1.10.2
  [9] RSQLite_0.11.2     rtracklayer_1.18.1 stats4_2.15.2      tools_2.15.2
[13] XML_3.95-0.1       zlibbioc_1.4.0
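
On the indexing question in the original post, one way to check is to ask
SQLite itself which indexes exist in the file behind the TxDb object. This is
only a sketch: it assumes an AnnotationDbi version whose dbconn() accessor
works on TxDb objects (this may not hold for the versions listed above), and
the names txdb, con and idx are just illustrative.

```r
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(DBI)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Assumption: dbconn() from AnnotationDbi returns the SQLite connection
# that backs the TxDb object.
con <- dbconn(txdb)

# sqlite_master is SQLite's own catalogue; rows with type = 'index' are the
# indexes defined in the file, together with the table each one belongs to.
idx <- dbGetQuery(con,
    "SELECT name, tbl_name FROM sqlite_master WHERE type = 'index'")
print(idx)
```

If idx has rows for the transcript and gene tables, the internal ID columns
are indexed and the slowness is more likely to come from somewhere else (for
instance the file sitting on a slow network drive).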

>
>
>>
>> cheers,
>>
>> -m
>>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


