[R] RSQLite slowness

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Wed Oct 6 19:22:25 CEST 2021


Generally, such gene-related questions are better asked on Bioconductor
than here. They also might know of more efficient, purpose built tools for
your efforts there. No guarantees, of course, and you might get a helpful
response here. But if not ...

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Oct 6, 2021 at 9:23 AM Rasmus Liland <jral using posteo.no> wrote:

> Dear r-help readers,
>
> why is it so much slower to query an
> sqlite database using RSQlite «from the
> outside» using param like
>
>         statement <-
>          "SELECT * FROM gene2refseq
>           LEFT JOIN gene_info ON
>             gene_info.GeneID = gene2refseq.GeneID
>           WHERE gene2refseq.`RNA_nucleotide_accession.version`
>             LIKE ?"
>         db <- "gene_info.sqlite"
>         conn <- DBI::dbConnect(RSQLite::SQLite(), db)
>         x1 <- DBI::dbGetQuery(conn=conn,
>           statement=statement,
>           param=list(Håkan20210914$RNANucleotideAccession))
>
> compared to querying «from the inside»
> of sqlite, by writing your search terms
> as a table first, and then calling it
>
>         statement <-
>          "SELECT * FROM H
>           LEFT JOIN gene2refseq R ON
>             R.`RNA_nucleotide_accession.version`
>             LIKE '%' || H.RNANucleotideAccession || '%'
>           LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
>         DBI::dbWriteTable(conn, "H", Håkan20210914)
>         x2 <- DBI::dbGetQuery(conn=conn, statement=statement)
>         DBI::dbDisconnect(conn)
>
> On my system (E5-2603 v4), the first
> query took more than an hour, while the
> second took only a few minutes ...
>
> Do you guys know of any faster (but also
> nice) way to dig around in very large
> tsv files like
> https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz
> and
> https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
> ?
>
> Best,
> Rasmus
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list