[R] RSQLite slowness

Rasmus Liland jr@| @end|ng |rom po@teo@no
Thu Oct 7 16:27:19 CEST 2021


Dear Martin Morgan,

Thanks for all those links!  Yes, my 
question can be characterized like that 
I think, traditional way writing 
a temporary table into the database and 
left JOINing the others vs. 
parameterized query. 

A relevant example would be to first 
create the database from the compressed 
tsv files:

	for i in gene_info gene2refseq; do
	  wget -c https://ftp.ncbi.nlm.nih.gov/gene/DATA/$i.gz
	  gzip -d $i.gz
	  sqlite3 gene_info.sqlite ".mode tabs" ".import $i $i"
	  rm $i; done

then run this R code:

	H <- data.frame(Group = c(1, 1, 2, 2),
	  NM = c("NM_001267695", "NM_001007636",
	         "NM_001003706", "NM_001353612"))
	conn <- DBI::dbConnect(RSQLite::SQLite(), "gene_info.sqlite")
	DBI::dbWriteTable(conn, "H", H, overwrite=T)
	
	statement.1 <-
	 "SELECT * FROM gene2refseq R
	  LEFT JOIN gene_info I ON I.GeneID = R.GeneID
	  WHERE R.`RNA_nucleotide_accession.version`
	    LIKE '%' || ? || '%'"
	time.1 <- proc.time()
	x1 <- DBI::dbGetQuery(
	  conn=conn,
	  statement=statement.1,
	  param=list(H$NM))
	time.1 <- proc.time() - time.1
	
	statement.2 <-
	 "SELECT * FROM H
	  LEFT JOIN gene2refseq R ON
	    R.`RNA_nucleotide_accession.version`
	    LIKE '%' || H.NM || '%'
	  LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
	time.2 <- proc.time()
	x2 <- DBI::dbGetQuery(
	  conn=conn,
	  statement=statement.2)
	time.2 <- proc.time() - time.2
	
	DBI::dbDisconnect(conn)
	
	saveRDS(object=x1, file="ex1_x1.rds", compress="xz")
	saveRDS(object=x2, file="ex1_x2.rds", compress="xz")
	saveRDS(object=list("Time x1"=list(time.1),
	                    "Time x2"=list(time.2)),
	        file="ex1_t.rds", compress="xz")

I got these timings in the ex1_t.rds 
file:

	$`Time x1`
	   user  system elapsed
	571.731 182.006 772.199
	
	$`Time x2`
	   user  system elapsed
	200.068  90.529 295.086

As you can see, statement.1 takes a lot 
longer to process compared to 
statement.2 ...  When I add the rest of 
the 31 search terms, the difference gets 
a lot bigger like I pointed out 
initially, beyond the full hour vs. only 
a few minutes. 

Best,
Rasmus

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20211007/c25ba4ab/attachment.sig>


More information about the R-help mailing list