[BioC] GOstats locked database

Janet Young jayoung at fhcrc.org
Wed Jan 23 23:29:36 CET 2008


Hi all,

I'm having some trouble with a locked database with GOstats, perhaps  
due to running multiple simultaneous processes that are all accessing  
GO.db?

I'm using R CMD BATCH to run an R script I wrote, and I'm doing that
simultaneously from 12 different terminal windows, each logged in to
a single node of a Linux cluster. Some processes may be sharing a
node (2 CPUs per node). I'm happy to send the entire script, if that's
useful, but for now here are just some snippets. Here's the basic
problem:

 > params <- new("GOHyperGParams", geneIds = geneentrezIDs,
       universeGeneIds = allgeneentrezIDs, ontology = "BP",
       annotation = "org.Hs.eg.db", pvalueCutoff = hgCutoff,
       conditional = FALSE, testDirection = "over")
 > thishgOver<-hyperGTest(params)
Error in sqliteFetch(rs, n = -1, ...) :
   RSQLite driver: (RS_SQLite_fetch: failed first step: database is locked)
Calls: hyperGTest ... dbGetQuery -> sqliteQuickSQL -> sqliteFetch -> .Call
Execution halted

It's a very sporadic problem - I'm actually using the script to loop  
through a bunch of simulated datasets and run hyperGTest - it does  
fine for a while and then suddenly has a problem. I can't be sure,  
but it seems like several of the processes I was running  
simultaneously all had a problem around the same time (which wouldn't  
be surprising if something suddenly happened to the database).

It's also possible that our Linux nodes are having some intermittent
connectivity issues to the mounted drives - could that cause the
database locked error? If so, would there be a way to make hyperGTest
robust to a temporary problem like that?
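
One possible way to ride out a transient lock, as a minimal sketch: wrap
the call in a retry loop that waits and tries again only when the error
message mentions a locked database. The helper name retryOnLock and its
maxTries and waitSecs arguments are made up for illustration; nothing
like this is part of GOstats or RSQLite, and it is an assumption that
retrying helps at all if the underlying cause is the mounted drive.

```r
# Hypothetical helper (not part of GOstats or RSQLite).
# Calls fun(); if it fails with a "database is locked" error, waits and
# tries again, up to maxTries attempts. Any other error is re-raised
# immediately, as is a lock error on the final attempt.
retryOnLock <- function(fun, maxTries = 5, waitSecs = 30) {
    for (i in seq_len(maxTries)) {
        result <- tryCatch(fun(), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        isLock <- grepl("database is locked", conditionMessage(result),
                        fixed = TRUE)
        if (!isLock || i == maxTries) {
            stop(result)
        }
        message("Attempt ", i, " hit a locked database; waiting ",
                waitSecs, " s before retrying")
        Sys.sleep(waitSecs)
    }
}
```

In the script, the failing call would then become something like
thishgOver <- retryOnLock(function() hyperGTest(params)).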

As well as hyperGTest, the script also accesses GO information at
various points, with commands like these:
 > Term(get(names(genes)[b],GOTERM))
 > geneentrezIDs <- geneentrezIDs[!is.na(mget(geneentrezIDs,envir=org.Hs.egGO,ifnotfound=NA))]
I was running a very similar version of the script last week, with no  
problem, and I think the above two commands are the only things I've  
added that might be accessing the GO data. I'm not clear on which of  
these commands use the same database as one another: (a) mget from  
org.Hs.egGO (b) hyperGTest with annotation="org.Hs.eg.db", (c) get  
from GOTERM.

Here is the output of sessionInfo(), run just before I started
looping through the datasets. Several iterations of the mget from
org.Hs.egGO and the hyperGTest happened after this sessionInfo call,
but I think all relevant libraries were already loaded. (Is there a
way to make R print sessionInfo() immediately before it terminates
with an error, when running in batch mode?)

 > sessionInfo()
R version 2.6.1 Patched (2007-12-02 r43572)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
  [1] org.Hs.eg.db_2.0.2  GOstats_2.4.0       Category_2.4.0
  [4] genefilter_1.16.0   survival_2.34       RBGL_1.14.0
  [7] annotate_1.16.1     xtable_1.5-2        GO.db_2.0.2
[10] AnnotationDbi_1.0.6 RSQLite_0.6-4       DBI_0.2-4
[13] Biobase_1.16.2      graph_1.16.1

loaded via a namespace (and not attached):
[1] cluster_1.11.9
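
On the batch-mode question above: one approach that may work is R's
"error" option, which holds an expression that R evaluates when an
uncaught error occurs. The sketch below, meant to sit near the top of
the script, prints sessionInfo() and then quits with a non-zero exit
status. The name errorAction is purely illustrative.

```r
# Expression to evaluate when an uncaught error occurs.
# Under R CMD BATCH the printed output ends up in the .Rout file.
errorAction <- quote({
    cat("---- sessionInfo() at time of error ----\n")
    print(utils::sessionInfo())
    q(save = "no", status = 1)   # still exit with a failure status
})

# Install the handler for the rest of the session.
options(error = errorAction)
```

The explicit q() matters: without it, a custom error expression would
let a non-interactive script carry on with the next command after an
error instead of halting.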


And here's some other, possibly pertinent information:
[12] kpvpt50:/home/jayoung/traskdata/janet/forOthers/forIlona/GOanalysis/doGOmoreregions_slightly_better_again/DCLoss_10percent> ls -l ~/traskdata/lib_linux/R/library/GO.db/extdata/
total 37364
-rw-r--r--  1 jayoung trasklab 38252544 Dec  3 13:55 GO.sqlite
So I can write to GO.sqlite. Should it be read-only, even for me? Would
that cause trouble if I want to overwrite it in the future?
[93] bedrock:/home/jayoung/traskdata/janet/forOthers/forIlona/GOanalysis/doGOmoreregions_slightly_better_again> ls -l ~/traskdata/lib_linux/R/library/org.Hs.eg.db/extdata/
total 187130
-rw-r--r--   1 jayoung  trasklab 95802368 Dec 13 14:50 org.Hs.eg.sqlite
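
On the permissions question: a file's mode can be switched from within
R, so making GO.sqlite read-only need not be permanent. A minimal
sketch follows, with hypothetical helper names setReadOnly, setWritable
and currentMode. Whether a read-only file changes the locking behaviour
at all is an assumption that has not been checked here.

```r
# Hypothetical helpers for toggling a file between read-only (444)
# and owner-writable (644) on a Unix-like system.
setReadOnly <- function(path) {
    stopifnot(file.exists(path))
    Sys.chmod(path, mode = "444", use_umask = FALSE)
}

setWritable <- function(path) {
    stopifnot(file.exists(path))
    Sys.chmod(path, mode = "644", use_umask = FALSE)
}

# Report the current permission bits as an octal string.
currentMode <- function(path) {
    format(file.info(path)$mode)
}
```

Before reinstalling or overwriting the annotation package, the file
could be switched back with setWritable() on the same path.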


Thanks for any advice - this is a tricky one as it happens sometime  
in the middle of a ~12 hour run, and is not necessarily reproducible.  
Hopefully I've provided enough information here to track down the  
problem.

Janet

-------------------------------------------------------------------

Dr. Janet Young (Trask lab)

Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.

tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung at fhcrc.org

http://www.fhcrc.org/labs/trask/


