[BioC] dbWriteTable Failure

Tomoaki NISHIYAMA tomoakin at kenroku.kanazawa-u.ac.jp
Sat Jan 15 07:21:11 CET 2011


Hi,

> | > |     have tried RPostgreSQL, but it has an even worse problem.  
> In that
> | > |     package, dbWriteTable is implemented by calling  
> write.table on the data
> | > |     frame to write it to a temporary file, and then load it  
> into the
> | > |     database,


Yes, this is still true.
The original code was to write to a temporary file and let the  
*PostgreSQL server* read the file.
Since this is impossible if the file system differ or the privilege  
is not coherent,
my update was to read that file again, which is always possible, and  
send the content to
the server via the network connection, be it unix domain socket or  
TCP/IP.
So, the current code inherits the effect of write.table, as before,
if there were any problem in formating.

However, I am not sure in the following:

> Specifically, write.table by default formats some numbers in
> | > |     scientific notation, which causes the database load to fail.

Postgresql should be able to read numbers in scientific notation,
unless it is outside the supported range.
So, there could be still other problems, in the load failure.

One way to debug may remove the code to remove the temporary file,
so that you can check what is exactly to be sent.
If the file is good, it can be loaded to the postgresql with
psql using
\copy tablename from 'filename'
This should give the same results, but you can investigate what  
modification
allow proper loading.

As for the code fragment shown in the mail,
> | > |     >>>   dbSendQuery(conn, "create temp table temp (x int, y  
> int)")


I would avoid using "temp table".
The db drivers seems to want to be free to create independent  
connections to
the database and not guarantee transaction over a single transaction.
So temp table may not be visible from the next query which may be via an
independent connection.

See issue 8
http://code.google.com/p/rpostgresql/issues/detail?id=8
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan


On 2011/01/15, at 4:54, Dirk Eddelbuettel wrote:

>
> Elliot,
>
> On 14 January 2011 at 14:35, Elliot Joel Bernstein wrote:
> | Dirk -
> |
> | Thank you for following up on this. It's possible I'm just  
> misunderstanding the code, but I grabbed the source for RPostgreSQL  
> ver. 0.1-7, and as far as I can tell, the work for dbWriteTable (in  
> postgresqlWriteTable) still works by creating a temporary file.  
> (Lines 649-660 of PostgreSQLSupport.R.)
>
> Hm. Looking at the code, I see that you have a point there.  Maybe  
> Tomoaki
> can clarify.
>
> We did move a couple of bugs out of the way and I was under the  
> impression
> that this had been taken care of too.  If not, well, you always  
> have your
> money back guarantee.... ;-)
>
> And in case that wasn't patently obvious, we *do* welcome welcome  
> patches.
>
> Cheers, Dirk
>
> |
> | - Elliot
> |
> | On Fri, Jan 14, 2011 at 11:56:47AM -0600, Dirk Eddelbuettel wrote:
> | >
> | > On 14 January 2011 at 12:46, Sean Davis wrote:
> | > |
> | > |
> | > | On Fri, Jan 14, 2011 at 12:36 PM, Martin Morgan  
> <mtmorgan at fhcrc.org> wrote:
> | > |
> | > |     On 01/14/2011 06:25 AM, Elliot Joel Bernstein wrote:
> | > |     > Martin -
> | > |     >
> | > |     > Thanks for your reply. Is RdbiPgSQL no longer  
> maintained at all? I
> | > |
> | > |     Hi Elliot --
> | > |
> | > |     (sorry for the last post, oops). The package hasn't had  
> any substantial
> | > |     change since 2006. I'm cc'ing the currently listed  
> maintainer, and also
> | > |     the maintainer of RPostgreSQL, who might have additional  
> insights.
> | > |
> | > |
> | > |
> | > | Perhaps Dirk will comment directly but, with regards to  
> RPosgreSQL, there is a
> | > | google code site and an issue for dbWriteTable was already  
> filed.
> | > |
> | > | http://code.google.com/p/rpostgresql/issues/detail?id=4&can=1
> | > |
> | > | The resolution is that Dirk would appreciate a patch (which I  
> have not been
> | > | able to supply--someone else?).
> | >
> | > Did the OP try a current version?   NEWS follows below.   
> Tomoaki has done
> | > some wonderful work and we are in much better shape than we  
> were, say, one
> | > year ago.
> | >
> | > That said, there are always open issues.  See the Google Code  
> site; and I cc
> | > out dev list.
> | >
> | > Dirk
> | >
> | > Version 0.1-7 -- 2010-10-17
> | >
> | >     o   Several potential buffer overruns were fixed
> | >
> | >     o   dbWriteTable now writes a data.frame to database  
> through a network
> | >         connection rather than a temporary file. Note that  
> row_names may be
> | >         changed in future releases.  Also, passing in filenames  
> instead of
> | >         data.frame is not supported at this time.
> | >
> | >     o   When no host is specified, a connection to the  
> PostgreSQL server
> | >         is made via UNIX domain socket (just like psql does)
> | >
> | >     o   Table and column names are case sensitive, and  
> identifiers are escaped
> | >         or quoted appropriately, so that any form of table/ 
> column names can be
> | >         created, searched, or removed, including upper-, lower-  
> and mixed-case.
> | >
> | >     o   nullOk in dbColumnInfo has a return value of NA when  
> the column does
> | >         not correspond to a column in the table. The utility of  
> nullOk is
> | >         doubtful but not removed at this time.
> | >
> | >     o   Correct Windows getpid() declaration (with thanks to  
> Brian D. Ripley)
> | >
> | >     o   A call of as.POSIXct() with a time format string  
> wrongly passed to TZ
> | >         has been corrected; this should help with intra-day  
> timestamps (with
> | >         thanks to Steve Eick)
> | >
> | >     o   Usage of tmpdir has been improved on similarly to Linux  
> (with thanks
> | >         to Robert McGehee)
> | >
> | > Dirk
> | >
> | >
> | > |
> | > | Sean
> | > |
> | > |
> | > |
> | > |     Martin
> | > |
> | > |     have tried RPostgreSQL, but it has an even worse problem.  
> In that
> | > |     package, dbWriteTable is implemented by calling  
> write.table on the data
> | > |     frame to write it to a temporary file, and then load it  
> into the
> | > |     database, which means that the formatting options set in  
> R affect the
> | > |     result. Specifically, write.table by default formats some  
> numbers in
> | > |     scientific notation, which causes the database load to fail.
> | > |     >
> | > |     > After looking into the problem with RdbiPgSQL a little  
> more, it
> | > |     > seems
> | > |     that it's caused by dbDisconnect calling the same  
> function that is
> | > |     registered as the finalizer. I think it could be easily  
> solved by using
> | > |     an indicator variable to keep track of whether PQfinish  
> has been called.
> | > |     (I can also just call rm instead of dbDisconnect, but  
> it's a little
> | > |     disconcerting that the package contains a function that  
> causes memory
> | > |     errors when used as intended.)
> | > |
> | > |     > - Elliot
> | > |     >
> | > |     >
> | > |     > On Tue, Jan 11, 2011 at 08:12:21PM -0800, Martin Morgan  
> wrote:
> | > |     >> On 01/11/2011 11:37 AM, Elliot Joel Bernstein wrote:
> | > |     >>> I am having a problem using the dbWriteTable function  
> in package
> | > |     RdbiPgSQL to insert data into a PostgreSQL table.  
> Specifically, the
> | > |     following code sometimes works, but sometimes fails with  
> an error message
> | > |     indicating that there is "no database connection":
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>> dbtest.R
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>> require(RdbiPgSQL)
> | > |     >>
> | > |     >> Hi Elliot -- not so much an answer, but unless  
> RdbiPgSQL is central you
> | > |     >> might try the CRAN package RPostgreSQL, which is  
> actively maintained.
> | > |     >>
> | > |     >> Martin
> | > |     >>
> | > |     >>>
> | > |     >>> test <- function(conn, n) {
> | > |     >>>
> | > |     >>>   dbSendQuery(conn, "create temp table temp (x int, y  
> int)")
> | > |     >>>
> | > |     >>>   n <- 10000
> | > |     >>>   dat <- data.frame(x=sample(1000,n,replace=T),  
> y=sample(1000,n,replace
> | > |     =T))
> | > |     >>>
> | > |     >>>   for (i in 1:dim(dat)[1]) {
> | > |     >>>
> | > |     >>>     err <- try(dbSendQuery(conn, sprintf("insert into  
> temp values
> | > |     (%d,%d)", dat$x[i], dat$y[i])))
> | > |     >>>     if (inherits(err, "try-error")) {
> | > |     >>>       print(conn)
> | > |     >>>       break
> | > |     >>>     }
> | > |     >>>
> | > |     >>>   }
> | > |     >>>
> | > |     >>>   print(dbGetQuery(conn, "select count(1) from temp"))
> | > |     >>>
> | > |     >>> }
> | > |     >>>
> | > |     >>> conn <- dbConnect(PgSQL(), dbname="gf", host="gf-host")
> | > |     >>> test(conn, 10000)
> | > |     >>> dbDisconnect(conn)
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>> The following is sample output from multiple runs of  
> this script:
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>> Sample output
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>> R version 2.12.0 (2010-10-15)
> | > |     >>> Copyright (C) 2010 The R Foundation for Statistical  
> Computing
> | > |     >>> ISBN 3-900051-07-0
> | > |     >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> | > |     >>>
> | > |     >>> R is free software and comes with ABSOLUTELY NO  
> WARRANTY.
> | > |     >>> You are welcome to redistribute it under certain  
> conditions.
> | > |     >>> Type 'license()' or 'licence()' for distribution  
> details.
> | > |     >>>
> | > |     >>>   Natural language support but running in an English  
> locale
> | > |     >>>
> | > |     >>> R is a collaborative project with many contributors.
> | > |     >>> Type 'contributors()' for more information and
> | > |     >>> 'citation()' on how to cite R or R packages in  
> publications.
> | > |     >>>
> | > |     >>> Type 'demo()' for some demos, 'help()' for on-line  
> help, or
> | > |     >>> 'help.start()' for an HTML browser interface to help.
> | > |     >>> Type 'q()' to quit R.
> | > |     >>>
> | > |     >>>> source("dbtest.R")
> | > |     >>> Loading required package: RdbiPgSQL
> | > |     >>> Loading required package: Rdbi
> | > |     >>>   count
> | > |     >>> 1 10000
> | > |     >>>> source("dbtest.R")
> | > |     >>> Error in dbSendQuery.PgSQL.conn(conn, sprintf("insert  
> into temp values
> | > |     (%d,%d)",  :
> | > |     >>>   No database connection
> | > |     >>> status = 1
> | > |     >>> database.name =  a
> | > |     >>> host.name = p��
> | > |     >>> options = 8}G
> | > |     >>> password = P�
> | > |     >>> port = �
> | > |     >>> tty = ��
> | > |     >>> socket = -1
> | > |     >>> client.encoding = -1
> | > |     >>> backend.pid = 0
> | > |     >>> Error in dbSendQuery.PgSQL.conn(conn, ...) : No  
> database connection
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>> Here is the result of a traceback:
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>> traceback
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>>> traceback()
> | > |     >>> 11: .Call("PgSQLsendQuery", conn, query, PACKAGE =  
> "RdbiPgSQL")
> | > |     >>> 10: dbSendQuery.PgSQL.conn(conn, ...) at dbtest.R#21
> | > |     >>> 9: dbSendQuery(conn, ...) at dbtest.R#21
> | > |     >>> 8: dbGetResult(dbSendQuery(conn, ...)) at dbtest.R#21
> | > |     >>> 7: dbGetQuery.PgSQL.conn(conn, "select count(1) from  
> temp") at dbtest.R
> | > |     #21
> | > |     >>> 6: dbGetQuery(conn, "select count(1) from temp") at  
> dbtest.R#21
> | > |     >>> 5: print(dbGetQuery(conn, "select count(1) from  
> temp")) at dbtest.R#21
> | > |     >>> 4: test(conn, 10000)
> | > |     >>> 3: eval.with.vis(expr, envir, enclos)
> | > |     >>> 2: eval.with.vis(ei, envir)
> | > |     >>> 1: source("dbtest.R")
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>> And here is my sessionInfo:
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>> sessionInfo
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>>> sessionInfo()
> | > |     >>> R version 2.12.0 (2010-10-15)
> | > |     >>> Platform: x86_64-redhat-linux-gnu (64-bit)
> | > |     >>>
> | > |     >>> locale:
> | > |     >>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> | > |     >>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> | > |     >>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> | > |     >>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> | > |     >>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> | > |     >>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> | > |     >>>
> | > |     >>> attached base packages:
> | > |     >>> [1] stats     graphics  grDevices utils     datasets   
> methods   base
> | > |     >>>
> | > |     >>> other attached packages:
> | > |     >>> [1] RdbiPgSQL_1.24.0 Rdbi_1.24.0
> | > |     >>>
> | > |     >>>  
> =====================================================================
> | > |     >>>
> | > |     >>>
> | > |     >>>
> | > |     >>> I am using version 8.4.4 of the PostgreSQL-devel  
> libraries, and the
> | > |     server is running version 8.4.1. The problem does not  
> always occur on the
> | > |     second run; I am sometimes able to run the script  
> multiple times before it
> | > |     eventually fails. Also, it does not always fail at the  
> same row of the
> | > |     dataframe, nor is there anything obviously wrong with the  
> data values on
> | > |     the row where it fails. I would appreciate any help in  
> diagnosing the
> | > |     problem.
> | > |     >>>
> | > |     >>> Thank you very much.
> | > |     >>>
> | > |     >>> - Elliot
> | > |     >>>
> | > |     >>> _______________________________________________
> | > |     >>> Bioconductor mailing list
> | > |     >>> Bioconductor at r-project.org
> | > |     >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> | > |     >>> Search the archives: http://news.gmane.org/
> | > |     gmane.science.biology.informatics.conductor
> | > |     >>
> | > |     >>
> | > |     >> --
> | > |     >> Computational Biology
> | > |     >> Fred Hutchinson Cancer Research Center
> | > |     >> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> | > |     >>
> | > |     >> Location: M1-B861
> | > |     >> Telephone: 206 667-2793
> | > |
> | > |
> | > |     --
> | > |     Computational Biology
> | > |     Fred Hutchinson Cancer Research Center
> | > |     1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> | > |
> | > |     Location: M1-B861
> | > |     Telephone: 206 667-2793
> | > |
> | > |     _______________________________________________
> | > |     Bioconductor mailing list
> | > |     Bioconductor at r-project.org
> | > |     https://stat.ethz.ch/mailman/listinfo/bioconductor
> | > |     Search the archives: http://news.gmane.org/
> | > |     gmane.science.biology.informatics.conductor
> | > |
> | > |
> | >
> | > --
> | > Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>
> -- 
> Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
>



More information about the Bioconductor mailing list