[R] parLapply on sqlQuery (from package RODBC)

Frede Aakmann Tøgersen frtog at vestas.com
Thu Jul 3 09:51:06 CEST 2014


Hi

Why are you doing duplicate queries to the database (two As and Cs in your names vector)?

Why do 5 simultaneously connection to the database server? Woukld you do 500 connections?

Why not do one query and let the database server do the job for you?

Try this:

> options(useFancyQuotes = FALSE)
> 
> query0 <- "select id from table where name in (%s)" 
> 
> names <- paste(sQuote(LETTERS[1:5]), collapse = ",")
> names
[1] "'A','B','C','D','E'"
> 
> query <- sprintf(query0, names)
> query
[1] "select id from table where name in ('A','B','C','D','E')"
> 
> dataFromDB <- sqlQuery(dbConn, query)

This should work for MS SQL and MySQL servers.

Yours sincerely / Med venlig hilsen


Frede Aakmann Tøgersen
Specialist, M.Sc., Ph.D.
Plant Performance & Modeling

Technology & Service Solutions
T +45 9730 5135
M +45 2547 6050
frtog at vestas.com
http://www.vestas.com

Company reg. name: Vestas Wind Systems A/S
This e-mail is subject to our e-mail disclaimer statement.
Please refer to www.vestas.com/legal/notice
If you have received this e-mail in error please contact the sender. 


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Tevlin, Dylan
> Sent: 2. juli 2014 23:49
> To: r-help at r-project.org
> Subject: [R] parLapply on sqlQuery (from package RODBC)
> 
> R Version : 2.14.1 x64
> Running on Windows 7
> 
> Connecting to a database on a remote Microsoft SQL Server 2012
> 
> The short form of my problem is the following.
> 
> I have an unordered vectors of names, say:
> 
> names<-c("A", "B", "A", "C","C")
> 
> each of which have an id in a table in my db.  I need to convert the names to
> their corresponding ids.
> 
> I currently have the following code to do it.
> ###
> names<-c("A", "B", "A", "C","C")
> dbConn<-odbcDriverConnect(connection="connection string") #successfully
> connects
> 
> nameToID<-function(name, dbConn){
>                 #dbConn : active db connection formed via odbcDriverConnect
>                 #name     : a char string
> 
>                 sqlQuery(dbConn, paste("select id from table where name='", name,
> "'", sep=""))
> }
> sapply(names, nameToID, dbConn=dbConn)
> ###
> 
> Barring better ways to do this, which could involve loading the table into R
> then working with the problem there (which is possible), I understand why
> the following doesn't work, but I cannot seem to find a solution.  Attempting
> to use parallelization via the package 'parallel' :
> 
> ###
> names<-c("A", "B", "A", "C","C")
> dbConn<-odbcDriverConnect(connection="connection string") #successfully
> connects
> 
> nameToID<-function(name, dbConn){
>                 #dbConn : active db connection formed via odbcDriverConnect
>                 #name     : a char string
> 
>                 sqlQuery(dbConn, paste("select id from table where name='", name,
> "'", sep=""))
> }
> 
> mc<-detectCores()
> cl<-makeCluster(mc)
> clusterExport(cl, c("sqlQuery", "dbConn"))
> parSapply(cl, names, nameToID, dbConn=dbConn)    #incorrect passing of
> nameToID's second argument
> ###
> 
> As in the comment, this is not the correct way to assign the second argument
> to nameToID.
> 
> I have also tried the following:
> 
> parSapply(cl, names, function(x) nameToID(x, dbConn))
> 
> in place of the previous parSapply call, but that also does not work, with the
> error being thrown saying "the first parameter is not an open RODBC
> connection", presumably referring to the first parameter of the sqlQuery()
> 
> The following code does work with parallization.
> 
> ###
> names<-c("A", "B", "A", "C","C")
> dbConn<-odbcDriverConnect(connection="connection string") #successfully
> connects
> nameToID<-function(name){
>                 #name     : a char string
>                 dbConn<-odbcDriverConnect(connection="string")
>                 result<-sqlQuery(dbConn, paste("select id from table where
> name='", name, "'", sep=""))
>                 odbcClose(dbConn)
>                 result
> }
> 
> mc<-detectCores()
> cl<-makeCluster(mc)
> clusterExport(cl, c("sqlQuery", "odbcDriverConnect", "odbcClose", "dbConn",
> "nameToID"))      #throwing everything in
> parSapply(cl, names, nameToID)
> ###
> 
> But the constant opening and closing a ton of the gains from parallelization,
> and seems just a bit silly.
> 
> So the overall question would be how to pass the second parameter (the
> open db connection) to the function within parSapply, in much the same way
> as it is done in the regular apply?  In general, how does one pass a second,
> third, nth parameter to a function within a parallel routine?
> 
> Thanks and if you need any more information let me know.
> 
> -DT
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list