[R] database table merging tips with R

Avram Aelony aavram at mac.com
Fri Sep 12 00:08:06 CEST 2008


At some point I'd like to try to compile the DBI-based ROracle package as well.  For now though, I'll stick with RODBC as it seems to do what I need.  I believe RODBC is not based on DBI, but that shouldn't preclude using the pre-built subquery option which is a great idea and should work regardless.  I will try that.   It might also be a great way to implement the innards of the fictional code (for both RODBC and ROracle ) that references a column in the R data frame as I had suggested previously or a better variation on that...!

Thanks for your responses and insight.

Avram


 
On Thursday, September 11, 2008, at 02:19PM, "Coey Minear" <cminear at securecomputing.com> wrote:
>Avram Aelony writes:
> > 
> > I have not devoted time to setting up ROracle since binaries are
> > not available and it seems to require some effort to compile (see
> > http://cran.r-project.org/web/packages/ROracle/index.html).  On the
> > other hand, RODBC worked more or less magically once I set up the
> > data sources.
> > 
> > What is your success using ROracle and why would it be preferable
> > to RODBC ?
> > 
> > -Avram
> > 
>
>Actually, I've only been using RSQLite, so that's where my (limited)
>knowledge is coming from (and only for less than a week).  RSQLite
>based on DBI, and noticed that Aaron and I were answering with DBI
>functions.  I then noticed that you kept referring to "sqlQuery".
>That's why I raised the issue of whether RODBC is based on DBI or not.
>
>If RODBC is not based on DBI, then obviously you cannot leverage the
>use of dbWriteTable and dbRemoveTable, as they may not be present.  If
>RODBC offers something similar, you'll have to find out, but that may
>have driven your initial desire to avoid the temporary table option.
>
>Regardless of that, I don't see why you could not consider the
>pre-built subquery option, unless you find that ODBC, or RODBC, has a
>limit on what can be sent to the database.
>
>Coey
>
>
> >  
> > On Thursday, September 11, 2008, at 12:47PM, "Coey Minear" <cminear at securecomputing.com> wrote:
> > >Aaron Mackey writes:
> > > > I guess I'd do it something like this:
> > > > 
> > > > dbGetQuery(con, "CREATE TEMPORARY TABLE foo ( etc etc)")
> > > > sapply(@userids, function (x) { dbGetQuery(con, paste("INSERT INTO foo
> > > > (userid) VALUES (", x, ")")) })
> > > > 
> > > > then later:
> > > > 
> > > > dbGetQuery(con, "DROP TABLE foo");
> > > > 
> > >
> > >Actually, based on my reading of the DBI reference, you should be able
> > >to do the following to create a table (although possibly not temporary):
> > >
> > >  dbWriteTable(connection, "r_user_ids", r)
> > >
> > >Then you can use the following to drop the table:
> > >
> > >  dbRemoveTable(connection, "r_user_ids")
> > >
> > >Of course, I don't know whether the ODBC driver implements these
> > >functions or not.  (Is 'RODBC' built on DBI?  Looks like Aaron and I
> > >have been assuming that.)
> > >
> > >Coey
> > >
> > > > -Aaron
> > > > 
> > > > On Thu, Sep 11, 2008 at 3:21 PM, Avram Aelony <aavram at mac.com> wrote:
> > > > >
> > > > > Perhaps I will need to create a temp table, but I am asking if there is a way to avoid it.  It would be great if there were a way to tie the R data frame temporarily to the query in a transparent fashion. If not, I will see if I can create/drop the temp table directly from sqlQuery.
> > > > > -Avram
> > > > >
> > > > >
> > > > >
> > > > > On Thursday, September 11, 2008, at 12:07PM, "Aaron Mackey" <ajmackey at gmail.com> wrote:
> > > > >>Sorry, I see now you want to avoid this, but you did ask what was the
> > > > >>"best way to efficiently ...", and the temp. table solution certainly
> > > > >>matches your description.  What's wrong with using a temporary table?
> > > > >>
> > > > >>-Aaron
> > > > >>
> > > > >>On Thu, Sep 11, 2008 at 3:05 PM, Aaron Mackey <ajmackey at gmail.com> wrote:
> > > > >>> I would load your set of userid's into a temporary table in oracle,
> > > > >>> then join that table with the rest of your SQL query to get only the
> > > > >>> matching rows out.
> > > > >>>
> > > > >>> -Aaron
> > > > >>>
> > > > >>> On Thu, Sep 11, 2008 at 2:33 PM, Avram Aelony <aavram at mac.com> wrote:
> > > > >>>>
> > > > >>>> Dear R list,
> > > > >>>>
> > > > >>>> What is the best way to efficiently marry an R dataset with a very large (Oracle) database table?
> > > > >>>>
> > > > >>>> The goal is to only return Oracle table rows that match IDs present in the R dataset.
> > > > >>>> I have an R data frame with 2000 user IDs analogous to: r = data.frame(userid=round(runif(2000)*100000,0))
> > > > >>>>
> > > > >>>> ...and I need to pull data from an Oracle table only for these 2000 IDs.  The Oracle table is quite large. Additionally, the sql query may need to join to other tables to bring in ancillary fields.
> > > > >>>>
> > > > >>>> I currently connect to Oracle via odbc:
> > > > >>>>
> > > > >>>> library(RODBC)
> > > > >>>> connection <- odbcConnect("****", uid="****", pwd="****")
> > > > >>>> d = sqlQuery(connection, "select userid, x, y, z from largetable where timestamp > sysdate -7")
> > > > >>>>
> > > > >>>> ...allowing me to pull data from the database table into the R object "d" and then use the R merge function.  The problem however is that if "d" is too large it may fail due to memory limitations or be inefficient.  I would like to push the merge portion to the database and it would be very convenient if it were possible to request that the query look to the R object for the ID's to which it should restrict the output.
> > > > >>>>
> > > > >>>> Is there a way to do this?
> > > > >>>> Something like the following fictional code:
> > > > >>>> d = sqlQuery(connection, "select t.userid, x, y, z from largetable t where r$userid=t.userid")
> > > > >>>>
> > > > >>>> Would sqldf (http://code.google.com/p/sqldf/) help me out here? If so, how?   This would be convenient and help me avoid needing to create a temporary table to store the R data, join via sql, then return the data back to R.
> > > > >>>>
> > > > >>>> I am using R version 2.7.2 (2008-08-25) / i386-pc-mingw32 .
> > > > >>>> Thanks for your comments, ideas, recommendations.
> > > > >>>>
> > > > >>>>
> > > > >>>> -Avram
> > > > >>>>
> > > > >>>> ______________________________________________
> > > > >>>> R-help at r-project.org mailing list
> > > > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > > > >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > >>>> and provide commented, minimal, self-contained, reproducible code.
> > > > >>>>
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > > 
> > > > ______________________________________________
> > > > R-help at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > > 
> > >
> > >
> > >
> > 
>
>
>



More information about the R-help mailing list