[R] SQL-select using native R methods ?

Gabor Grothendieck ggrothendieck at gmail.com
Mon Mar 22 12:34:28 CET 2010


On Mon, Mar 22, 2010 at 6:37 AM,  <Jan.Sunde at biokapital.no> wrote:
> Hi
> I have a problem in R that I have been trying to solve but without
> success.
> I am trying to join two tables on two variables : an ID and a date
> (optional) that will be common between the two tables
>
> In SQL (and SAS PROC SQL) I am a frequent user of the "select" command and
> I am used to the following nomenclature :
>
> select a.*, b.c, b.y, b.z from table1 a, table2 b where a.date=b.date and
> a.id=b.id
>
> I tried this in R (using sqldf) but it takes waaaay too long to get the
> result. My data sets are >1 gb each.

Try adding indexes to the join columns.  I was able to do a join
between two 1 million row tables in 13 seconds (under a minute
including the time to add the indexes) on a laptop (not a particularly
fast machine).

> set.seed(1)
> n <- 1000000
> DF1 <- data.frame(a = sample(n, n), b1 = runif(n))
> DF2 <- data.frame(a = sample(n, n), b2 = runif(n))
> library(sqldf)
>
> sqldf()
<SQLiteConnection:(2708,1)>
> system.time(sqldf("create index ai1 on DF1(a)"))
   user  system elapsed
  16.63    0.09   16.85
> system.time(sqldf("create index ai2 on DF2(a)"))
   user  system elapsed
  16.06    0.04   16.30
> system.time(sqldf("select * from main.DF1 natural join main.DF2"))
   user  system elapsed
  12.72    0.21   12.93
> sqldf()


> Is there any way this can be done by merge() or any other more "R-like"
> way ?
> I have tried making a compound variable with paste(id, as.character(date),
> sep="") and merge() on that but since the date more often than not is NA
> this doesnt seem to work. I'm a bit stuck and the documentation is of
> little help to me in this case.
> If anybody would be kind to point me in the right direction I would be
> grateful.
>
> Jan
>
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list