[R] retrieving large columns using RODBC

Tamas K Papp tpapp at Princeton.EDU
Sat Aug 13 10:03:14 CEST 2005


Hi,

I have a large table in PostgreSQL (the result of an MCMC simulation, with 1
million rows) and I would like to retrieve columns (corresponding to variables)
using RODBC.  I have a column called "index" which is used to order the rows.

Unfortunately, sqlQuery can't return all the values from a column at once
(RODBC complains about lack of memory).  So I am using the following code:

library(RODBC)

getcolumns <- function(channel, tablename, colnames, totalrows,
                       ordered=TRUE, chunksize=1e5) {
  r <- matrix(double(0), totalrows, length(colnames))
  for (i in 1:ceiling(totalrows/chunksize)) {
    cat(".")                            # progress indicator
    ## rows covered by this chunk (the last chunk may be shorter)
    rows <- ((i-1)*chunksize+1):min(i*chunksize, totalrows)
    r[rows, ] <- as.matrix(             # index rows, so multiple columns work
      sqlQuery(channel, paste("SELECT", paste(colnames, collapse=", "),
                              "FROM", tablename,
                              "WHERE index <=", i*chunksize,
                              "AND index >", (i-1)*chunksize,
                              if (ordered) "ORDER BY index;" else ";")))
  }
  cat("\n")
  drop(r)                               # drop to a vector if a single column
}

to retrieve it in chunks.  However, this is very slow -- it takes about 15
minutes on my machine.  Is there a way to speed it up?
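
One thing I have not tried yet is issuing the ordered query once and then
pulling the result set back in pieces with sqlGetResults(), instead of
re-running the query (and its ORDER BY) for every chunk.  A rough, untested
sketch, assuming sqlQuery() and sqlGetResults() accept a max= argument to cap
the number of rows returned per call (getcolumns2 is just a made-up name):

getcolumns2 <- function(channel, tablename, colnames, totalrows,
                        chunksize=1e5) {
  r <- matrix(double(0), totalrows, length(colnames))
  query <- paste("SELECT", paste(colnames, collapse=", "),
                 "FROM", tablename, "ORDER BY index;")
  chunk <- sqlQuery(channel, query, max=chunksize)     # first chunk
  done <- 0
  while (is.data.frame(chunk) && nrow(chunk) > 0) {
    r[(done+1):(done+nrow(chunk)), ] <- as.matrix(chunk)
    done <- done + nrow(chunk)
    if (done >= totalrows) break                       # got everything
    chunk <- sqlGetResults(channel, max=chunksize)     # fetch the next chunk
  }
  drop(r)
}

I don't know whether that would actually be faster, or whether the bottleneck
is in the ODBC transfer itself.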

I am running Linux on a PowerBook, with RODBC version 1.1-4 and R 2.1.1.  The
machine has only 512 MB of RAM.

Thanks,

Tamas
