[R] retrieving large columns using RODBC

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 15 18:51:45 CEST 2005


On Mon, 15 Aug 2005, bogdan romocea wrote:

> This appears to be an SQL issue. Look for a way to speed up your
> queries in Postgresql. I presume you haven't created an index on
> 'index', which means that every time you run your SELECT, Postgresql
> is forced to do a full table scan (not good). If the index doesn't
> solve the problem, look for some SQL help.

If that were the case the fact that sqlQuery is not being used properly 
(it can do the query and return the results in blocks) is likely to be 
the problem.  But then we do ask people to read the help page before 
posting.

>
>
>> -----Original Message-----
>> From: Tamas K Papp [mailto:tpapp at princeton.edu]
>> Sent: Saturday, August 13, 2005 4:03 AM
>> To: R-help mailing list
>> Subject: [R] retrieving large columns using RODBC
>>
>>
>> Hi,
>>
>> I have a large table in Postgresql (result of an MCMC
>> simulation, with 1
>> million rows) and I would like to retrive colums (correspond
>> to variables)
>> using RODBC.  I have a column called "index" which is used to
>> order rows.
>>
>> Unfortunately, sqlQuery can't return all the values from a
>> column at once
>> (RODBC complains about lack of memory).  So I am using the
>> following code:
>>
>> getcolumns <- function(channel, tablename, colnames, totalrows,
>>                       ordered=TRUE,chunksize=1e5) {
>>   r <- matrix(double(0),totalrows,length(colnames))
>>   for (i in 1:ceiling(totalrows/chunksize)) {
>>     cat(".")
>>     r[((i-1)*chunksize+1):(i*chunksize)] <- as.matrix(
>>       sqlQuery(channel, paste("SELECT", paste(colnames,collapse=", "),
>>                               "FROM", tablename,
>>                               "WHERE index <=", i*chunksize,
>>                               "AND index >", (i-1)*chunksize,
>>                               if (ordered) "ORDER BY index;"
>> else ";")))
>>   }
>>   cat("\n")
>>   drop(r)                               # convert to vector if needed
>> }
>>
>> to retrieve it in chunks.  However, this is very slow --
>> takes about 15
>> minutes on my machine.  Is there a way to speed it up?
>>
>> I am running Linux on a powerbook, RODBC version 1.1-4, R 2.1.1.  The
>> machine has only 512 Mb of RAM.
>>
>> Thanks,
>>
>> Tamas
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide!
>> http://www.R-project.org/posting-guide.html
>>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list