[R] R, PostgreSQL and poor performance

James Cloos cloos at jhcloos.com
Wed Dec 14 00:24:43 CET 2011

>>>>> "BD" == Berry, David <dyb at noc.ac.uk> writes:

BD> All variables are reals other than id which is varchar(10) and date
BD> which is a timestamp, approximately 1.5 million rows are returned by
BD> the query and it takes on the order of 10 seconds to execute using psql (the
BD> command line client for Postgres) and a similar time using pgAdmin
BD> 3. In R it takes several minutes to run and I'm unsure where the
BD> bottleneck is occurring.

You may want to test progressively smaller chunks of the data to see how
quickly R slows down as compared to psql on that query.
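One way to make the chunked comparison concrete is to time the same query at doubling row counts and fit the slope of log(time) against log(rows). This is a minimal sketch, written in Python rather than R. `fetch` is a hypothetical callable standing in for whatever pulls n rows (for example, the query with a LIMIT clause, run from R or from psql). A slope near 1 means the cost grows linearly with the row count. A slope near 2 points to quadratic behaviour.

```python
import math
import time
from typing import Callable, Sequence


def time_fetch(fetch: Callable[[int], object], sizes: Sequence[int]) -> list:
    """Wall-clock seconds for fetch(n) at each row count n.

    `fetch` is hypothetical: any callable that retrieves n rows.
    """
    times = []
    for n in sizes:
        start = time.perf_counter()
        fetch(n)
        times.append(time.perf_counter() - start)
    return times


def scaling_exponent(sizes: Sequence[int], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(size).

    Roughly 1 means linear growth; roughly 2 suggests quadratic growth.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Running both R and psql through the same harness at, say, 100k, 200k, 400k and 800k rows would show whether R's slope differs from psql's, which says more than comparing a single full-size run.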

My first guess is that something is allocating and re-allocating RAM in a
quadratic (or worse) fashion.
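To illustrate why reallocation can be quadratic, here is a minimal sketch (Python, not a description of what R or RS-PostgreSQL.c actually does) that counts element copies for two hypothetical growth policies. It assumes a buffer that is either resized to exactly fit each new row, or doubled in capacity whenever it fills up. At the 1.5 million rows mentioned above, the first policy copies about a trillion elements while the second copies about two million.

```python
def copies_grow_by_one(n_rows: int) -> int:
    """Element copies if the buffer is reallocated to fit each new row exactly.

    Appending row k moves the k - 1 rows already stored, so the total is
    0 + 1 + ... + (n - 1) = n(n - 1)/2, i.e. quadratic in n.
    """
    return n_rows * (n_rows - 1) // 2


def copies_doubling(n_rows: int) -> int:
    """Element copies if capacity doubles whenever the buffer is full.

    Each doubling moves every stored row once; the total stays below 2n.
    """
    capacity = 1
    copies = 0
    for size in range(1, n_rows + 1):
        if size > capacity:
            copies += capacity  # move the existing rows into the larger buffer
            capacity *= 2
    return copies


print(copies_grow_by_one(1_500_000))  # 1124999250000
print(copies_doubling(1_500_000))     # 2097151
```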

I don't know whether OS X has anything equivalent, but you could test on
the Linux box using oprofile (http://oprofile.sourceforge.net; SuSE
should have an rpm for it and kernel support compiled in) to confirm
where the time is spent.

It is /possible/ that the (sql)NULL->(r)NA logic in RS-PostgreSQL.c may
be slow (relatively speaking), but it is necessary.  Nothing else jumps
out as a possible choke point.

Oprofile (or the equivalent) would best answer the question.

James Cloos <cloos at jhcloos.com>         OpenPGP: 1024D/ED7DAEA6
