[R] Reading SQL data - all at once, or "as needed"?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Tue Jul 15 17:01:10 CEST 2014


The actual answer (which way is faster) can in general go either way, depending on how much data you extract and how well optimized the SQL queries and indexes are. The amount of data you mention for one or five weeks seems small for most modern computers, though (you did not mention how many columns, but it would have to be a lot to change the story).

I would go for all five weeks at once, and only break it up if I had problems.
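
For example, a minimal sketch of the single-query route, using DBI with RPostgreSQL, could look like the following. The table name ("events"), the timestamp column ("event_time"), the value column, and the connection details are all placeholders I made up, so adjust them to your schema:

library(DBI)
library(RPostgreSQL)

con <- dbConnect(PostgreSQL(), dbname = "mydb")   # placeholder connection

## Most recent completed Saturday, then back up to the Sunday that
## starts the oldest of the five Sunday-to-Saturday weeks.
last_sat  <- Sys.Date() - as.integer(format(Sys.Date(), "%w")) - 1
start_day <- last_sat - 5 * 7 + 1

sql <- sprintf(
  "SELECT event_time, value FROM events
   WHERE event_time >= '%s' AND event_time < '%s'",
  start_day, last_sat + 1)
dat <- dbGetQuery(con, sql)

## Label each row with the Sunday that starts its week, then loop.
dat$week <- cut(as.Date(dat$event_time), breaks = "week",
                start.on.monday = FALSE)
for (wk in levels(dat$week)) {
  wkdat <- dat[dat$week == wk, ]
  ## barplot() for this week's subset goes here
}

dbDisconnect(con)

That keeps the round trips to the server at one, and 40,000-odd rows with a few columns is not much for R to hold in memory.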
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On July 15, 2014 7:38:49 AM PDT, John McKown <john.archie.mckown at gmail.com> wrote:
>I have some data in an SQL database (PostgreSQL, to be exact). This
>database (one table in particular) has a lot of data in it. Years'
>worth, in fact (2.3 million rows since 2012). That's just for
>background.
>
>My question is: if I am going to create 5 separate graphs, one graph
>for each of the previous 5 weeks (in this case a week is Sunday to
>Saturday), is it better to read in all 5 weeks' worth of data with a
>single dbGetQuery whose SELECT has a WHERE clause that gets the proper
>5 weeks of data, and then subset in R? Or is it better to get a single
>week's worth of data per dbGetQuery, with the proper SELECT ... WHERE,
>and then process each week? There seem to be anywhere from 7,500 to
>9,000 entries for a single week. The calculations for each week are
>independent of any other week's data. Basically, I am just creating a
>simple bar chart.
>
>In the first case, I do one SELECT and then subset the data.frame
>inside the for() loop. In the second case, I still use a for() loop,
>but I do a SELECT in each iteration and don't need to subset the
>data.frame.
>
>I have read the "Data Import/Export" manual. The only advice I can
>find is based on:
><quote>
>The sort of statistical applications for which DBMS might be used are
>to extract a 10% sample of the data, to cross-tabulate data to produce
>a multi-dimensional contingency table, and to extract data group by
>group from a database for separate analysis.
></quote>
>
>The "extract data group by group ..." seems, to me, to say that I
>should
>extract and process each week's data separately, using a dbGetQuery in
>my
>loop. Am I interpreting this correctly?
>
>Thanks for your advice.
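
For comparison, the second case described above (one dbGetQuery per week inside the loop) might look something like this sketch, with the same made-up table and column names as before:

library(DBI)
library(RPostgreSQL)

con <- dbConnect(PostgreSQL(), dbname = "mydb")   # placeholder connection

last_sat <- Sys.Date() - as.integer(format(Sys.Date(), "%w")) - 1

for (i in 4:0) {
  wk_start <- last_sat - 6 - i * 7   # Sunday that starts this week
  wk_end   <- wk_start + 7           # exclusive upper bound (next Sunday)
  sql <- sprintf(
    "SELECT event_time, value FROM events
     WHERE event_time >= '%s' AND event_time < '%s'",
    wk_start, wk_end)
  wkdat <- dbGetQuery(con, sql)
  ## barplot() for this week's data goes here
}

dbDisconnect(con)

Whether five smaller queries beat one larger query plus subsetting mostly comes down to the index on the timestamp column and the per-query overhead, which is why I would only switch to this form if the single query gave trouble.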


