[R] Avoiding multiple outputs using RODBC package

Mark Wardle mark at wardle.org
Sun Dec 14 12:31:19 CET 2008


Hi.

You don't need to download the whole of the output database table to
look for an already generated answer. You can write a SQL query to do
that instead. ie. give me any rows with these parameters...  Get the
database to do the work - it is what they are designed to do.

So the procedure is:

1. Get input parameters
2. Query the output database to see whether analysis has already been
done (select * from output_table where...)
3. If not already done, do the calculation and insert result into output table

Note: you don't have to use sqlSave to save data. One can add single
rows by running arbitrary SQL.

Mark

2008/12/12 Brigid Mooney <bkmooney at gmail.com>:
> I am using R as a data manipulation tool for a SQL database.  So in some of
> my R scripts I use the RODBC package to retreive data, then run analysis,
> and use the sqlSave function in the RODBC package to store the results in a
> database.
>
> There are two problems I want to avoid, and they are highly related: (1)
> having R rerun analysis which has already been done and saved into output
> database table, and (2) ending up with more than one identical row in
> my output database table.
>
> -------------------------------------
> The analysis I am running allows the user to input a large number of
> variables, for example:
> date, version, a, b, c, d, e, f, g, ...
>
> After R completes its analysis, I write the results to a database table in
> the format:
> Value, date, version, a, b, c, d, e, f, g, ...
>
> where Value is the result of the R analysis, and the rest of the columns are
> the criteria that was used to get that value.
> --------------------------------------
>
> Can anyone think of a way to address these problems?  The only thing I can
> think of so far is to run an sqlQuery to get a table of all the variable
> combinations that are saved at the start, and then simply avoid computing
> and re-outputing those results.  However, my results database table
> currently has over 200K rows (and will grow very quickly as I keep going
> with this project), so I think that would not be the most expeditious answer
> as I think just the SQL query to download 200K rows x 10+ columns is going
> to be time consuming in and of itself.
>
> I know this is kindof a weird problem, and am open to all sorts of ideas...
>
> Thanks!
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>



-- 
Dr. Mark Wardle
Specialist registrar, Neurology
Cardiff, UK



More information about the R-help mailing list