[R] dbAppy questions/clarifications

Ted Byers r.ted.byers at gmail.com
Wed Oct 15 21:34:21 CEST 2008

In the example in the documentation, I see:

rs <- dbSendQuery(con, 
             "select Agent, ip_addr, DATA from pseudo_data order by Agent")
out <- dbApply(rs, INDEX = "Agent", 
        FUN = function(x, grp) quantile(x$DATA, names=FALSE))

Maybe I am a bit thick, but it took me a while, and a kind hint from Phil,
to figure much of this out.

It is clear that the SQL orders the data by "Agent", and the INDEX parameter
tells dbApply that FUN is to be applied to each group of values defined by
Agent (like applying SUM(DATA) in SQL using a GROUP BY clause).  If my
understanding is correct, out will be an array holding ordered pairs, with
the value of Agent and the corresponding values returned by FUN.

I take it FUN = function(x, grp) quantile(x$DATA, names=FALSE) is the
function definition for a function called FUN.  I would guess, then, that
the opening and closing braces are optional.  Is that correct? Or is this
something else?  I did not see a definition of 'grp'.  What is it?

Suppose the function I want to apply is "fitdistr(x,"exponential")".  Would
I just replace "quantile(x$DATA, names=FALSE)" by

Finally, suppose the query I need to run is more complex, such as:

SELECT group_id,YEAR(my_date),WEEK(my_date),ndays FROM myTable ORDER BY

Can dbApply handle applying "fitdistr(x,"exponential")" to each group of
values defined by "group_id,YEAR(my_date),WEEK(my_date)"?  If so, how would
I change the call to dbsendQuery, and how would I insert the resulting
estimates using something like "INSERT INTO myResults
(group_id,year,week,rate,sd) VALUES (?,?,?,?);"?  

Once I get this, I can do everything else within a stored procedure in
MySQL.  I get the idea of using,e.g., sprintf to interpolate values I need
to insert into a query string, but it is a question of how to get the values
I need from 'out' (to use the above example), and how to iterate over them
to do the SQL INSERT.  

Actually, would 'dbWriteTable' handle inserting these values efficiently? 
If so, how do I ensure it maps the group_id,year, week, &c. from 'out' to
the right columns in my results table (what I have in mind involves a table
with a couple extra columns that would take appropriate default values)?


View this message in context: http://www.nabble.com/dbAppy-questions-clarifications-tp20000632p20000632.html
Sent from the R help mailing list archive at Nabble.com.

More information about the R-help mailing list