[R] translating SQL statements into data.table operations

Gabor Grothendieck ggrothendieck at gmail.com
Wed Mar 24 23:29:23 CET 2010


Note that, in general, you can speed up joins, even within sqldf, by
adding indexes to your tables and ensuring that your select statement
is written in such a way that the indexes are used.  See example 4i on
the sqldf home page.

On Wed, Mar 24, 2010 at 4:51 PM, Nick Switanek <nswitanek at gmail.com> wrote:
> I've recently stumbled across data.table, Matthew Dowle's package. I'm
> impressed by the speed of the package in handling operations with large
> data.frames, but am a bit overwhelmed with the syntax. I'd like to express
> the SQL statement below using data.table operations rather than sqldf (which
> was incredibly slow for a small subset of my financial data) or
> import/export with a DBMS, but I haven't been able to figure out how to do
> it. I would be grateful for your suggestions.
>
> nick
>
>
>
> My aim is to join events (trades) from two datasets ("edt" and "cdt") where,
> for the same stock, the events in one dataset occur between 15 and 75 days
> before the other, and within the same time window. I can only see how to
> express the "WHERE e.SYMBOL = c.SYMBOL" part in data.table syntax. I'm also
> at a loss at whether I can express the remainder using data.table's
> %between% operator or not.
>
> ctqm <- sqldf("SELECT e.*,
>                 c.DATE 'DATEctrl',
>                 c.TIME 'TIMEctrl',
>                 c.PRICE 'PRICEctrl',
>                 c.SIZE 'SIZEctrl'
>
>                 FROM edt e, ctq c
>
>                 WHERE e.SYMBOL = c.SYMBOL AND
>                       julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND
> 75 AND
>                       strftime('%H:%M:%S',c.TIME) BETWEEN
> strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)")
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list