[R] translating SQL statements into data.table operations

Matthew Dowle mdowle at mdowle.plus.com
Thu Mar 25 13:33:23 CET 2010


Nick,

Good question,  but just sent to the wrong place. The posting guide asks you 
to contact the package maintainer first before posting to r-help only if you 
don't hear back. I guess one reason for that is that if questions about all 
2000+ packages were sent to r-help, then r-help's traffic could go through 
the roof.  Another reason could be that some (i.e. maybe many, maybe few) 
package maintainers don't actually monitor r-help and might miss any 
messages you post here.  I only saw this one thanks to google alerts.

Since I'm writing anyway ... are you using the latest version on r-forge 
which has the very fast grouping? Have you set multi-column keys on both edt 
and cdt and tried edt[cdt,roll=TRUE] syntax ?  We'll help you off list to 
climb the learning curve quickly. We are working on FAQs and a vignette and 
they should be ready soon too.

Please do follow up with us (myself and Tom Short cc'd are the main 
developers) off list and one of us will be happy to help further.

Matthew


"Nick Switanek" <nswitanek at gmail.com> wrote in message 
news:772ec1011003241351v6a3f36efqb0b0787564691f3a at mail.gmail.com...
> I've recently stumbled across data.table, Matthew Dowle's package. I'm
> impressed by the speed of the package in handling operations with large
> data.frames, but am a bit overwhelmed with the syntax. I'd like to express
> the SQL statement below using data.table operations rather than sqldf 
> (which
> was incredibly slow for a small subset of my financial data) or
> import/export with a DBMS, but I haven't been able to figure out how to do
> it. I would be grateful for your suggestions.
>
> nick
>
>
>
> My aim is to join events (trades) from two datasets ("edt" and "cdt") 
> where,
> for the same stock, the events in one dataset occur between 15 and 75 days
> before the other, and within the same time window. I can only see how to
> express the "WHERE e.SYMBOL = c.SYMBOL" part in data.table syntax. I'm 
> also
> at a loss at whether I can express the remainder using data.table's
> %between% operator or not.
>
> ctqm <- sqldf("SELECT e.*,
>                 c.DATE 'DATEctrl',
>                 c.TIME 'TIMEctrl',
>                 c.PRICE 'PRICEctrl',
>                 c.SIZE 'SIZEctrl'
>
>                 FROM edt e, ctq c
>
>                 WHERE e.SYMBOL = c.SYMBOL AND
>                       julianday(e.DATE) - julianday(c.DATE) BETWEEN 15 AND
> 75 AND
>                       strftime('%H:%M:%S',c.TIME) BETWEEN
> strftime('%H:%M:%S',e.BEGTIME) AND strftime('%H:%M:%S',e.ENDTIME)")
>
> [[alternative HTML version deleted]]
>



More information about the R-help mailing list