[R] fast or space-efficient lookup?

Patrick Burns pburns at pburns.seanet.com
Sun Oct 9 19:42:23 CEST 2011


I think you are looking for the 'data.table'
package.
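
A keyed data.table join looks the days up by binary search on the key
and avoids building the big intermediate copy that merge() creates, so
it should be both faster and much lighter on memory.  A minimal sketch,
assuming the object and column names from your message ('main',
'aggregate.data', 'day', 'stockreturn', 'marketreturn'):

  library(data.table)

  DT  <- as.data.table(main)            # the big stock-day table
  mkt <- as.data.table(aggregate.data)  # the small market table
  setkey(DT,  day)                      # sort/index both tables on the join column
  setkey(mkt, day)

  ## one row per row of DT, with mkt's columns attached (NA where a day
  ## has no match) -- roughly merge(main, aggregate.data, by = "day",
  ## all.x = TRUE, all.y = FALSE), but without the intermediate blow-up
  combined <- mkt[DT]

  lm(stockreturn ~ marketreturn, data = combined)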

On 09/10/2011 17:31, ivo welch wrote:
> Dear R experts---I am struggling with memory and speed issues.  Advice
> would be appreciated.
>
> I have a long data set (of financial stock returns, with stock name
> and trading day).  All three variables, stock return, id and day, are
> irregular.  It is about 1.3GB by object.size (200MB on disk).  Now I
> need to merge the main data set with some aggregate data (e.g., the
> S&P 500 market rate of return, with a day index) from the same day.
> This "market data set" is not big (object.size = 300K, 5 columns,
> 12000 rows).
>
> Let's say my (dumb statistical) plan is to run one grand regression,
> where the individual rate of return is y and the market rate of return
> is x.  The following should work without a problem:
>
> combined <- merge( main, aggregate.data, by = "day", all.x = TRUE, all.y = FALSE )
> lm( stockreturn ~ marketreturn, data=combined )
>
> Alas, the merge is neither space-efficient nor fast.  In fact, I run
> out of memory on my 16GB Linux machine.  My guess is that by whittling
> it down I could make it work (perhaps doing the merge in chunks and
> then rbinding them), but this is painful.
>
> In Perl, I would define a hash with the day as key and the market
> return as value, and then loop over the main data set to supplement
> it.
>
> Is there a recommended way of doing such tasks in R, either super-fast
> (so that I can merge many, many times) or space-efficient (so that I
> merge once and store the results)?
>
> Sincerely,
>
> /iaw
>
> ----
> Ivo Welch (ivo.welch at gmail.com)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
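
As for the Perl-hash idea in the message above: match() gives you the
same key-to-value lookup in base R without any merge at all.  A rough
sketch, again assuming the column names from the message:

  idx <- match(main$day, aggregate.data$day)              # one integer index per row of 'main'
  main$marketreturn <- aggregate.data$marketreturn[idx]   # NA where a day has no match

  lm(stockreturn ~ marketreturn, data = main)

This adds a single column to the big data set instead of creating a
second, merged copy, so it is about as space-efficient as plain R gets.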

-- 
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')


