[R] Merging big data sets

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Mon Sep 9 17:48:59 CEST 2013


Please don't post in HTML. (Read the Posting Guide.)

Consider using the sqldf package.
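
A minimal sketch of that suggestion. It assumes the sqldf package is installed and that the key columns are called Start and End, as in the loop in the question. The toy values and the Dist column are made up. Passing dbname = tempfile() makes SQLite build the join in a temporary file on disk instead of in memory, which may help with tables of this size.

```r
# Sketch: left join with sqldf, keeping the SQLite database on disk.
# Assumes sqldf is installed; the toy tables and the Dist column are
# hypothetical stand-ins for dataP and ttoevP from the question.
library(sqldf)

dataP <- data.frame(Start = c(1L, 1L, 2L, 3L),
                    End   = c(2L, 3L, 3L, 1L),
                    Dist  = c(10, 15, 7, 12))
ttoevP <- data.frame(Start = c(1L, 2L, 3L),
                     End   = c(2L, 3L, 1L),
                     OEV.T = c(20, 9, 25))

# Identifiers are quoted: "End" is an SQL keyword and "OEV.T" contains a dot.
res <- sqldf('select d.*, t."OEV.T"
              from dataP d
              left join ttoevP t
                on d."Start" = t."Start" and d."End" = t."End"',
             dbname = tempfile())  # temporary SQLite file instead of :memory:
print(res)
```

If a Start/End pair occurs more than once in ttoevP, the left join returns one row per match, so the result can have more rows than dataP.
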
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Renger van Nieuwkoop <renger at vannieuwkoop.ch> wrote:
>Hi
>I have 6 rather big data sets (between 400000 and 800000 lines) on
>transport data (times, distances and travelers between nodes). They all
>have a common index (start-end nodes).
>I want to aggregate this data, but for that I have to merge them.
>I tried "merge", but R (3.0.1) crashes (Windows 8 machine, 16 GB
>RAM).
>Then I tried the join from the data.table package. There I got the
>message that 2^34 is too big (no idea why it is 2^34, as it is a left
>join).
>Then I wrote a loop that looks up and assigns the values row by row,
>which takes a very, very long time (it is still running).
>
>Here is the code:
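>for (i in seq_len(nrow(dataP))) {
>    s <- dataP$Start[i]   # start node of row i
>    e <- dataP$End[i]     # end node of row i
>    # one keyed lookup and one assignment per row of dataP
>    dataP[J(s, e)]$OEV.T <- ttoevP[J(s, e)]$OEV.T
>}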
>
>dataP has 800'000 lines and ttoevP has about 500'000 lines.
>
>Any hints to speed up this process are welcome.
>
>Renger
>_________________________________________
>Centre of Economic Research (CER-ETH)
>Zürichbergstrasse 18 (ZUE)
>CH - 8032 Zürich
>+41 44 632 02 63
>mailto: rengerv at etzh.ch
>blog.modelworks.ch
>
>
>	[[alternative HTML version deleted]]
>
>
>
>------------------------------------------------------------------------
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
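
Because all six tables share the start-end key, they can in principle be merged one after another. Below is a base-R sketch with three toy tables. The names t1, t2, t3 and the time, dist and travel columns are hypothetical. It assumes each Start/End pair occurs at most once per table; otherwise every merge step multiplies rows.

```r
# Sketch: merge several tables that share the (Start, End) key.
# t1, t2, t3 and their value columns are hypothetical stand-ins.
t1 <- data.frame(Start = c(1L, 1L, 2L), End = c(2L, 3L, 3L), time   = c(5, 8, 4))
t2 <- data.frame(Start = c(1L, 2L),     End = c(2L, 3L),     dist   = c(10, 7))
t3 <- data.frame(Start = c(1L, 1L),     End = c(3L, 2L),     travel = c(30, 12))

tables <- list(t1, t2, t3)

# Left-join each table onto the running result; all.x = TRUE keeps every
# Start/End pair of t1 and fills missing values with NA.
merged <- Reduce(function(x, y) merge(x, y, by = c("Start", "End"), all.x = TRUE),
                 tables)
print(merged)
```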

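The thread does not say what triggered the 2^34 message. One common cause, assumed here, is repeated Start/End pairs in the lookup table. A join returns one row per matching pair of rows, so a key that appears k times in ttoevP turns each matching row of dataP into k rows, even in a left join. The base-R sketch below checks for duplicates and counts the rows a left join would return before running it. join_size() is a hypothetical helper, and the toy data are made up.

```r
# Sketch: check a lookup table for repeated keys and estimate the size of
# a left join before running it. join_size() is a hypothetical helper.
join_size <- function(x, y, by) {
  kx <- do.call(paste, c(x[by], sep = "|"))   # one string key per row of x
  ky <- do.call(paste, c(y[by], sep = "|"))   # one string key per row of y
  uy <- unique(ky)
  cnt  <- tabulate(match(ky, uy), nbins = length(uy))  # rows per distinct key in y
  hits <- cnt[match(kx, uy)]                  # matches for each row of x (NA if none)
  sum(ifelse(is.na(hits), 1, hits))           # unmatched rows of x still give 1 row
}

# Toy data (made up): the pair (1, 2) appears three times in ttoevP.
dataP  <- data.frame(Start = c(1L, 1L, 2L), End = c(2L, 3L, 3L))
ttoevP <- data.frame(Start = c(1L, 1L, 1L, 2L), End = c(2L, 2L, 2L, 3L),
                     OEV.T = c(20, 21, 22, 9))

anyDuplicated(ttoevP[c("Start", "End")])      # 0 would mean every key is unique
join_size(dataP, ttoevP, c("Start", "End"))   # 5 rows, not nrow(dataP) = 3
```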

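The loop in the question performs one keyed lookup per row, which comes to about 800'000 separate joins. The vectorised base-R sketch below does the same assignment in one step with match(). The toy tables are made up. It assumes each Start/End pair appears at most once in ttoevP; with duplicates, match() silently takes the first.

```r
# Sketch: replace the per-row loop with a single vectorised lookup.
# The toy tables are hypothetical; assumes unique Start/End pairs in ttoevP.
dataP  <- data.frame(Start = c(1L, 1L, 2L, 3L), End = c(2L, 3L, 3L, 1L))
ttoevP <- data.frame(Start = c(3L, 1L, 2L), End = c(1L, 2L, 3L),
                     OEV.T = c(25, 20, 9))

key_p <- paste(dataP$Start,  dataP$End,  sep = "|")  # one key per row of dataP
key_t <- paste(ttoevP$Start, ttoevP$End, sep = "|")  # one key per row of ttoevP

# match() finds, for every row of dataP, the row of ttoevP with the same key
# (NA when there is none); the whole column is then filled in one step.
dataP$OEV.T <- ttoevP$OEV.T[match(key_p, key_t)]
print(dataP)
```

In current versions of data.table the same update can be written as a single join by reference, `dataP[ttoevP, on = c("Start", "End"), OEV.T := i.OEV.T]`.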
