[R] R usage for log analysis

Allen S. Rout asr at ufl.edu
Mon Jun 12 18:06:36 CEST 2006


"Gabriel Diaz" <gabidiaz at gmail.com> writes:

> I've been looking over the project documentation, and it seems a
> database is the way to go for handling log files in the GB range
> (normally between 2 and 4 GB per 15-day dump).

> This document, http://cran.r-project.org/doc/manuals/R-data.html,
> says R will load all the data into memory when processing it with
> read.table and the like.  Will using a database do the same?
> Currently I have no machine with > 2 GB of memory.

Remember that you have swap as well as RAM, so blowing past physical
memory costs you time rather than hitting a hard limit.

If you're concerned about gross size, then preprocessing could be
useful; but consider: RAM is cheap.  Calibrate RAM purchases
w.r.t. hours of your coding time, -before- you start the project.
Then you can at least mutter to yourself when you waste more than the
cost of core trying to make the problem small. :)
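
If you do decide to shrink the data before R sees it, a cheap trick
is to let the OS do the filtering: read.table() accepts a connection,
so you can run the file through grep (or awk) via pipe() and only
parse the lines you care about.  A sketch, assuming a whitespace-
delimited log called access.log; the file name, pattern, and column
types are placeholders:

  ## grep discards uninteresting lines before R parses anything;
  ## giving colClasses explicitly also stops read.table from
  ## guessing types, which saves time and memory on big files.
  x <- read.table(pipe("grep 'ERROR' access.log"),
                  colClasses = c("character", "character", "integer"))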

It's entirely reasonable to do all your development work on a smaller
set, and then dump the real data into it and go home.  Unless you've
got something O(N^2) or so, you should be fine.
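
One easy way to get that smaller set is read.table()'s nrows
argument: develop against the first chunk of the file, then drop the
limit for the real run.  File name hypothetical:

  ## Development: only the first 100,000 lines.
  dev  <- read.table("access.log", nrows = 100000)

  ## Production: same code, whole file, started before you go home.
  full <- read.table("access.log")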


- Allen S. Rout


