[R] Reading large files quickly
freenx.10.robsteele at xoxy.net
Mon May 11 03:39:31 CEST 2009
At the moment I'm just reading the large file to see how fast it goes.
Eventually, if I can get the read time down, I'll write out a processed
version. Thanks for suggesting scan(); I'll try it.
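For what it's worth, chunked reading with scan() might look roughly like the sketch below. This is a minimal illustration, not the poster's actual code: it assumes a whitespace-delimited, all-numeric file, and uses a tiny temporary file as a stand-in for the real 3.5 GB input.

```r
## Sketch: read a large numeric file in fixed-size chunks with scan().
## Assumes whitespace-delimited, all-numeric data.
demo <- tempfile()
writeLines(as.character(1:10), demo)          # tiny stand-in for the big file

con <- file(demo, open = "r")
total <- 0
repeat {
  chunk <- scan(con, what = numeric(), nlines = 100000, quiet = TRUE)
  if (length(chunk) == 0) break               # end of file
  total <- total + sum(chunk)                 # process each chunk as it arrives
}
close(con)
total  # 55 for this demo file
```

Passing an open connection (rather than a filename) is what lets scan() pick up where the previous chunk left off; specifying `what` up front also spares scan() from guessing column types.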
jim holtman wrote:
> Since you are reading it in chunks, I assume that you are writing out each
> segment as you read it in. How are you writing it out to save it? Is the
> time you are quoting both the reading and the writing? If so, can you break
> down how long each of these operations takes?
> How do you plan to use the data? Is it all numeric? Are you keeping it in
> a dataframe? Have you considered using 'scan' to read in the data and to
> specify what the columns are? If you would like more help, the answers
> to these questions will be useful.
> On Sat, May 9, 2009 at 10:09 PM, Rob Steele <freenx.10.robsteele at xoxy.net> wrote:
>> Thanks guys, good suggestions. To clarify, I'm running on a fast
>> multi-core server with 16 GB RAM under 64 bit CentOS 5 and R 2.8.1.
>> Paging shouldn't be an issue since I'm reading in chunks and not trying
>> to store the whole file in memory at once. Thanks again.
>> Rob Steele wrote:
>>> I'm finding that readLines() and read.fwf() take nearly two hours to
>>> work through a 3.5 GB file, even when reading in large (100 MB) chunks.
>>> The unix command wc by contrast processes the same file in three
>>> minutes. Is there a faster way to read files in R?
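Since read.fwf() does its column splitting in interpreted R code, one commonly suggested alternative for fixed-width files is to combine chunked readLines() with substr(). A minimal sketch, assuming a hypothetical two-field layout (columns 1-5 numeric, columns 6-10 character) and a tiny stand-in file:

```r
## Sketch: chunked fixed-width parsing with readLines() + substr(),
## avoiding read.fwf()'s per-line splitting overhead.
## The two-field layout below is an assumption for illustration.
demo <- tempfile()
writeLines(c("  123ABCDE", "  456FGHIJ"), demo)  # stand-in fixed-width data

con <- file(demo, open = "r")
repeat {
  lines <- readLines(con, n = 100000)            # one chunk of raw lines
  if (length(lines) == 0) break
  field1 <- as.numeric(substr(lines, 1, 5))      # numeric field, cols 1-5
  field2 <- substr(lines, 6, 10)                 # character field, cols 6-10
  # process field1/field2 here instead of holding the whole file in memory
}
close(con)
```

Because substr() is vectorized over the whole chunk, this tends to be much closer to raw I/O speed than read.fwf() on the same data.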