[R] R tools for large files

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 25 09:12:31 CEST 2003

I think that is only a medium-sized file.

On Mon, 25 Aug 2003, Murray Jorgensen wrote:

> I'm wondering if anyone has written some functions or code for handling 
> very large files in R. I am working with a data file that is 41 
> variables times who knows how many observations making up 27MB altogether.
> The sort of thing that I am thinking of having R do is
> - count the number of lines in a file

You can do that without reading the file into memory: use
system(paste("wc -l", filename)) or read the file in blocks of lines via a
connection.
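A minimal sketch of the connection-based approach (the function name
count_lines and the block size are my own choices, not from the original
post):

```r
# Count the lines of a file without loading it all into memory:
# open a connection and read in blocks, summing the block lengths.
count_lines <- function(filename, block = 10000L) {
  con <- file(filename, open = "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    chunk <- readLines(con, n = block)
    if (length(chunk) == 0L) break   # end of file reached
    n <- n + length(chunk)
  }
  n
}
```

Memory use is bounded by the block size rather than the file size, which
is the point of reading via a connection.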

> - form a data frame by selecting all cases whose line numbers are in a 
> supplied vector (which could be used to extract random subfiles of 
> particular sizes)

R should handle that easily with today's memory sizes.  Buy some more RAM if
you don't already have 1/2Gb.  As others have said, for a really large file,
use an RDBMS to do the selection for you.
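For the in-R route, the selection can be sketched along the same
block-reading lines (read_selected is a hypothetical helper; it assumes a
header line and whitespace-separated fields, neither of which the original
message specifies):

```r
# Build a data frame from only those observations whose line numbers
# (counted after the header) appear in the sorted vector `keep`.
# The file is read in blocks, so only the selected lines are kept.
read_selected <- function(filename, keep, block = 10000L) {
  con <- file(filename, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  picked <- character(0)
  offset <- 0L
  repeat {
    chunk <- readLines(con, n = block)
    if (length(chunk) == 0L) break
    # which requested line numbers fall inside this block?
    idx <- keep[keep > offset & keep <= offset + length(chunk)] - offset
    picked <- c(picked, chunk[idx])
    offset <- offset + length(chunk)
  }
  read.table(textConnection(c(header, picked)), header = TRUE)
}

# A random subfile of 1000 cases from a file of N observations:
# sub <- read_selected("big.dat", keep = sort(sample(N, 1000)))
```

Passing sort(sample(N, 1000)) as `keep` gives the random subfiles the
original poster asked about, with N obtained from the line count above.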

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
