[R] R tools for large files

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Aug 25 12:00:45 CEST 2003


On Mon, 25 Aug 2003, Murray Jorgensen wrote:

> At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote:
> >I think that is only a medium-sized file.
> 
> "Large" for my purposes means "more than I really want to read into memory"
> which in turn means "takes more than 30s". I'm at home now and the file
> isn't so I'm not sure if the file is large or not.
> 
> More responses interspersed below. BTW, I forgot to mention that I'm using
> Windows and so do not have nice unix tools readily available.

But you do, thanks to me, as you need them to install R packages.

> >On Mon, 25 Aug 2003, Murray Jorgensen wrote:
> >
> >> I'm wondering if anyone has written some functions or code for handling 
> >> very large files in R. I am working with a data file that is 41 
> >> variables times who knows how many observations making up 27MB altogether.
> >> 
> >> The sort of thing that I am thinking of having R do is
> >> 
> >> - count the number of lines in a file
> >
> >You can do that without reading the file into memory: use
> >system(paste("wc -l", filename)) 
> 
> Don't think that I can do that in Windows XL.

I presume you mean Windows XP?  Of course you can, and wc.exe is in 
Rtools.zip!
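
A minimal sketch of capturing that count as a number rather than just
printing it, assuming a wc program is on the PATH (for example the wc.exe
from Rtools.zip). The name count_lines_wc is hypothetical, and system2()
from current versions of R is used here in place of system():

```r
# Hypothetical helper: count lines using the external 'wc' program.
# Assumes 'wc' is on the PATH; this is not a function supplied by R.
count_lines_wc <- function(filename) {
  # 'wc -l file' prints something like "  12345 file"
  out <- system2("wc", c("-l", shQuote(filename)), stdout = TRUE)
  # Keep only the leading count and convert it to an integer
  as.integer(strsplit(trimws(out), "[[:space:]]+")[[1]][1])
}
```

Note that wc -l counts newline characters, so a final line without a
trailing newline is not included in the count.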

> >or read in blocks of lines via a
> >connection
> 
> But that does sound promising!
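
Reading in blocks via a connection could look roughly like this. It is
only a sketch: count_lines and the block size of 10000 are illustrative
choices, not anything from the original message.

```r
# Hypothetical helper: count lines by reading the file in blocks,
# so only 'block' lines are ever held in memory at once.
count_lines <- function(filename, block = 10000L) {
  con <- file(filename, open = "r")
  on.exit(close(con))            # always release the connection
  n <- 0L
  repeat {
    chunk <- readLines(con, n = block)
    if (length(chunk) == 0L) break   # end of file reached
    n <- n + length(chunk)
  }
  n
}
```

Because the connection stays open between calls, each readLines() call
continues from where the previous one stopped.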
> 
> >
> >> - form a data frame by selecting all cases whose line numbers are in a 
> >> supplied vector (which could be used to extract random subfiles of 
> >> particular sizes)
> >
> >R should handle that easily in today's memory sizes.  Buy some more RAM if 
> >you don't already have 1/2Gb.  As others have said, for a really large file,
> >use a RDBMS to do the selection for you.
> 
> It's just that R is so good in reading in initial segments of a file that I
> can't believe that it can't be effective in reading more general
> (pre-specified) subsets.
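
One way to read a pre-specified set of line numbers without holding the
whole file in memory is to walk through it in blocks and keep only the
wanted lines. The sketch below assumes a whitespace-delimited file with
no header line, line numbers counted from 1, and at least one wanted line
present in the file; read_selected_lines and its arguments are
hypothetical names.

```r
# Hypothetical helper: build a data frame from the lines of 'filename'
# whose (1-based) line numbers appear in 'wanted'.
read_selected_lines <- function(filename, wanted, block = 10000L, ...) {
  wanted <- sort(unique(as.integer(wanted)))
  con <- file(filename, open = "r")
  on.exit(close(con))
  kept <- character(0)
  offset <- 0L                       # number of lines already consumed
  while (length(wanted) > 0L) {
    chunk <- readLines(con, n = block)
    if (length(chunk) == 0L) break   # ran out of file
    last <- offset + length(chunk)
    # Positions within this chunk of the wanted lines it contains
    idx <- wanted[wanted <= last] - offset
    kept <- c(kept, chunk[idx])
    wanted <- wanted[wanted > last]
    offset <- last
  }
  # Parse the retained lines; extra arguments go to read.table()
  read.table(text = kept, ...)
}
```

A random subfile of 1000 cases from a file known to have n lines would
then be read_selected_lines("data.txt", sample(n, 1000)), where
"data.txt" stands in for the real file name. Reading stops as soon as the
last wanted line has been seen, so early subsets are cheap.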

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



