[R] R tools for large files

Murray Jorgensen maj at stats.waikato.ac.nz
Mon Aug 25 11:16:23 CEST 2003


At 08:12 25/08/2003 +0100, Prof Brian Ripley wrote:
>I think that is only a medium-sized file.

"Large" for my purposes means "more than I really want to read into memory",
which in turn means "takes more than 30s to read". I'm at home now and the
file isn't, so I can't check whether it really counts as large.

More responses interspersed below. BTW, I forgot to mention that I'm using
Windows and so do not have the nice Unix tools readily available.

>On Mon, 25 Aug 2003, Murray Jorgensen wrote:
>
>> I'm wondering if anyone has written some functions or code for handling 
>> very large files in R. I am working with a data file that is 41 
>> variables times who knows how many observations making up 27MB altogether.
>> 
>> The sort of thing that I am thinking of having R do is
>> 
>> - count the number of lines in a file
>
>You can do that without reading the file into memory: use
>system(paste("wc -l", filename)) 

I don't think I can do that in Windows XP.

>or read in blocks of lines via a
>connection

But that does sound promising!
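A minimal sketch of the block-reading idea for counting lines, assuming a plain-text file with one record per line. The name count_lines and the default block size of 10000 lines are my own choices, not part of any package. It opens the file as a connection and calls readLines() repeatedly, so only one block is held in memory at a time and no Unix tools are needed.

```r
# Count the lines in a text file without reading it all into memory.
# Hypothetical helper: reads 'chunk_size' lines at a time through a connection.
count_lines <- function(filename, chunk_size = 10000L) {
  con <- file(filename, open = "r")
  on.exit(close(con))          # make sure the connection is closed
  n <- 0L
  repeat {
    block <- readLines(con, n = chunk_size)
    if (length(block) == 0L) break   # end of file reached
    n <- n + length(block)
  }
  n
}
```

A larger chunk_size means fewer calls to readLines() at the cost of more memory per block.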

>
>> - form a data frame by selecting all cases whose line numbers are in a 
>> supplied vector (which could be used to extract random subfiles of 
>> particular sizes)
>
>R should handle that easily in today's memory sizes.  Buy some more RAM if 
>you don't already have 1/2Gb.  As others have said, for a real large file,
>use an RDBMS to do the selection for you.

It's just that R is so good at reading initial segments of a file that I
can't believe it can't be equally effective at reading more general
(pre-specified) subsets.
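One way this might work, sketched under stated assumptions: read_selected_lines is a hypothetical helper, and it assumes whitespace-separated fields, an optional single header line, and that the supplied line numbers count data lines only (header excluded). The file is scanned block by block through a connection, only the wanted lines from each block are kept, and the survivors are parsed at the end with read.table().

```r
# Build a data frame from only the data lines whose numbers are in 'keep'.
# Hypothetical helper; line numbers exclude the header line, if present.
read_selected_lines <- function(filename, keep, header = TRUE,
                                sep = "", chunk_size = 10000L) {
  keep <- sort(unique(keep))
  con <- file(filename, open = "r")
  on.exit(close(con))
  hdr <- if (header) readLines(con, n = 1L) else NULL
  pieces <- list()
  offset <- 0L                   # number of data lines read so far
  repeat {
    block <- readLines(con, n = chunk_size)
    if (length(block) == 0L) break
    # positions within this block of the requested line numbers
    idx <- keep[keep > offset & keep <= offset + length(block)] - offset
    pieces[[length(pieces) + 1L]] <- block[idx]
    offset <- offset + length(block)
  }
  # parse only the selected lines (plus the header) as a table
  read.table(text = c(hdr, unlist(pieces)), header = header, sep = sep)
}
```

For a random subfile of 1000 records one could pass sample(n, 1000) as keep, where n is the number of data lines. This relies on read.table()'s text argument, found in current versions of R.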

Murray

>
>-- 
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862
