[R] Optimized File Reading with R

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 15 19:41:12 CEST 2007


On Tue, 15 May 2007, Lorenzo Isella wrote:

> Dear All,
> Hope I am not bumping into a FAQ, but so far my online search has been fruitless.
> I need to read some data file using R. I am using the (I think)
> standard command:
>
> data_150<-read.table("y_complete06000", header=FALSE)
>
> where y_complete06000 is a 6000 by 40 table of numbers.
> I am puzzled at the fact that R is taking several minutes to read this file.
> First I thought it may have been due to its shape, but even
> re-expressing and saving the matrix as a 1D array does not help.
> It is not a small file, but not huge either (it amounts to about 5 MB
> of text).
> Is there anything I can do to speed up the file reading?

You could try reading the help page or the 'R Data Import/Export' manual.
Both point out things like

      'read.table' is not the right tool for reading large matrices,
      especially those with many columns: it is designed to read _data
      frames_ which may have columns of very different classes. Use
      'scan' instead.

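A minimal sketch of the scan() approach (the dimensions and filename come from your post; here the demo writes its own small file so the call can be tried directly):

```r
## Demonstration: write a small whitespace-separated numeric file,
## then read it back as a matrix with scan(). scan() skips the
## per-column class handling that read.table() does for data frames.
tmp <- tempfile()
write(t(matrix(1:24, nrow = 6, ncol = 4)), file = tmp, ncolumns = 4)
m <- matrix(scan(tmp, what = numeric()), nrow = 6, ncol = 4, byrow = TRUE)

## For your file the call would be along the lines of
## m <- matrix(scan("y_complete06000"), nrow = 6000, ncol = 40, byrow = TRUE)
```

byrow = TRUE is needed because scan() returns the numbers in the order they appear in the file (row by row), while matrix() fills column-wise by default.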
On the other hand I am surprised at several minutes, but as you haven't 
even told us your OS, it is hard to know what to expect.  My Linux box 
took 3 secs for a 6000x40 matrix with read.table, 0.8 sec with scan.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
