[R] Optimized File Reading with R

Lorenzo Isella lorenzo.isella at gmail.com
Tue May 15 22:26:09 CEST 2007


An apology: reading the file takes roughly a couple of minutes on my 
laptop, which runs Debian. I had been running some other simulations for 
quite some time and, odd as it seems for Linux, that slowed down the work 
I did with R afterwards.
Many thanks

Lorenzo

Peter Dalgaard wrote:
> Prof Brian Ripley wrote:
>> On Tue, 15 May 2007, Lorenzo Isella wrote:
>>
>>  
>>> Dear All,
>>> I hope I am not bumping into a FAQ, but so far my online search has 
>>> been fruitless.
>>> I need to read a data file using R. I am using what I think is the
>>> standard command:
>>>
>>> data_150<-read.table("y_complete06000", header=FALSE)
>>>
>>> where y_complete06000 is a 6000 by 40 table of numbers.
>>> I am puzzled at the fact that R is taking several minutes to read 
>>> this file.
>>> First I thought it may have been due to its shape, but even
>>> re-expressing and saving the matrix as a 1D array does not help.
>>> It is not a small file, but it is not huge either (about 5 MB of
>>> text).
>>> Is there anything I can do to speed up the file reading?
>>>     
>>
>> You could try reading the help page or the 'R Data Import/Export' 
>> manual.
>> Both point out things like
>>
>>       'read.table' is not the right tool for reading large matrices,
>>       especially those with many columns: it is designed to read _data
>>       frames_ which may have columns of very different classes. Use
>>       'scan' instead.
>>
>> On the other hand I am surprised at several minutes, but as you 
>> haven't even told us your OS, it is hard to know what to expect.  My 
>> Linux box took 3 secs for a 6000x40 matrix with read.table, 0.8 sec 
>> with scan.
>>
>>   
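
The scan-based approach suggested above could look roughly like this.
It is a minimal sketch, assuming the file holds only whitespace-separated
numbers with 40 values per row and no header; the temporary file and the
names m, v and m2 are made up for illustration and stand in for the real
data.

```r
# Write a 6000 x 40 numeric table to a temporary file (stand-in data).
set.seed(1)
m <- matrix(rnorm(6000 * 40), nrow = 6000, ncol = 40)
tmp <- tempfile(fileext = ".txt")
write.table(m, file = tmp, row.names = FALSE, col.names = FALSE)

# scan() returns one long numeric vector in file order (row by row),
# so byrow = TRUE is needed to restore the original layout.
v <- scan(tmp, what = numeric(), quiet = TRUE)
m2 <- matrix(v, ncol = 40, byrow = TRUE)
```

The result is a plain numeric matrix rather than a data frame, which is
usually what is wanted for a table of numbers like this one.
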
> If it is actually 40 rows and 6000 columns, that might explain it:
>
> > x <- as.data.frame(matrix(rnorm(40*6000),6000))
> > write.table(x,file="xx.txt")
> > system.time(y <- read.table("xx.txt"))
>    user  system elapsed
>   1.229   0.007   1.250
> > write.table(t(x),file="xx.txt")
> > system.time(y <- read.table("xx.txt"))
>    user  system elapsed
>  92.986   0.188  93.912
>
>
> However, this is still not _several_ minutes, and it is on my laptop 
> which is not particularly fast.
>
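
For the wide layout timed above (40 rows by 6000 columns), read.table
has to work out a class for each of the 6000 columns. If read.table is
still wanted there, its documented colClasses, nrows and comment.char
arguments let it skip some of that work. A minimal sketch, assuming
every column is numeric and the file has no header or row names; the
temporary file and the names w and y are hypothetical, and no timings
are claimed for it:

```r
# Write a wide 40 x 6000 numeric table to a temporary file (stand-in data).
set.seed(2)
w <- matrix(rnorm(40 * 6000), nrow = 40, ncol = 6000)
tmp <- tempfile(fileext = ".txt")
write.table(w, file = tmp, row.names = FALSE, col.names = FALSE)

# colClasses is recycled across all columns, so read.table does not
# have to guess each column's type; nrows gives the row count up front;
# comment.char = "" turns off comment handling.
y <- read.table(tmp, header = FALSE, colClasses = "numeric",
                nrows = 40, comment.char = "")
```

The result is still a data frame with 6000 columns; as.matrix(y)
converts it if a matrix is needed.
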


