[R] naive question

Duncan Murdoch dmurdoch at pair.com
Wed Jun 30 04:24:06 CEST 2004


On Tue, 29 Jun 2004 16:59:58 -0700, "Vadim Ogranovich"
<vograno at evafunds.com> wrote:

> R's IO is indeed 20-50 times slower than that of equivalent C code no
> matter what you do, which has been a pain for some of us.

Things like this shouldn't be a pain for long.  If C code works well,
why not use C?  It wouldn't be hard to write two C functions that
(1) count the lines and (2) read them into preallocated vectors.

Doing it this way, you could use .C and wouldn't need to learn the
intricacies of .Call.  It should run at about half the speed of fast
C code (since it takes two passes), i.e. 10-25 times faster than the
read.* functions.
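
As a rough sketch only, the two functions might look something like
this.  The names count_lines and read_doubles are made up for
illustration, and the code assumes a plain text file with one number
per line; a real reader would need to handle separators, multiple
columns, and errors properly.

```c
#include <stdio.h>

/* Pass 1: count the lines in the file.
   .C passes every argument as a pointer, so the file name arrives as
   char ** and the result goes back through *n.
   *n is set to -1 if the file cannot be opened. */
void count_lines(char **path, int *n)
{
    FILE *fp = fopen(path[0], "r");
    int c, last = '\n';

    if (fp == NULL) { *n = -1; return; }
    *n = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') (*n)++;
        last = c;
    }
    if (last != '\n') (*n)++;   /* final line without a trailing newline */
    fclose(fp);
}

/* Pass 2: read at most *n numbers into out, which the caller has
   already allocated (assumption: one number per line).
   On return *n holds how many values were actually read,
   or -1 if the file cannot be opened. */
void read_doubles(char **path, int *n, double *out)
{
    FILE *fp = fopen(path[0], "r");
    int i = 0;

    if (fp == NULL) { *n = -1; return; }
    while (i < *n && fscanf(fp, "%lf", &out[i]) == 1) i++;
    *n = i;
    fclose(fp);
}
```

After building this with R CMD SHLIB and loading it with dyn.load,
the R side would be along the lines of
.C("count_lines", path, n = integer(1)) to get the count, followed by
.C("read_doubles", path, n = as.integer(n), out = double(n)) to fill
the preallocated vector.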

Then, if you felt really ambitious, you could write it in a way that
others could use, put it in a package, and suddenly R would have I/O
10-25 times faster than it does now.  You wouldn't try to make it as
flexible as current R code, but for reading these huge files people
are talking about, it would be worthwhile to go through a few extra
setup steps.  

Duncan Murdoch



