[R] naive question

Douglas Bates bates at stat.wisc.edu
Wed Jun 30 02:56:09 CEST 2004


Igor Rivin wrote:

> I was not particularly annoyed, just disappointed, since R seems like
> a much better thing than SAS in general, and doing everything with a
> combination of hand-rolled tools is too much work. However, I do need
> to work with very large data sets, and if it takes 20 minutes to read
> them in, I have to explore other options (one of which might be
> S-PLUS, which claims scalability as a major, er, PLUS over R).


If you are routinely working with very large data sets, it would be 
worthwhile to learn to use a relational database (PostgreSQL, MySQL, 
even Access) to store the data and then access it from R with RODBC or 
one of the specialized database packages.
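
As a rough illustration of that workflow (not taken from the original
message), the sketch below uses the DBI interface with the RSQLite
package. Both packages are assumed to be installed, and an in-memory
SQLite database stands in for a real PostgreSQL or MySQL server. The
table name, column names, and sample rows are hypothetical.

```r
library(DBI)

# Assumption: RSQLite is installed; ":memory:" creates a throwaway
# database in place of a real database server.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Declare the column types once, in the table definition.
dbExecute(con, "CREATE TABLE measurements (
                  id    INTEGER,
                  grp   TEXT,
                  value REAL)")

# Hypothetical sample rows; a real table would normally be bulk-loaded
# by the database's own tools rather than from an R data frame.
sample_rows <- data.frame(id = 1:4,
                          grp = c("a", "a", "b", "b"),
                          value = c(1.5, 2.5, 3.0, 5.0),
                          stringsAsFactors = FALSE)
dbWriteTable(con, "measurements", sample_rows, append = TRUE)

# Pull only the rows and columns that are needed into R.
grp_b <- dbGetQuery(con,
                    "SELECT id, value FROM measurements WHERE grp = 'b'")

dbDisconnect(con)
```

With RODBC the pattern is similar: open a channel with odbcConnect()
on a configured data source, run the query with sqlQuery(), and close
the channel when done.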

R is slow reading ASCII files because it is assembling the meta-data on 
the fly and it is continually checking the types of the variables being 
read.  If you know all this information and build it into your table 
definitions, reading the data will be much faster.
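
The same idea, supplying the types up front instead of having them
inferred, can also be tried without a database. The sketch below is an
illustration rather than something from the original message: it passes
colClasses and nrows to read.table() so that R does not have to guess
the column types. The file contents and column names are made up.

```r
# Write a small hypothetical data file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id grp value",
             "1 a 1.5",
             "2 b 2.5",
             "3 b 4.0"), tmp)

# colClasses states each column's type up front; nrows gives an upper
# bound on the number of data rows; comment.char = "" turns off the
# scan for comment characters.
dat <- read.table(tmp, header = TRUE,
                  colClasses = c("integer", "character", "numeric"),
                  nrows = 3, comment.char = "")

str(dat)
unlink(tmp)
```

How much time this saves depends on the file, so it is worth timing
with system.time() on the real data before relying on it.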

A disadvantage of this approach is the need to learn yet another 
language and system.  I was going to do an example but found I could not 
because I left all my SQL books at home (I'm travelling at the moment) 
and I couldn't remember the particular commands for loading a table from 
an ASCII file.



