[R] size limitations in R

Daniel Lakeland dlakelan at street-artists.org
Fri Aug 31 17:20:23 CEST 2007


On Fri, Aug 31, 2007 at 01:31:12PM +0100, Fabiano Vergari wrote:

> I am a SAS user currently evaluating R as a possible addition or
> even replacement for SAS. The difficulty I have come across straight
> away is R's apparent difficulty in handling relatively large data
> files. Whilst I would not expect it to handle datasets with millions
> of records, I still really need to be able to work with datasets with
> 100,000+ records and 100+ variables. Yet, when reading a .csv file
> with 180,000 records and about 200 variables, the software virtually
> ground to a halt (I stopped it after 1 hour). Are there guidelines
> or maybe a limitations document anywhere that helps me assess the
> size

180k records with 200 variables = 36 million entries; if they are all
numeric, they are stored as doubles taking up 8 bytes each, so about
288 MB of RAM. This should be perfectly fine for R, as long as you
have that much free RAM.
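For instance, a quick back-of-the-envelope check in R (an all-numeric
data frame is assumed here; character or factor columns change the
picture):

    ## 180,000 rows x 200 numeric columns, 8 bytes per double
    180000 * 200 * 8 / 1e6    # about 288 MB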

However, the routines that read CSV and tab-delimited files are
relatively inefficient for such large files.
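If you do want to stay with read.csv, giving it hints usually helps a
great deal. A rough sketch (the file name and column types below are
invented for illustration):

    ## declaring the column types up front avoids the type-guessing
    ## pass, and nrows lets R allocate the result in one go
    dat <- read.csv("bigfile.csv",
                    colClasses = rep("numeric", 200),
                    nrows = 180000)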

To handle large data files, it is better to use one of the database
interfaces. My preference would be SQLite, unless I already had the
data on a MySQL or other database server.

The documentation for the packages RSQLite and SQLiteDF should be
helpful, as well as the documentation for SQLite itself, which has a
facility for efficiently importing CSV and similar files directly into
a SQLite database.

e.g., http://netadmintools.com/art572.html
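A rough sketch of the RSQLite route, assuming the data have already
been imported into a table (the database, table, and column names
below are invented):

    library(RSQLite)

    ## open (or create) the database file on disk
    con <- dbConnect(SQLite(), dbname = "survey.db")

    ## the table itself would typically be loaded beforehand with the
    ## sqlite3 shell (.mode csv / .import), or with dbWriteTable()
    ## from a data frame

    ## pull only the rows and columns you actually need into R
    sub <- dbGetQuery(con, "SELECT var1, var2 FROM survey WHERE var1 > 0")

    dbDisconnect(con)

That way the full 180,000 x 200 table stays on disk, and only the
subset you query is ever held in R's memory.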



-- 
Daniel Lakeland
dlakelan at street-artists.org
http://www.street-artists.org/~dlakelan


