[Rd] Memory allocation in read.table

Simon Urbanek simon.urbanek at r-project.org
Wed Aug 28 20:52:45 CEST 2013


On Aug 28, 2013, at 2:24 PM, Hadley Wickham wrote:

>> Yup - parsing is the most expensive part. That's why for high-throughput data you don't want to use an ASCII representation. It's amazing that disk speeds are now so high that CPUs are the bottleneck, not vice versa.
> 
> Do you have any recommendations for binary formats? For R, is there anything obviously better than Rdata?
> 

Native formats are the fastest (and versatile), so readBin/writeBin or mmap.
I tend to avoid strings: I store dates as POSIXct, which are doubles, and anything else as factors, which are integers, so the above works for me just fine.
I am working on a way to do direct mmap serialization of SEXPs, but it's not ready yet (basic vectors are supported, but complex objects are not).
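
A minimal sketch of the readBin/writeBin approach described above, assuming a made-up two-row data set and a made-up file layout (a row count, then the doubles, then the integer codes). The variable names and the layout are illustrative only, not a standard format. The factor levels are kept in memory here; in practice they would have to be stored separately.

```r
# Hypothetical example data: dates as POSIXct, everything else as a factor.
times  <- as.POSIXct(c("2013-08-28 20:52:45", "2013-08-28 20:53:00"), tz = "UTC")
groups <- factor(c("a", "b"))

# Write the raw numeric representations to a temporary binary file.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
writeBin(length(times), con)        # row count, written as an integer
writeBin(as.numeric(times), con)    # POSIXct values are doubles (seconds since the epoch)
writeBin(as.integer(groups), con)   # factor codes are integers
close(con)

# Read everything back in the same order; no text parsing is involved.
con <- file(path, "rb")
n      <- readBin(con, "integer", n = 1L)
secs   <- readBin(con, "double",  n = n)
codes  <- readBin(con, "integer", n = n)
close(con)

# Restore the R classes from the raw doubles and integers.
times2  <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
groups2 <- factor(levels(groups)[codes], levels = levels(groups))
```

Since the file holds only fixed-width numbers, reading it is a straight copy into R vectors, which is where the speed advantage over ASCII comes from.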

Cheers,
Simon


