[Rd] read.csv reads more rows than indicated by wc -l

Matthew Dowle mdowle at mdowle.plus.com
Fri Dec 21 00:46:39 CET 2012


> Somewhere on my wish/TO DO list is for someone to rewrite read.table for
> better robustness *and* efficiency ...

Wish granted. New in data.table 1.8.7 :

New function fread(), a fast and friendly file reader.
*  header, skip, nrows, sep and colClasses are all auto-detected.
*  integers larger than 2^31-1 (too big for R's integer type) are detected and read natively as bit64::integer64.
*  accepts filenames, URLs and literal text such as "A,B\n1,2\n3,4" directly.
*  new implementation entirely in C
*  with a 50MB .csv (1 million rows x 6 columns):
      read.csv("test.csv")                                        #  30-60 sec
      read.table("test.csv",<all known tricks and known nrows>)   #     10 sec
      fread("test.csv")                                           #      3 sec
*  airline data: 658MB csv (7 million rows x 29 columns):
      read.table("2008.csv",<all known tricks and known nrows>)   #    360 sec
      fread("2008.csv")                                           #     50 sec
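A minimal usage sketch of the auto-detection described above, assuming a current data.table release (argument handling may differ from 1.8.7). fread() is given literal text and then a ';'-separated temporary file; in both cases header and sep are left for fread to detect. The data and the temporary file are made up for illustration only.

```r
library(data.table)

# Literal text: fread() detects the header row and the ',' separator itself
DT <- fread("A,B\n1,2\n3,4")
print(DT)
sapply(DT, class)   # both columns come back as integer

# The same data in a temporary file, this time separated by ';'
f <- tempfile(fileext = ".csv")
writeLines(c("A;B", "1;2", "3;4"), f)
DT2 <- fread(f)     # sep and header are again auto-detected

# Auto-detection can be overridden explicitly if it guesses wrong
DT3 <- fread(f, sep = ";", header = TRUE)

unlink(f)           # clean up the temporary file
```

Leaving sep and header to auto-detection is convenient interactively; in long-running scripts, stating them explicitly guards against a misdetected separator on unusual files.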
See ?fread. Many thanks to Chris Neff and Garrett See for ideas and beta testing.

The help page ?fread is fairly well developed.

Comments, feedback and bug reports very welcome.
