[R] Memory usage in read.csv()

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jan 19 20:30:06 CET 2010


You could also try read.csv.sql in the sqldf package.  See the examples on the sqldf home page:

http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
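A rough sketch of what that looks like (this assumes the sqldf package and its SQLite backend are installed; the tiny stand-in file is only there so the snippet runs on its own -- you would point it at your real vmstat file instead):

```r
library(sqldf)  # assumes sqldf (with its SQLite backend) is installed

# tiny stand-in for the real vmstat file, so the sketch is self-contained
tf <- tempfile(fileext = ".csv")
writeLines(c("2010-01-18 13:38:45 EST,1,0,1234",
             "2010-01-18 13:38:46 EST,0,0,1233"), tf)

# read.csv.sql imports the file into a temporary SQLite database and runs
# the query there; only the result set is copied into an R data frame, so
# peak R memory stays close to the size of the result
vmsdf <- read.csv.sql(tf, sql = "select * from file",
                      header = FALSE, sep = ",")
nrow(vmsdf)
```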

On Tue, Jan 19, 2010 at 9:25 AM,  <nabble.30.miller_2555 at spamgourmet.com> wrote:
> I'm sure this has gotten some attention before, but I have two CSV
> files generated from vmstat and free that are roughly 6-8 Mb (about
> 80,000 lines) each. When I try to use read.csv(), R allocates all
> available memory (about 4.9 Gb) when loading the files, which is over
> 300 times the size of the raw data.  Here are the scripts used to
> generate the CSV files as well as the R code:
>
> Scripts (run for roughly a 24-hour period):
>    vmstat -ant 1 | awk '$0 !~ /(proc|free)/ {FS=" "; OFS=","; print
> strftime("%F %T %Z"),$6,$7,$12,$13,$14,$15,$16,$17;}' >>
> ~/vmstat_20100118_133845.o;
>    free -ms 1 | awk '$0 ~ /Mem\:/ {FS=" "; OFS=","; print
> strftime("%F %T %Z"),$2,$3,$4,$5,$6,$7}' >>
> ~/memfree_20100118_140845.o;
>
> R code:
>    infile.vms <- "~/vmstat_20100118_133845.o";
>    infile.mem <- "~/memfree_20100118_140845.o";
>    vms.colnames <-
> c("time","r","b","swpd","free","inact","active","si","so","bi","bo","in","cs","us","sy","id","wa","st");
>    vms.colclass <- c("character",rep("integer",length(vms.colnames)-1));
>    mem.colnames <- c("time","total","used","free","shared","buffers","cached");
>    mem.colclass <- c("character",rep("integer",length(mem.colnames)-1));
>    vmsdf <- read.csv(infile.vms,header=FALSE,colClasses=vms.colclass,col.names=vms.colnames);
>    memdf <- read.csv(infile.mem,header=FALSE,colClasses=mem.colclass,col.names=mem.colnames);
>
> I am running R v2.10.0 on a 64-bit machine with Fedora 10 (Linux
> version 2.6.27.41-170.2.117.fc10.x86_64 ) with 6Gb of memory. There
> are no other significant programs running and `rm()` followed by
> `gc()` successfully frees the memory (followed by swap-ins after other
> programs seek to use previously cached information swapped to disk).
> I've incorporated the memory-saving suggestions in the `read.csv()`
> manual page, excluding the limit on the lines read (which shouldn't
> really be necessary here since we're only talking about < 20 Mb of raw
> data). Any suggestions, or is the read.csv() code known to have memory
> leak/overcommit issues?
>
> Thanks
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
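On the one suggestion the poster set aside -- limiting the lines read -- a minimal sketch of bounding peak memory by reading from a single open connection in fixed-size chunks (the helper name and chunk size here are illustrative, not from the thread):

```r
# Hedged sketch: read a headerless CSV in fixed-size chunks from one open
# connection, so at most `chunk` freshly parsed rows (plus the accumulated
# result) are held at a time.  Helper name and defaults are illustrative.
read_csv_chunked <- function(file, col.names, colClasses, chunk = 10000L) {
  con <- file(file, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    # reading from a connection resumes where the previous call stopped;
    # read.csv() errors on an exhausted connection, which ends the loop
    block <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk,
               col.names = col.names, colClasses = colClasses),
      error = function(e) NULL)
    if (is.null(block) || nrow(block) == 0L) break
    pieces[[length(pieces) + 1L]] <- block
    if (nrow(block) < chunk) break
  }
  do.call(rbind, pieces)
}
```

With the poster's vms.colnames/vms.colclass this would stand in for the single read.csv() call; whether it actually lowers the peak depends on how much of the blow-up comes from parsing rather than from the final data frame itself.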


