[R] some problems with ram usage and warnings

David Winsemius dwinsemius at comcast.net
Sat Dec 12 03:52:30 CET 2009


On Dec 11, 2009, at 11:08 AM, Tom Knockinger wrote:

> Hi,
> I am new to the R project, and until now I have found solutions for  
> every problem in tutorials, R wikis and this mailing list, but now I  
> have run into some problems which I can't solve with that knowledge.
>
> I have some data like this:
>
> # sample data
> head1 = "a;b;c;d;e;f;g;h;i;k;l;m;n;o"
> data1 = "1;1;1;1;1;1;1;1;1;1;1;1;1;1"
> data2 = "2;2;2;2;2;2;2;2;2;2;2;2;2;2"
> data3 = "3;3;3;3;3;3;3;3;3;3;3;3;3;3"
> datastring = paste("", head1,data1,data2,data3,"",sep="\n")
>
> # import operation
> res = read.table(textConnection(datastring), header=TRUE, sep=";")
> closeAllConnections()
>
> # I use these two lines in a for-loop like this:
> # for (j in 1:length(datastring)) {
> #   res[[j]] = read.table(textConnection(datastring[j]),
> #                         header=TRUE, sep=";")
> #   closeAllConnections()
> # }
>
> I get these strings from a file that contains about 50 to 1000 of  
> them, so I read them all into a list. I am not sure whether there is  
> a better way to do this, but it works for me. Maybe you have some  
> suggestions for a better solution.
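
One way to avoid juggling connections by hand is to loop over the strings  
with lapply() and close each connection where it is opened; a minimal  
sketch, assuming `datastring` is a character vector with one  
header-plus-data block per element:

```r
# Read each semicolon-separated block into a data frame.
# on.exit() closes each textConnection as soon as its read finishes,
# so no blanket closeAllConnections() call is needed.
res <- lapply(datastring, function(s) {
  con <- textConnection(s)
  on.exit(close(con))
  read.table(con, header = TRUE, sep = ";")
})
```

The result is a list of data frames, one per input string.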
>
> Now after this short introduction to the r-program I use, I have two  
> problems with this approach.
>
> 1) warnings
> From time to time, after some other operations, I get warnings like  
> "unused connection 3 (datastring) closed". But all connections should  
> already be closed, and I don't create new ones.
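
That warning is R's garbage collector reporting that it found, and  
closed, a connection object to which you no longer held a reference.  
Keeping the connection in a variable and closing it yourself avoids the  
warning; a small sketch:

```r
# hold on to the connection so it can be closed explicitly,
# rather than left for the garbage collector to clean up
con <- textConnection(datastring)
res <- read.table(con, header = TRUE, sep = ";")
close(con)
```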
>
> 2) RAM usage and program shutdowns
> length(datastring) is usually between 50 and 1000, so the data take  
> some space in RAM (approx. 100-200 MB), which is no problem. My  
> analysis code pushes usage to about 500-700 MB, which is also not  
> a real problem.
> The results are matrices of 50x14 to 1000x14, so they are small  
> enough to work with afterwards: creating plots, or doing further  
> analysis.
> So I wrote a function that does the analysis one file after another  
> and keeps only the results in a list. But after about 2-4 files  
> my R process uses about 1500 MB, and then the trouble begins.

Windows?

> The R console terminates, or prints the error that no more space can  
> be allocated. So I have to process each file separately, save each  
> result to a file, and restart R after every 2 processed files, and  
> repeat that 3-5 times until all files are processed, which is a bit  
> annoying.
>
> I did some research on this problem and found out that
> -) after I import new data into the same variable, RAM usage goes up  
> by about 100-200 MB each time instead of reusing or purging the old  
> data, which should be overwritten since it is no longer reachable  
> after I import a new file;
> -) the same occurs with the analysis functions, which use much more  
> space and also don't release the old, no-longer-used variables, even  
> though ls() doesn't show them at all;
> -) even after I clear all variables with rm(list=ls(all=TRUE)),  
> the used RAM is still the same.
>
> So is there a way to get the RAM back, so that I can do all the  
> analysis in one session and don't have to mess around with  
> additional files?

It is possible to call the garbage collector with gc(). Supposedly  
that should not be necessary, since garbage collection is automatic,  
but my impression is that it helps prevent situations that otherwise  
lead to virtual memory getting invoked on the Mac (which I also  
thought should not happen, but I will swear that it does).
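
For example, after dropping a large intermediate object you can prompt a  
collection and watch the memory statistics (a sketch; the exact figures  
will differ per session):

```r
big <- numeric(5e6)  # ~40 MB of doubles
rm(big)              # remove the last reference to the object ...
gc()                 # ... then prompt the collector to reclaim it
gc(reset = TRUE)     # reset the "max used" columns for later comparison
```

On Windows, memory.limit() reports (and can raise) the cap on the memory  
available to the R process, which in a 32-bit session is often the real  
constraint rather than physical RAM.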

-- 
David
>
>
> Thanks for your help
>
> Tom
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
