[R] R died on large data set

Henrik Bengtsson hb at stat.berkeley.edu
Sat Feb 20 14:04:20 CET 2010


Some suggestions:

The line:

 pearson.dist <- as.dist(1-cor(t(todos.norm), method="pearson"))

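performs several data manipulations (transpose, correlation,
subtraction, conversion to a 'dist' object) in "one go".  Each
manipulation creates at least one extra copy of your data in memory.
When you do it this way, you make it harder for the R garbage collector
to clean out such memory, because the intermediate results all exist
within a single expression.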
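
As a rough back-of-the-envelope check (my own arithmetic, assuming
8-byte doubles and the 9600 x 15 dimensions quoted below), the sizes of
the objects involved can be estimated as follows.  The variable names
are illustrative only:

```r
# Rough size estimate, assuming 8-byte doubles and n = 9600 rows
# (the dimensions reported in the original post).
n <- 9600
bytes_per_double <- 8
mib <- 1024^2

# One full n x n matrix, e.g. the result of cor() or of 1 - rho.
full_matrix_mib <- n^2 * bytes_per_double / mib

# A 'dist' object stores only the lower triangle: n * (n - 1) / 2 values.
dist_mib <- n * (n - 1) / 2 * bytes_per_double / mib

# Two full matrices plus the final 'dist' object alive at the same time.
total_mib <- 2 * full_matrix_mib + dist_mib

cat(sprintf("one full matrix:  %.0f MiB\n", full_matrix_mib))
cat(sprintf("one dist object:  %.0f MiB\n", dist_mib))
cat(sprintf("two + dist:       %.0f MiB\n", total_mib))
```

Each full matrix is about 700 MB, so two of them plus the 'dist' result
come to well over 1.5 GB.  On a 32-bit build that may already be enough
to explain a failure around the 1.9 GB mark, although temporaries
created inside the functions are not counted here.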

The following should use less memory:

todos.norm <- t(todos.norm)
gc()  # Explicit garbage collect; frees the original (untransposed) 'todos.norm'.
rho <- cor(todos.norm, method="pearson")
rm(todos.norm)  # Not needed anymore
gc()  # Explicit garbage collect; frees the transposed 'todos.norm'.
rho <- 1-rho
gc()  # Explicit garbage collect; frees the original 'rho' (the correlations).
pearson.dist <- as.dist(rho)
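
To convince yourself that the step-by-step version computes the same
distances as the one-liner, you can compare the two on a small matrix
first.  This is only a sketch; 'x' is a hypothetical random stand-in for
'todos.norm', not the real data:

```r
set.seed(1)
# Hypothetical small stand-in for 'todos.norm' (200 rows instead of 9600).
x <- matrix(rnorm(200 * 15), nrow = 200, ncol = 15)

# Everything in "one go".
one_go <- as.dist(1 - cor(t(x), method = "pearson"))

# The same computation split into steps, dropping objects as we go.
xt <- t(x)
rho <- cor(xt, method = "pearson")
rm(xt)
rho <- 1 - rho
stepwise <- as.dist(rho)

# Compare the numeric values only; the stored 'call' attribute differs.
same <- isTRUE(all.equal(as.numeric(one_go), as.numeric(stepwise)))
print(same)
```

The comparison goes through as.numeric() because a 'dist' object
records the call that created it, which is not the same in the two
versions.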

Not sure if it helps in your case/with your data, but this is how you
as a user can help R a bit on the way.

You should of course also clean out all other stray objects you don't
use anymore, before doing the above.
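
One way to do that clean-up, as a sketch only: the objects 'tmp1' and
'tmp2' below are made-up examples of stray objects, and the small matrix
stands in for the real 'todos.norm'.  Note that this removes everything
else in the workspace, so only use it if that is what you want:

```r
# Hypothetical workspace: the data we need plus two stray objects.
todos.norm <- matrix(0, nrow = 2, ncol = 2)
tmp1 <- 1:10
tmp2 <- letters

# Remove every object in the workspace except 'todos.norm' ...
rm(list = setdiff(ls(), "todos.norm"))

# ... and ask R to release the freed memory right away.
invisible(gc())

print(ls())
```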

My $.02

/Henrik

On Sat, Feb 20, 2010 at 1:13 PM, Marcelo Laia <marcelolaia at gmail.com> wrote:
> Hi, I am trying to run a script in R and it died before finishing.
>
> I already read the list archives, and memory help pages
> (http://tinyurl.com/yaxco6w), but I am unable to solve the issue.
>
> My Debian shows:
>
> marcelo at laia:~$ ulimit
> unlimited
> marcelo at laia:~$
>
> On the system monitor (gnome) I see that R reaches 1.9 GB before dying.
>
> The R code is:
>
>> ls() ## only the todos.norm object is listed
> [1] "todos.norm"
>> dim(todos.norm)
> [1] 9600   15
>>
>> library("cluster")
>> pearson.dist <- as.dist(1-cor(t(todos.norm), method="pearson"))
> Died
>
> What could I do to solve my problem?
>
>> sessionInfo() ## after restart R
> R version 2.10.1 (2009-12-14)
> i486-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=pt_BR.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=pt_BR.UTF-8
>  [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>
> My system:
>
> Linux laia 2.6.32-trunk-686 #1 SMP Sun Jan 10 06:32:16 UTC 2010 i686 GNU/Linux
>
> Thank you very much!
>
> --
> Marcelo Luiz de Laia
> Brazil
> Linux user number 487797
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


