[R] RAM usage

Vaidotas Zemlys mpiktas at delfi.lt
Mon Oct 21 12:37:57 CEST 2002


 >> I'm having problems while working with large data sets with R 1.5.1 in
 >> Windows 2000. Given an integer matrix of 30 columns and 15000 rows,
 >> my function should return a boolean matrix of about 5000 rows and
 >> 15000 columns.

 > That's 75 million items of 4 bytes each, hence almost 300Mb for that one

Does that mean that R reserves 4 bytes for a logical object of length 1?
More generally, how much memory does R allocate for the different data
types? I searched a bit, but I did not find anything useful on this
subject. I would like to know, if it is possible, how much memory R needs
to store, for example, a real matrix of 50 rows and 100 columns together
with its column and row names, or where to find information on the subject.
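
One way to check this empirically is object.size(), which reports an
estimate of the bytes an object occupies. The following is only a minimal
sketch: the matrix contents and the dimnames are made up for illustration,
and the exact totals depend on the R version and platform.

```r
# Per-element storage of the basic vector types, estimated from long
# vectors so that the fixed per-object header becomes negligible.
n <- 1e6
bytes_per_element <- c(
  logical = as.numeric(object.size(logical(n))) / n,
  integer = as.numeric(object.size(integer(n))) / n,
  double  = as.numeric(object.size(numeric(n))) / n
)
print(round(bytes_per_element))   # logical 4, integer 4, double 8

# A 50 x 100 real matrix with (hypothetical) row and column names.
# The dimnames are character vectors and add to the 50 * 100 * 8 bytes
# needed for the numbers themselves.
x <- matrix(rnorm(50 * 100), nrow = 50, ncol = 100,
            dimnames = list(paste0("row", 1:50), paste0("col", 1:100)))
print(object.size(x), units = "Kb")
```

A logical value is stored in the same 4 bytes as an integer, which is why
the estimate quoted above uses 4 bytes per item.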

I thought that R needs only 1 byte to store a logical value, and because
of that I underestimated how much memory R would need.
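
For comparison, here is a sketch of what 1 byte (or 1 bit) per value would
look like. It assumes a version of R that has the raw type together with
packBits() and rawToBits(), which may not apply to R 1.5.1; the vector
t_logical is random data made up for the example.

```r
m <- 15000
t_logical <- sample(c(TRUE, FALSE), m, replace = TRUE)   # 4 bytes per element

t_raw <- as.raw(t_logical)                               # 1 byte per element

# packBits() needs a length that is a multiple of 8, so pad with FALSE.
pad    <- (8 - m %% 8) %% 8
t_bits <- packBits(c(t_logical, rep(FALSE, pad)), type = "raw")  # 1 bit each

sizes <- c(logical = as.numeric(object.size(t_logical)),
           raw     = as.numeric(object.size(t_raw)),
           bits    = as.numeric(object.size(t_bits)))
print(sizes)

# Round trip: unpack the bits and drop the padding again.
restored <- as.logical(rawToBits(t_bits))[seq_len(m)]
stopifnot(identical(restored, t_logical))
```

The packed forms save memory but have to be unpacked before they can be
used for indexing, so this trades space for extra work.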

 > You have not told us your problem, so have not demonstrated that
 > `a lot of memory must be used'.   Hard to help when we don't know what
 > you are attempting, but few problems cannot be done in pieces

I did not describe my problem because I thought it was more or less
irrelevant to the memory problems I was experiencing. My intention was to
ask how R manages memory and whether there is something special about that
management that everyone should know but I do not. I'm sorry if my letter
was a bit unclear; English is not my native language.

As for my problem, I'm trying to find out how well recursive partitioning
can separate a "pure" subset. In recursive partitioning (and all tree
methods) the tree is grown using the split that best separates a node into
two subsets. Thus the given set is divided into subsets by minimizing,
broadly speaking, some statistic that depends on all the subsets. My goal
is to single out one "pure" subset; I don't care about the other subsets,
so clearly I do not want to minimize a statistic that depends on all of
them. So I try to grow trees using not only the best splits but the nearly
best splits as well. To be exact, I use the 10 best splits for every node.
So if I split the root node and then each of its two child nodes, I get
10*10*10 = 1000 trees. I have to save information about the terminal nodes,
namely which objects belong to each of them. As these objects are elements
of a given vector y, for each terminal node I save a logical vector of the
same length as y, where TRUE in position i means that element y[i] is
present in that terminal node.

To sum up, I have an initial matrix X with dim(X)[1]==m and dim(X)[2]==n,
and a vector y with length(y)==m, and I split y on the columns of X. For
each terminal node I save a logical vector t, length(t)==length(y)==m,
where t[i]==TRUE means that y[i] belongs to terminal node t. With 1000
trees I can have at most 4000 terminal nodes, so I need to store 4000*m
logical items. As you can understand from my previous letters, I
encountered problems when m is about 15000.
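
The arithmetic behind these numbers can be sketched as follows, assuming
4 bytes per logical value and trees with exactly 4 terminal nodes each.
The second part shows one possible, purely hypothetical, alternative
layout (the name node_id is made up): since each y[i] falls into exactly
one terminal node of a given tree, one node label per observation and tree
carries the same information as 4 logical vectors.

```r
m          <- 15000          # observations, length(y)
n_splits   <- 10             # candidate splits kept per node
n_trees    <- n_splits^3     # root and its two children: 10 * 10 * 10
n_terminal <- 4 * n_trees    # at most 4 terminal nodes per tree

# One 4-byte logical per (terminal node, observation) pair.
bytes_logical <- n_terminal * m * 4
cat(sprintf("%d trees, %d terminal nodes, about %.0f MB as logical vectors\n",
            n_trees, n_terminal, bytes_logical / 2^20))

# Hypothetical alternative: an integer matrix with one node id (1..4) per
# observation and tree. Shown with random ids and only 20 trees to keep
# the demo small; the full m x 1000 integer matrix would need a quarter
# of the memory of the 4000 logical vectors.
demo_trees <- 20
node_id <- matrix(sample(1:4, m * demo_trees, replace = TRUE),
                  nrow = m, ncol = demo_trees)

# The logical vector t for terminal node 3 of tree 7, rebuilt on demand.
t_vec <- node_id[, 7] == 3
```

Whether this layout fits depends on how the terminal nodes are used
afterwards; it only helps if the logical vectors are not all needed at the
same time.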

I'm trying to grow these trees purely for exploratory reasons; my
mathematical and statistical assumptions may be totally wrong, which is
why I did not give many details about my problem earlier.

Thanks for all your answers.

Vaidotas Zemlys

PS R rulezzz!!! :)

