[R] RAM usage
Vaidotas Zemlys
mpiktas at delfi.lt
Mon Oct 21 12:37:57 CEST 2002
Hi,
>> I'm having problems while working with large data sets with R 1.5.1 in
>> windows 2000. Given a integer matrix size of 30 columns and 15000 rows
>> my function should return a boolean matrix size of about 5000 rows and
>> 15000 columns.
>That's 75million items of 4bytes each, hence almost 300Mb for that one
>object.
Does that mean that R reserves 4 bytes for each logical value? More
generally, how much memory does R allocate for the different data types? I
searched a bit, but I didn't find anything useful on this subject. I would
like to know, if possible, how much memory R needs to store, for example, a
real matrix with 50 rows and 100 columns together with its row and column
names, or where to find information on this subject.
I thought that R needed only 1 byte to store a logical value, and because
of that I underestimated how much memory R would need.
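The question above can be checked directly with object.size(), which is
part of R. The following is only a minimal sketch, assuming a reasonably
recent R build: the fixed per-object header varies by version and platform,
so only the per-element costs should be relied on. The matrix contents and
its row and column names are made up for illustration.

```r
# Per-element storage: logical and integer use 4 bytes, double uses 8.
n <- 1e6
size_logical <- as.numeric(object.size(logical(n)))
size_integer <- as.numeric(object.size(integer(n)))
size_double  <- as.numeric(object.size(numeric(n)))

# Approximate bytes per element (the fixed vector header is negligible
# for vectors this long, so rounding removes it).
bytes_per_element <- round(c(logical = size_logical,
                             integer = size_integer,
                             double  = size_double) / n)
print(bytes_per_element)

# A 50 x 100 real matrix with row and column names (hypothetical names).
mat <- matrix(rnorm(50 * 100), nrow = 50, ncol = 100,
              dimnames = list(paste0("row", 1:50), paste0("col", 1:100)))
size_mat <- as.numeric(object.size(mat))
print(size_mat)  # the data alone needs 50 * 100 * 8 = 40000 bytes
```

The names add to the 40000 bytes of numeric data, so the reported size is
somewhat larger than the raw element count suggests.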
> You have not told us your problem, so has not demonstrated that
> `a lot of memory must be used'. Hard to help when we don't know what
> you are attempting, but few problems cannot be done in pieces
I did not describe my problem because I thought it was more or less
irrelevant to the memory usage problems I was experiencing. My intention
was to ask how R manages memory, and whether there is something about
that management that everyone should know but I don't. I'm sorry if my
letter was a bit unclear; English is not my native language.
As for my problem, I'm trying to find out how well recursive partitioning
can separate a "pure" subset. In recursive partitioning (and all tree
methods) the tree is grown using the split that best separates a node into
two subsets. Thus the given set is divided into subsets by minimizing,
broadly speaking, some statistic that depends on all the subsets. My goal
is to single out one "pure" subset; I don't care about the other subsets,
so clearly I do not want to minimize a statistic that depends on all of
them. So I try to grow trees using not only the best splits but the
nearly best splits as well. To be exact, I use the 10 best splits for
every node. So if I split the root node twice I get 1000 trees. I have to
save information about the terminal nodes, that is, which objects belong
to each of them. As these objects are elements of a given vector y, for
each terminal node I save a logical vector of the same length as y, where
TRUE in position i means that element y[i] is present in that terminal
node.
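To make the bookkeeping concrete, here is a small sketch of how two levels
of splitting produce one logical membership vector per terminal node. The
data, the single predictor x, and the thresholds are all hypothetical and
chosen only for illustration; this is not the actual splitting code.

```r
# Hypothetical toy data: 8 observations, one predictor column.
y <- c(0, 0, 1, 1, 1, 0, 1, 0)
x <- c(2.1, 3.5, 7.2, 8.0, 6.9, 1.4, 9.3, 4.8)

# First split of the root node on an assumed threshold.
in_left  <- x < 5
in_right <- !in_left

# Second level: split each child again (thresholds chosen arbitrarily).
node1 <- in_left  & x < 3
node2 <- in_left  & x >= 3
node3 <- in_right & x < 8
node4 <- in_right & x >= 8

# Each terminal node is one logical vector of length(y);
# the columns of this matrix are the four membership vectors.
terminal <- cbind(node1, node2, node3, node4)

# Every observation falls into exactly one terminal node.
stopifnot(all(rowSums(terminal) == 1))

# Elements of y that ended up in terminal node 4.
y[node4]
```

In this toy case node 4 contains only observations with y equal to 1, which
is the kind of "pure" subset being looked for.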
To sum up, I have an initial matrix X where dim(X)[1]==m, dim(X)[2]==n,
and a vector y, length(y)==m, and I split y on the columns of X. For
each terminal node I save a logical vector t, length(t)==length(y)==m,
where t[i]==TRUE means that y[i] belongs to terminal node t.
With 1000 trees I can have at most 4000 terminal nodes, so I need to store
4000*m logical items. As you can understand from my previous letters, I
encountered problems when m is about 15000.
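The storage this implies can be estimated with a few lines of arithmetic.
The sketch below assumes 4 bytes per logical or integer element and reads
"1000 trees" as 10 choices for the root split times 10 for each of its two
children. It also shows one possible alternative, which is only a
suggestion and not what I currently do: because the terminal nodes of one
tree partition the observations, a single integer label vector per tree
carries the same information as four logical vectors.

```r
m <- 15000                     # number of observations
splits_per_node <- 10          # candidate splits kept per node
n_trees <- splits_per_node^3   # root and its two children: 10 * 10 * 10
leaves_per_tree <- 4           # maximum after splitting to depth two
n_nodes <- n_trees * leaves_per_tree

# One logical vector per terminal node, 4 bytes per element.
bytes_logical <- n_nodes * m * 4
# One integer label vector per tree (values 1..4), 4 bytes per element.
bytes_labels <- n_trees * m * 4

# Sizes in megabytes (2^20 bytes).
print(c(logical_MB = bytes_logical, labels_MB = bytes_labels) / 2^20)

# Recovering a membership vector from labels (toy labels for one tree).
labels <- c(1L, 2L, 3L, 4L, 3L, 1L, 4L, 2L)
in_node4 <- labels == 4L
```

The logical layout needs 240,000,000 bytes for m = 15000, while the label
layout needs a quarter of that, and any single membership vector can still
be rebuilt on demand with a comparison such as labels == 4L.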
I'm growing these trees purely for exploratory reasons, and it may be
that my mathematical and statistical assumptions are totally wrong; that
is why I did not give many details about my problem earlier.
Thanks for all your answers.
Vaidotas Zemlys
PS R rulezzz!!! :)