[Rd] Severe memory problem using split()

cstrato cstrato at aon.at
Tue Jul 13 00:00:14 CEST 2010


Dear Martin,

Thank you, you are right; now I get:

 > ann <- read.delim("Hu6800_ann.txt", stringsAsFactors=FALSE)
 > object.size(ann)
2035952 bytes
 > u2p  <- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
 > object.size(u2p)
1207368 bytes
 > object.size(unlist(u2p))
865176 bytes
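
If I understand it correctly, the original 3.3GB reported by object.size()
probably came from ProbesetID being read in as a factor: split() then returns
one factor per UNIT_ID, each still carrying the complete levels vector, and
object.size() does not notice that those levels are shared between the
components. A small sketch (with invented IDs of the same length, not my
annotation file) that should reproduce the effect:

    f <- factor(sprintf("probe%05d", seq_len(7129)))  # invented IDs
    object.size(f)                       # one factor, one levels vector
    object.size(split(f, seq_along(f)))  # 7129 factors; levels counted for each

(In memory the levels are presumably shared, which would also explain why
"top" showed far less than 3.3GB.)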

Nevertheless, a size of 1.2MB for a list representing 2 of the 11 columns of 
a table of only 754KB still seems pretty large, doesn't it?
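
My guess is that most of the remaining 1.2MB is simply the per-component
overhead of a list holding several thousand short character vectors, plus the
names that split() attaches. A rough sketch, again with invented IDs rather
than my real annotation file:

    ids <- sprintf("probe%05d", seq_len(7129))       # invented IDs
    object.size(ids)                                 # one character vector
    object.size(unname(split(ids, seq_along(ids))))  # 7129 length-1 vectors
    object.size(split(ids, seq_along(ids)))          # the same, plus the names

Every component brings its own vector header, and that adds up over thousands
of components.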

Best regards
Christian


On 7/12/10 11:44 PM, Martin Morgan wrote:
> On 07/12/2010 01:45 PM, cstrato wrote:
>> Dear all,
>>
>> With great interest I followed the discussion:
>> https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
>> since I currently have a similar problem:
>>
>> In a new R session (started from an xterm) I am importing a simple table,
>> "Hu6800_ann.txt", which is only 754KB in size:
>>
>>> ann <- read.delim("Hu6800_ann.txt")
>>> dim(ann)
>> [1] 7129   11
>>
>>
>> When I call "object.size(ann)", the estimated memory used to store "ann"
>> is already 2MB:
>>
>>> object.size(ann)
>> 2034784 bytes
>>
>>
>> Now I call "split()" and check the estimated memory used, which turns out
>> to be 3.3GB:
>>
>>> u2p <- split(ann[,"ProbesetID"],ann[,"UNIT_ID"])
>>> object.size(u2p)
>> 3323768120 bytes
>
> I guess things improve with stringsAsFactors=FALSE in read.delim?
>
> Martin
>
>>
>> During the R session I am running "top" in another xterm and can see
>> that the memory usage of R increases to about 550MB RSIZE.
>>
>>
>> Now I do:
>>
>>> object.size(unlist(u2p))
>> 894056 bytes
>>
>> This call takes about 3 minutes to complete, and the memory usage of R
>> increases to about 1.3GB RSIZE. Furthermore, while this call is being
>> evaluated the free RAM of my Mac drops to less than 8MB free PhysMem,
>> until the machine starts to swap. When the call has finished, free PhysMem
>> is 734MB, but R has grown to 577MB RSIZE.
>>
>> Doing "split(ann[,"ProbesetID"],ann[,"UNIT_ID"],drop=TRUE)" did not
>> change the object.size, only processing was faster and it did use less
>> memory on my Mac.
>>
>> Do you have any idea what the reason for this behavior is?
>> Why is the size of list "u2p" so large?
>> Am I making a mistake somewhere?
>>
>>
>> Here is my sessionInfo on a MacBook Pro with 2GB RAM:
>>
>>> sessionInfo()
>> R version 2.11.1 (2010-05-31)
>> x86_64-apple-darwin9.8.0
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> Best regards
>> Christian
>> _._._._._._._._._._._._._._._._._._
>> C.h.r.i.s.t.i.a.n   S.t.r.a.t.o.w.a
>> V.i.e.n.n.a           A.u.s.t.r.i.a
>> e.m.a.i.l:        cstrato at aon.at
>> _._._._._._._._._._._._._._._._._._
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


