[Rd] allocVector bug ?

Luke Tierney luke at stat.uiowa.edu
Fri Nov 3 05:26:22 CET 2006


On Wed, 1 Nov 2006, Vladimir Dergachev wrote:

>
> Hi all,
>
>  I was looking at the following piece of code in src/main/memory.c, function
> allocVector :
>
>    if (size <= NodeClassSize[1]) {
> 	node_class = 1;
> 	alloc_size = NodeClassSize[1];
>    }
>    else {
> 	node_class = LARGE_NODE_CLASS;
> 	alloc_size = size;
> 	for (i = 2; i < NUM_SMALL_NODE_CLASSES; i++) {
> 	    if (size <= NodeClassSize[i]) {
> 		node_class = i;
> 		alloc_size = NodeClassSize[i];
> 		break;
> 	    }
> 	}
>    }
>
>
> It appears that for LARGE_NODE_CLASS the variable alloc_size should not be
> size, but something far less as we are not using vector heap, but rather
> calling malloc directly in the code below (and from discussions I read on
> this mailing list I think that these two are different - please let me know
> if I am wrong).
>
> So when allocating a large vector the garbage collector goes nuts trying to find
> all that space which is not going to be needed after all.

This is as intended, not a bug. The garbage collector does not "go
nuts" -- it is doing a garbage collection that may release memory in
advance of making a large allocation.  The size of the current
allocation request is used as part of the process of deciding when to
satisfy an allocation by malloc (of a single large node or a page) and
when to first do a gc.  It is essential to do this for large
allocations as well to keep the memory footprint down and help reduce
fragmentation.
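
To make the role of the request size concrete, here is a minimal sketch
in C. It is a simplified model under stated assumptions, not the code in
src/main/memory.c: the struct heap_model, its fields, and the functions
must_gc_first and account_alloc are hypothetical names introduced only
for illustration.

```c
#include <stddef.h>

/* Hypothetical model of a heap budget; not R's actual data structures. */
typedef struct {
    size_t in_use;   /* units currently charged to live objects */
    size_t trigger;  /* budget beyond which a gc is run before allocating */
} heap_model;

/* Return 1 if a request of alloc_size units should be preceded by a gc,
   i.e. if granting it would push in_use past the trigger. */
int must_gc_first(const heap_model *h, size_t alloc_size)
{
    if (h->in_use > h->trigger)
        return 1;
    /* Written as a subtraction to avoid overflow in in_use + alloc_size. */
    return alloc_size > h->trigger - h->in_use;
}

/* Charge a completed allocation against the budget. */
void account_alloc(heap_model *h, size_t alloc_size)
{
    h->in_use += alloc_size;
}
```

In this model, charging 0 for a large vector (as in the experiment
quoted below) would mean that large requests never trigger a collection
and in_use would stop reflecting them.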

The strategy for deciding when to allocate and when to gc is by
necessity heuristic.  It tries to keep overall memory footprint low
but at the same time tries to adapt to usage so that gc happens less
often once a pattern of using larger amounts of memory emerges. The
current strategy seems quite robust across a large range of
architectures, memory configurations, and applications.
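
One way such an adaptive rule could look is sketched below in C. This is
only an illustration of the general idea of raising the gc trigger when
the heap stays mostly full after a collection. The function
adjust_trigger and the constants GROW_FRAC and GROW_FACTOR are
hypothetical and are not taken from the R sources.

```c
#include <stddef.h>

/* Hypothetical tuning constants, chosen for illustration only. */
#define GROW_FRAC   0.70  /* occupancy after gc above which the trigger grows */
#define GROW_FACTOR 1.25  /* multiplicative growth applied to the trigger */

/* Given the units still in use after a gc and the current trigger,
   return the trigger to use until the next gc. */
size_t adjust_trigger(size_t in_use_after_gc, size_t trigger)
{
    double occupancy;

    if (trigger == 0)
        return trigger;  /* nothing sensible to scale */

    occupancy = (double) in_use_after_gc / (double) trigger;
    if (occupancy > GROW_FRAC)
        return (size_t) ((double) trigger * GROW_FACTOR);
    return trigger;      /* plenty of room was recovered; keep footprint low */
}
```

With these assumed constants, a heap that is still 80% full after a
collection gets a larger trigger, so later collections happen less
often, while a heap that drops to 50% keeps its current trigger.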

That said, when I wrote the manager I kept in mind that we might
eventually want to try more sophisticated schemes and/or allow some
user control over the schemes used.  It may be time to revisit this
soon.

luke


>
> I made an experiment and replaced the line alloc_size=size with alloc_size=0.
>
> R compiled fine (both 2.4.0 and 2.3.1) and passed make check with no issues
> (it all printed OK).
>
> Furthermore, all allocVector calls completed in no time and my test case ran
> very fast (22 seconds, as opposed to minutes).
>
> In addition, attach() was instantaneous which was wonderful.
>
> Could anyone with deeper knowledge of R internals comment on whether this
> makes any sense ?
>
>                           thank you very much !
>
>                                        Vladimir Dergachev
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
