[Rd] Speeding up library loading

Duncan Murdoch murdoch at stats.uwo.ca
Tue Apr 26 12:23:04 CEST 2005


Ali - wrote:
> 
> 
>>>
>>> Assume 100 C++ classes each class having 100 member functions. After 
>>> wrapping these classes into R, if the wrapping design is 
>>> class-oriented we should have like 100 objects. At the same time, if 
>>> the wrapping design is function-oriented we have like 10,000 objects 
>>> which are too lazy for lazy loading.
>>>
>>> I have tried wrapping exactly the same classes by R.oo based on S3 
>>> and the outcome package was much faster in both installation and 
>>> loading. The package went slow once I tried it with S4. I guess R.oo 
>>> makes the package more class-oriented while S4 object-orientation is 
>>> really function-oriented causing all this friction in installation 
>>> and loading.
>>>
>>> Is there any way to ask R to lazy-load each object as a 'bundle of S4 
>>> methods with the same class'?
>>
>>
>> I don't think so.  There are ways to load a bundle of objects all at 
>> once (put them in an environment, attach the environment), but S4 
>> methods aren't self-contained, they need to be registered with the 
>> system.   You could probably write a function to load them and 
>> register them all at once, but I don't think it exists now.
>>
>> Duncan Murdoch
> 
> 
> (1) What is the difference between loading and registering objects in R?

Loading just creates the object.  Registering is what setMethod() and 
similar calls do: they tell the system to call that function when the 
generic is called with a certain signature.
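
To make the distinction concrete, here is a minimal sketch in R. The class "Circle", the generic area() and the function area_circle() are made up for illustration and are not from the package under discussion.

```r
library(methods)

# "Loading": this only creates a function object; no generic knows about it.
area_circle <- function(shape) pi * shape@r^2

# A hypothetical class and generic, used only for illustration.
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))

# Before registration the generic has no method for "Circle".
before <- existsMethod("area", "Circle")

# "Registering": setMethod() records the function in the generic's
# methods table under the signature "Circle".
setMethod("area", "Circle", area_circle)
after <- existsMethod("area", "Circle")

# Dispatch now reaches area_circle() through the generic.
result <- area(new("Circle", r = 1))
```

Here `before` should be FALSE and `after` TRUE: the function existed the whole time, but dispatch could only find it once it was registered.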
> 
> (2) You are talking about 'loading and registering at once'. Isn't this 
> 'at once' the cause of slow loading?

I haven't done any profiling, but I would guess the registering is the 
slow part.

> (3) Doesn't having many environments mean loss of efficiency again?

Yes, I'd guess that looking things up in a chain of 100 environments is 
slower than looking them up in one gigantic environment.  Again, I 
haven't done any profiling, but I'd guess it would come close to being 
100 times worse, i.e. in practice order N time instead of order 1 time 
(but I'm sure these aren't the theoretical limits).
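
As a rough illustration of why a chain costs more, the sketch below builds 100 nested environments and counts how many have to be inspected before a name is found. The helpers make_chain() and lookup_depth() are hypothetical and written only for this example; this counts steps and is not a timing measurement.

```r
# Build a chain of n environments on top of `base`, each the parent of the next.
make_chain <- function(n, base = emptyenv()) {
  env <- base
  for (i in seq_len(n)) env <- new.env(parent = env)
  env
}

# Count how many environments must be inspected before `name` is found.
lookup_depth <- function(name, env) {
  depth <- 1
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) return(depth)
    env <- parent.env(env)
    depth <- depth + 1
  }
  NA_integer_
}

root <- new.env(parent = emptyenv())
assign("f", function() "found", envir = root)
leaf <- make_chain(99, base = root)      # 100 environments including root

depth_chain  <- lookup_depth("f", leaf)  # has to walk the whole chain
depth_single <- lookup_depth("f", root)  # found in the first environment
```

An ordinary get("f", envir = leaf) walks the same chain internally, which is where the order N guess comes from.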

But you were asking about delayed loading, so I was assuming that in 
most cases you would only load a small subset of those 100 environments.  
I haven't tried any big problems like yours, but I would be willing to 
guess that the cost of registering grows faster than O(N), so cutting 
down on the number of things you register will give a big improvement 
in loading speed.
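
One way such a "register a class's methods all at once, on demand" function might look is sketched below. As noted earlier in the thread, no such function is known to exist; register_class_methods(), the `pending` list, and the classes "Point" and "Line" are all hypothetical, and whether this actually speeds up loading has not been measured.

```r
library(methods)

# Hypothetical classes and generic, for illustration only.
setClass("Point", representation(x = "numeric"))
setClass("Line", representation(len = "numeric"))
setGeneric("describe", function(obj) standardGeneric("describe"))

# Unregistered method bodies, grouped by class (plain functions in a list).
pending <- list(
  Point = list(describe = function(obj) paste("Point at", obj@x)),
  Line  = list(describe = function(obj) paste("Line of length", obj@len))
)

# Hypothetical helper: register every pending method for one class.
register_class_methods <- function(class_name) {
  defs <- pending[[class_name]]
  for (generic in names(defs)) {
    setMethod(generic, class_name, defs[[generic]])
  }
  invisible(length(defs))
}

# Only the class that is actually needed gets registered.
register_class_methods("Point")
point_registered <- existsMethod("describe", "Point")
line_registered  <- existsMethod("describe", "Line")
msg <- describe(new("Point", x = 2))
```

The methods for "Line" stay as plain list entries until someone asks for them, so the registration cost is only paid for the classes in use.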

But you do have to remember the two pieces of advice you've been given 
in this thread:

   - Nobody else has written a package with ten thousand methods, so 
you're likely to find things out that nobody else knows about.

   - The S4 object model is quite different from that of C++, so it 
probably doesn't make sense to have a direct correspondence between C++ 
classes and methods and R classes and methods.  There are probably much 
more efficient ways to get access to the functionality of your C++ library.

Duncan Murdoch


