[Rd] Pre-allocating serialization memory buffers

Mon Oct 10 16:00:59 CEST 2022

Hi all,

While investigating the performance of different hashing algorithms of the
"digest" package, I found that serialization to memory buffers via
serialize(obj, connection=NULL) was suspiciously slow for large objects.

After looking into the R source, I found that the memory buffer grows by
approx. n -> 2(n+1) but is not pre-allocated in any way. I then created a
minimal demo package (https://github.com/nx10/serialize_prealloc) with
different modified versions of the serialization mechanisms that let me
trace memory allocations and pre-allocate the buffer using
object.size(obj).
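
To illustrate the idea, here is a minimal standalone C sketch. It is not R's
actual code and not the code from the demo package. The type demo_buf and
the functions demo_buf_grow, demo_buf_reserve, demo_buf_write and demo_fill
are hypothetical names made up for this example, and the growth rule is only
the approximate n -> 2(n+1) pattern described above. It counts how many
realloc calls are needed to write a payload in small chunks, with and
without a size hint:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; NOT R's actual memory buffer type. */
typedef struct {
    unsigned char *data;
    size_t size;      /* allocated capacity in bytes */
    size_t count;     /* bytes written so far */
    size_t reallocs;  /* number of realloc calls, for tracing */
} demo_buf;

/* Grow to at least `needed` bytes using the assumed n -> 2(n+1) rule.
   Overflow of the size computation is ignored in this sketch. */
static int demo_buf_grow(demo_buf *b, size_t needed)
{
    size_t new_size = b->size;
    while (new_size < needed)
        new_size = 2 * (new_size + 1);
    unsigned char *tmp = realloc(b->data, new_size);
    if (tmp == NULL) return -1;
    b->data = tmp;
    b->size = new_size;
    b->reallocs++;
    return 0;
}

/* Pre-allocate `hint` bytes up front (e.g. an object.size-style estimate). */
static int demo_buf_reserve(demo_buf *b, size_t hint)
{
    if (hint <= b->size) return 0;
    unsigned char *tmp = realloc(b->data, hint);
    if (tmp == NULL) return -1;
    b->data = tmp;
    b->size = hint;
    b->reallocs++;
    return 0;
}

/* Append `len` bytes, growing on demand if the capacity is too small. */
static int demo_buf_write(demo_buf *b, const void *src, size_t len)
{
    if (b->count + len > b->size && demo_buf_grow(b, b->count + len) != 0)
        return -1;
    memcpy(b->data + b->count, src, len);
    b->count += len;
    return 0;
}

/* Write `total` bytes in 8-byte chunks and return the number of realloc
   calls. A `hint` of 0 means no pre-allocation. Returns (size_t)-1 if an
   allocation fails. */
static size_t demo_fill(size_t total, size_t hint)
{
    demo_buf b = {NULL, 0, 0, 0};
    const unsigned char chunk[8] = {0};
    if (hint > 0 && demo_buf_reserve(&b, hint) != 0)
        return (size_t)-1;
    for (size_t done = 0; done < total; done += sizeof chunk) {
        if (demo_buf_write(&b, chunk, sizeof chunk) != 0) {
            free(b.data);
            return (size_t)-1;
        }
    }
    size_t n = b.reallocs;
    free(b.data);
    return n;
}
```

Calling demo_fill(total, 0) grows the buffer on demand and needs several
reallocations, each of which may copy everything written so far, while
demo_fill(total, total) allocates once. If the hint underestimates the final
size, the sketch simply falls back to growing on demand.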

Benchmarking shows no apparent performance decrease for small or deeply
nested objects, and gains that grow approx. logarithmically with object
size (more than 3 times faster on my machine for objects of about 1 GB).
Benchmarks are included in the README of the demo package.

I have not done any tests with other kinds of streams such as file
connections, as I am not sure we can make assumptions about the
implementation of streams that are created elsewhere.

I would be happy to provide a patch for review if this is something you
consider worth investigating, but I would need some pointers on how you
would want this to be implemented, as object.size lives in the utils
package and serialize is in src/main/ (e.g. by copying a version of
objectsize that does not raise errors to src/main/).

Best,
Florian
