[Rd] suggestion how to use memcpy in duplicate.c

Simon Urbanek simon.urbanek at r-project.org
Wed Apr 21 21:35:15 CEST 2010


On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:

> On 4/21/10 10:45 AM, Simon Urbanek wrote:
>> Won't that miss the last incomplete chunk? (and please don't use
>> DATAPTR on INTSXP even though the effect is currently the same)
>> 
>> In general it seems that whether this is efficient or not depends
>> on nt, since calls to memcpy for short blocks are expensive (i.e.,
>> for very small nt).
>> 
>> I ran some empirical tests to compare memcpy vs for() (x86_64, OS X)
>> and the results were encouraging - depending on the size of the
>> copied block the difference could be quite big: for tiny blocks
>> (ca. n = 32 or less) for() is faster; for small blocks (n ~ 1k)
>> memcpy is ca. 8x faster; as the size increases the gap closes
>> (presumably due to RAM bandwidth limitations), so for n = 512M the
>> difference is only ~30%.
>> 
> 
>> Of course this is contingent on the implementation of memcpy,
>> compiler, architecture etc. And will only matter if copying is what
>> you do most of the time ...
> 
> Copying of vectors is something that I would expect to happen fairly often in many applications of R.
> 
> Is for() faster on small blocks by enough that one would want to branch based on size?
> 

Good question. Given that the branching itself adds overhead, possibly not. In the best case for() can be ~40% faster (for single-digit n), but that requires billions of copies to make a difference (since the operation itself is so fast). The break-even point on my test machine is n = 32, and when I added the branching it took a 20% hit, so I guess it's simply not worth it. The only case that may be worth branching on is n = 1, since that is likely a fairly common case (the branching penalty in the copy routines is lower than in my memcpy/for comparison, since the branch can be taken before the outer for loop, so this may vary case by case).
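
To make that concrete, here is a rough sketch of what such a branch could look like (illustrative only, not the code in duplicate.c; copy_ints and SMALL_N are made-up names, and the threshold of 32 is just the break-even figure from my tests):

    /* Sketch of a size-based branch: copy a single element directly,
     * use a plain for() loop below the break-even point, and memcpy
     * above it. */
    #include <stddef.h>
    #include <string.h>

    #define SMALL_N 32              /* approximate break-even point */

    static void copy_ints(int *dst, const int *src, size_t n)
    {
        if (n == 1) {               /* likely common case, cheapest to special-case */
            dst[0] = src[0];
        } else if (n < SMALL_N) {   /* tiny blocks: avoid the memcpy call overhead */
            for (size_t i = 0; i < n; i++)
                dst[i] = src[i];
        } else {                    /* larger blocks: memcpy wins clearly */
            memcpy(dst, src, n * sizeof(int));
        }
    }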
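
And for reference, a minimal stand-alone harness of the kind of memcpy vs for() comparison above (not my actual test code; the sizes, repetition counts and use of clock() are arbitrary choices, and results will vary with compiler, architecture and memcpy implementation):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static void copy_for(int *dst, const int *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }

    int main(void)
    {
        const size_t sizes[] = { 8, 32, 1024, 1u << 20 };
        for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; s++) {
            size_t n = sizes[s];
            size_t reps = (size_t)(1e8 / n) + 1;   /* comparable work per size */
            int *src = malloc(n * sizeof(int)), *dst = malloc(n * sizeof(int));
            if (!src || !dst) return 1;
            for (size_t i = 0; i < n; i++) src[i] = (int)i;

            clock_t t0 = clock();
            for (size_t r = 0; r < reps; r++) {
                src[0] = (int)r;                   /* keep the copy from being folded away */
                memcpy(dst, src, n * sizeof(int));
            }
            double t_memcpy = (double)(clock() - t0) / CLOCKS_PER_SEC;

            t0 = clock();
            for (size_t r = 0; r < reps; r++) {
                src[0] = (int)r;
                copy_for(dst, src, n);
            }
            double t_for = (double)(clock() - t0) / CLOCKS_PER_SEC;

            printf("n = %8lu  memcpy %.3fs  for() %.3fs  (last dst %d)\n",
                   (unsigned long)n, t_memcpy, t_for, dst[n - 1]);
            free(src); free(dst);
        }
        return 0;
    }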

Cheers,
Simon


