[Rd] suggestion how to use memcpy in duplicate.c

Hervé Pagès hpages at fhcrc.org
Sat Apr 24 02:30:09 CEST 2010


Hi Matthew,

Matthew Dowle wrote:
> Just to add some clarification, the suggestion wasn't motivated by speeding 
> up a length 3 vector being recycled 3.3 million times.  But its a good point 
> that any change should not make that case slower.  I don't know how much 
> vectorCopy is called really,  DUPLICATE_ATOMIC_VECTOR seems more 
> significant, which doesn't recycle, and already had the FIXME next to it.
> 
> Where copyVector is passed a large source though, then memcpy should be 
> faster than any of the methods using a for loop through each element 
> (whether recycling or not),  allowing for the usual caveats. What are the 
> timings like if you repeat the for loop 100 times to get a more robust 
> timing ?  It needs to be a repeat around the for loop only, not the 
> allocVector whose variance looks to be included in those timings below. Then 
> increase the size of the source vector,  and compare to memcpy.

On my system (DELL LATITUDE laptop with 64-bit 9.04 Ubuntu):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void *memcpy2(char *dest, const char *src, size_t n)
{
         int i;

         for (i = 0; i < n; i++) *(dest++) = *(src++);
         return dest;
}

int main()
{
         int n, kmax, k;
         char *x, *y;

         n = 25000000;
	kmax = 100;
         x = (char *) malloc(n);
         y = (char *) malloc(n);
         for (k = 0; k < kmax; k++)
                 //memcpy2(y, x, n);
                 memcpy(y, x, n);
         return 0;
}

Benchmarks:

n = 25000000, kmax = 100, memcpy2:

   real	0m8.123s
   user	0m8.077s
   sys	0m0.040s

n = 25000000, k = 100, memcpy:

   real	0m1.076s
   user	0m1.004s
   sys	0m0.060s

n = 25000, kmax = 100000, memcpy2:

   real	0m8.033s
   user	0m8.005s
   sys	0m0.012s

n = 25000, kmax = 100000, memcpy:

   real	0m0.353s
   user	0m0.352s
   sys	0m0.000s

n = 25, kmax = 100000000, memcpy2:

   real	0m8.351s
   user	0m8.313s
   sys	0m0.008s

n = 25, kmax = 100000000, memcpy:

   real	0m0.628s
   user	0m0.624s
   sys	0m0.004s

So depending on the size of the memory area to copy, GNU memcpy() is
between 7.5x and 22x faster than using a for() loop. You can reasonably
expect that the authors of memcpy() have done their best to optimize
the code for most platforms they support, for big and small memory
areas, and that if there was a need to branch based on the size of the
area, that's already done *inside* memcpy() (I'm just speculating here,
I didn't look at memcpy's source code).

Cheers,
H.

> 
> Matthew
> 
> "William Dunlap" <wdunlap at tibco.com> wrote in message 
> news:77EB52C6DD32BA4D87471DCD70C8D70002CE6BFA at NA-PA-VBE03.na.tibco.com...
> If I were worried about the time this loop takes,
> I would avoid using i%nt.  For the attached C code
> compile with gcc 4.3.3 with -O2 I get
>   > # INTEGER() in loop
>   > system.time( r1 <- .Call("my_rep1", 1:3, 1e7) )
>      user  system elapsed
>     0.060   0.012   0.071
> 
>   > # INTEGER() before loop
>   > system.time( r2 <- .Call("my_rep2", 1:3, 1e7) )
>      user  system elapsed
>     0.076   0.008   0.086
> 
>   > # replace i%src_length in loop with j=0 before loop and
>   > #    if(++j==src_length) j=0 ;
>   > # in the loop.
>   > system.time( r3 <- .Call("my_rep3", 1:3, 1e7) )
>      user  system elapsed
>     0.024   0.028   0.050
>   > identical(r1,r2) && identical(r2,r3)
>   [1] TRUE
> 
> The C code is:
> #define USE_RINTERNALS /* pretend we are in the R kernel */
> #include <R.h>
> #include <Rinternals.h>
> 
> 
> SEXP my_rep1(SEXP s_src, SEXP s_dest_length)
> {
>     int src_length = length(s_src) ;
>     int dest_length = asInteger(s_dest_length) ;
>     int i,j ;
>     SEXP s_dest ;
>     PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
>     if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
>     for(i=0;i<dest_length;i++) {
>         INTEGER(s_dest)[i] = INTEGER(s_src)[i % src_length] ;
>     }
>     UNPROTECT(1) ;
>     return s_dest ;
> }
> SEXP my_rep2(SEXP s_src, SEXP s_dest_length)
> {
>     int src_length = length(s_src) ;
>     int dest_length = asInteger(s_dest_length) ;
>     int *psrc = INTEGER(s_src) ;
>     int *pdest ;
>     int i ;
>     SEXP s_dest ;
>     PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
>     pdest = INTEGER(s_dest) ;
>     if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
>     /* end of boilerplate */
>     for(i=0;i<dest_length;i++) {
>         pdest[i] = psrc[i % src_length] ;
>     }
>     UNPROTECT(1) ;
>     return s_dest ;
> }
> SEXP my_rep3(SEXP s_src, SEXP s_dest_length)
> {
>     int src_length = length(s_src) ;
>     int dest_length = asInteger(s_dest_length) ;
>     int *psrc = INTEGER(s_src) ;
>     int *pdest ;
>     int i,j ;
>     SEXP s_dest ;
>     PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
>     pdest = INTEGER(s_dest) ;
>     if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
>     /* end of boilerplate */
>     for(j=0,i=0;i<dest_length;i++) {
>         *pdest++ = psrc[j++] ;
>         if (j==src_length) {
>             j = 0 ;
>         }
>     }
>     UNPROTECT(1) ;
>     return s_dest ;
> }
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Romain Francois
>> Sent: Wednesday, April 21, 2010 12:32 PM
>> To: Matthew Dowle
>> Cc: r-devel at stat.math.ethz.ch
>> Subject: Re: [Rd] suggestion how to use memcpy in duplicate.c
>>
>> Le 21/04/10 17:54, Matthew Dowle a écrit :
>>>> From copyVector in duplicate.c :
>>> void copyVector(SEXP s, SEXP t)
>>> {
>>>      int i, ns, nt;
>>>      nt = LENGTH(t);
>>>      ns = LENGTH(s);
>>>      switch (TYPEOF(s)) {
>>> ...
>>>      case INTSXP:
>>>      for (i = 0; i<  ns; i++)
>>>          INTEGER(s)[i] = INTEGER(t)[i % nt];
>>>      break;
>>> ...
>>>
>>> could that be replaced with :
>>>
>>>      case INTSXP:
>>>      for (i=0; i<ns/nt; i++)
>>>          memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char
>> *)DATAPTR(t),
>>> nt*sizeof(int));
>>>      break;
>> or at least with something like this:
>>
>> int* p_s = INTEGER(s) ;
>> int* p_t = INTEGER(t) ;
>> for( i=0 ; i < ns ; i++){
>> p_s[i] = p_t[i % nt];
>> }
>>
>> since expanding the INTEGER macro over and over has a price.
>>
>>> and similar for the other types in copyVector.  This won't
>> help regular
>>> vector copies, since those seem to be done by the
>> DUPLICATE_ATOMIC_VECTOR
>>> macro, see next suggestion below, but it should help
>> copyMatrix which calls
>>> copyVector, scan.c which calls copyVector on three lines,
>> dcf.c (once) and
>>> dounzip.c (once).
>>>
>>> For the DUPLICATE_ATOMIC_VECTOR macro there is already a
>> comment next to it
>>> :
>>>
>>>      <FIXME>: surely memcpy would be faster here?
>>>
>>> which seems to refer to the for loop  :
>>>
>>>      else { \
>>>      int __i__; \
>>>      type *__fp__ = fun(from), *__tp__ = fun(to); \
>>>      for (__i__ = 0; __i__<  __n__; __i__++) \
>>>        __tp__[__i__] = __fp__[__i__]; \
>>>    } \
>>>
>>> Could that loop be replaced by the following ?
>>>
>>>     else { \
>>>     memcpy((char *)DATAPTR(to), (char *)DATAPTR(from),
>> __n__*sizeof(type)); \
>>>     }\
>>>
>>> In the data.table package, dogroups.c uses this technique,
>> so the principle
>>> is tested and works well so far.
>>>
>>> Are there any road blocks preventing this change, or is
>> anyone already
>>> working on it ?  If not then I'll try and test it (on
>> Ubuntu 32bit) and
>>> submit patch with timings, as before.  Comments/pointers
>> much appreciated.
>>> Matthew
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>
>> -- 
>> Romain Francois
>> Professional R Enthusiast
>> +33(0) 6 28 91 30 30
>> http://romainfrancois.blog.free.fr
>> |- http://bit.ly/9aKDM9 : embed images in Rd documents
>> |- http://tr.im/OIXN : raster images and RImageJ
>> |- http://tr.im/OcQe : Rcpp 0.7.7
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the R-devel mailing list