[Rd] faster base::sequence

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Nov 28 11:30:02 CET 2010


On Sun, 28 Nov 2010, Romain Francois wrote:

> Le 28/11/10 10:30, Prof Brian Ripley a écrit :
>> Is sequence used enough to warrant this? As the help page says
>> 
>> Note that ‘sequence <- function(nvec) unlist(lapply(nvec,
>> seq_len))’ and it mainly exists in reverence to the very early
>> history of R.
>
> I don't know. Would it be used more if it were more efficient ?

It is for you to make a compelling case for others to do work 
(maintain changed code) for your wish.

>> I regard it as unsafe to assume that NA_INTEGER will always be negative,
>> and bear in mind that at some point not so far off R integers (or at
>> least lengths) will need to be more than 32-bit.
>
> sure. updated and dressed up as a patch.
>
> I've made it a .Call because I'm not really comfortable with .Internal, etc 
> ...

> Do you mean that I should also use something else instead of "int" and 
> "int*". Is there some future proof typedef or macro for the type associated 
> with INTSXP ?

Not yet.  I was explaining why NA_INTEGER might change.

>
>> On Sun, 28 Nov 2010, Romain Francois wrote:
>> 
>>> Hello,
>>> 
>>> Based on yesterday's R-help thread (help: program efficiency), and
>>> following Bill's suggestions, it appeared that sequence:
>>> 
>>>> sequence
>>> function (nvec)
>>> unlist(lapply(nvec, seq_len))
>>> <environment: namespace:base>
>>> 
>>> could benefit from being written in C to avoid unnecessary memory
>>> allocations.
>>> 
>>> I made this version using inline:
>>> 
>>> require( inline )
>>> sequence_c <- local( {
>>> fx <- cfunction( signature( x = "integer"), '
>>> int n = length(x) ;
>>> int* px = INTEGER(x) ;
>>> int x_i, s = 0 ;
>>> /* error checking */
>>> for( int i=0; i<n; i++){
>>> x_i = px[i] ;
>>> /* this includes the check for NA */
>>> if( x_i <= 0 ) error( "needs non negative integer" ) ;
>>> s += x_i ;
>>> }
>>> 
>>> SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
>>> int * p_res = INTEGER(res) ;
>>> for( int i=0; i<n; i++){
>>> x_i = px[i] ;
>>> for( int j=0; j<x_i; j++, p_res++)
>>> *p_res = j+1 ;
>>> }
>>> UNPROTECT(1) ;
>>> return res ;
>>> ' )
>>> function( nvec ){
>>> fx( as.integer(nvec) )
>>> }
>>> })
>>> 
>>> 
>>> And here are some timings:
>>> 
>>>> x <- 1:10000
>>>> system.time( a <- sequence(x ) )
>>> utilisateur système écoulé
>>> 0.191 0.108 0.298
>>>> system.time( b <- sequence_c(x ) )
>>> utilisateur système écoulé
>>> 0.060 0.063 0.122
>>>> identical( a, b )
>>> [1] TRUE
>>> 
>>> 
>>> 
>>>> system.time( for( i in 1:10000) sequence(1:10) )
>>> utilisateur système écoulé
>>> 0.119 0.000 0.119
>>>> 
>>>> system.time( for( i in 1:10000) sequence_c(1:10) )
>>> utilisateur système écoulé
>>> 0.019 0.000 0.019
>>> 
>>> 
>>> I would write a proper patch if someone from R-core is willing to push
>>> it.
>>> 
>>> Romain
>
> -- 
> Romain Francois
> Professional R Enthusiast
> +33(0) 6 28 91 30 30
> http://romainfrancois.blog.free.fr
> |- http://bit.ly/9VOd3l : ZAT! 2010
> |- http://bit.ly/c6DzuX : Impressionnism with R
> `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


More information about the R-devel mailing list