[Rd] faster base::sequence

Romain Francois romain at r-enthusiasts.com
Sun Nov 28 11:44:56 CET 2010


On 28/11/10 11:30, Prof Brian Ripley wrote:
> On Sun, 28 Nov 2010, Romain Francois wrote:
>
>> On 28/11/10 10:30, Prof Brian Ripley wrote:
>>> Is sequence used enough to warrant this? As the help page says
>>>
>>> Note that ‘sequence <- function(nvec) unlist(lapply(nvec,
>>> seq_len))’ and it mainly exists in reverence to the very early
>>> history of R.
>>
>> I don't know. Would it be used more if it were more efficient?
>
> It is for you to make a compelling case for others to do work (maintain
> changed code) for your wish.

No trouble. The patch is there; if anyone finds it interesting or
compelling, they will speak up, I suppose.

Otherwise, it is fine with me if it ends up in no man's land. I have the
code; if I want to use it, I can squeeze it into a package.

>>> I regard it as unsafe to assume that NA_INTEGER will always be negative,
>>> and bear in mind that at some point not so far off R integers (or at
>>> least lengths) will need to be more than 32-bit.
>>
>> Sure. Updated and dressed up as a patch.
>>
>> I've made it a .Call because I'm not really comfortable with
>> .Internal, etc ...
>
>> Do you mean that I should also use something else instead of "int" and
>> "int*"? Is there some future-proof typedef or macro for the type
>> associated with INTSXP?
>
> Not yet. I was explaining why NA_INTEGER might change.

Sure. Thanks for the reminder.
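
For illustration, here is a minimal sketch of what an explicit NA test
could look like, rather than relying on NA_INTEGER being negative. It is
plain C with no R headers, so everything named here is an assumption of
mine and not part of R or of the patch: NA_INT_SKETCH stands in for
NA_INTEGER, and sequence_sketch and its return codes are hypothetical.
The total length is accumulated in a 64-bit integer so that the sum is
not tied to a 32-bit int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: stand-in for R's NA_INTEGER so the sketch builds without R. */
#define NA_INT_SKETCH INT_MIN

/* Hypothetical helper, not R API.
 * Returns 0 on success, -1 on invalid input (NA or negative),
 * -2 if the result is too large or cannot be allocated.
 * On success *out is a malloc'ed array (NULL when empty) of *out_len ints. */
int sequence_sketch(const int *x, size_t n, int **out, size_t *out_len)
{
    assert(out != NULL && out_len != NULL);

    const uint64_t max_elems = SIZE_MAX / sizeof(int);
    uint64_t total = 0;

    /* Pass 1: validate every element and compute the result length. */
    for (size_t i = 0; i < n; i++) {
        if (x[i] == NA_INT_SKETCH) return -1;  /* explicit NA test */
        if (x[i] < 0) return -1;               /* separate sign test */
        total += (uint64_t) x[i];
        if (total > max_elems) return -2;      /* checked after each add */
    }

    *out = NULL;
    *out_len = (size_t) total;
    if (total == 0) return 0;

    int *res = malloc((size_t) total * sizeof *res);
    if (res == NULL) return -2;

    /* Pass 2: write 1..x[i] for each element, back to back. */
    int *p = res;
    for (size_t i = 0; i < n; i++)
        for (int j = 0; j < x[i]; j++)
            *p++ = j + 1;

    *out = res;
    return 0;
}
```

The two tests are kept apart on purpose: the NA test stays correct even
if the NA sentinel stops being a negative value, which is the concern
raised above. The caller is responsible for calling free() on the result.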

>>> On Sun, 28 Nov 2010, Romain Francois wrote:
>>>
>>>> Hello,
>>>>
>>>> Based on yesterday's R-help thread (help: program efficiency), and
>>>> following Bill's suggestions, it appeared that sequence:
>>>>
>>>>> sequence
>>>> function (nvec)
>>>> unlist(lapply(nvec, seq_len))
>>>> <environment: namespace:base>
>>>>
>>>> could benefit from being written in C to avoid unnecessary memory
>>>> allocations.
>>>>
>>>> I made this version using inline:
>>>>
>>>> require( inline )
>>>> sequence_c <- local( {
>>>>   fx <- cfunction( signature( x = "integer"), '
>>>>     int n = length(x) ;
>>>>     int* px = INTEGER(x) ;
>>>>     int x_i, s = 0 ;
>>>>     /* error checking, and total length of the result */
>>>>     for( int i=0; i<n; i++){
>>>>       x_i = px[i] ;
>>>>       /* this includes the check for NA */
>>>>       if( x_i < 0 ) error( "needs non negative integer" ) ;
>>>>       s += x_i ;
>>>>     }
>>>>
>>>>     SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
>>>>     int * p_res = INTEGER(res) ;
>>>>     /* fill in 1..x_i for each element of x */
>>>>     for( int i=0; i<n; i++){
>>>>       x_i = px[i] ;
>>>>       for( int j=0; j<x_i; j++, p_res++)
>>>>         *p_res = j+1 ;
>>>>     }
>>>>     UNPROTECT(1) ;
>>>>     return res ;
>>>>   ' )
>>>>   function( nvec ){
>>>>     fx( as.integer(nvec) )
>>>>   }
>>>> })
>>>>
>>>>
>>>> And here are some timings:
>>>>
>>>>> x <- 1:10000
>>>>> system.time( a <- sequence(x ) )
>>>> user system elapsed
>>>> 0.191 0.108 0.298
>>>>> system.time( b <- sequence_c(x ) )
>>>> user system elapsed
>>>> 0.060 0.063 0.122
>>>>> identical( a, b )
>>>> [1] TRUE
>>>>
>>>>
>>>>
>>>>> system.time( for( i in 1:10000) sequence(1:10) )
>>>> user system elapsed
>>>> 0.119 0.000 0.119
>>>>>
>>>>> system.time( for( i in 1:10000) sequence_c(1:10) )
>>>> user system elapsed
>>>> 0.019 0.000 0.019
>>>>
>>>>
>>>> I would write a proper patch if someone from R-core is willing to push
>>>> it.
>>>>
>>>> Romain
>>
>> --
>> Romain Francois
>> Professional R Enthusiast
>> +33(0) 6 28 91 30 30
>> http://romainfrancois.blog.free.fr
>> |- http://bit.ly/9VOd3l : ZAT! 2010
>> |- http://bit.ly/c6DzuX : Impressionnism with R
>> `- http://bit.ly/czHPM7 : Rcpp Google tech talk on youtube
>>
>>
>




