[R] Somewhat disconcerting behavior of seq.int()

Andrew Simmons
Tue May 3 04:00:36 CEST 2022


When only 'from' and 'to' are given and both are integer valued (not
necessarily of class integer), the sequence is built with
R_compact_intrange; the return value is an integer vector and is stored
with minimal space.
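
A quick way to see this at the console (just a sketch; the variable names
are mine and not from the original post):

```r
# 'from' and 'to' are stored as doubles here, but both are integer valued
from <- 2
to   <- 10
x <- seq.int(from, to = to)

typeof(from)         # "double"
typeof(x)            # "integer"
identical(x, 2:10)   # TRUE: same type and values as the colon operator
```

So with no 'by', the result matches what from:to would give.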

In your case, you specified 'from', 'to', and 'by'; if all three are of
class integer, then the return value is also of class integer. I think the
return value is also integer when 'from' and 'to' are merely integer valued
and 'by' is of class integer, but you might want to check that. If so,
replacing 'by = 1' with 'by = 1L' will make the two sequences identical,
though it may still take longer than not specifying 'by' at all.
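
Something like the following should settle it on your machine. This is only
a sketch: the names s_none, s_dbl, s_mix and s_int are made up for
illustration, and the divisor 7 is arbitrary. I have left out the expected
type for the mixed case, since that is the one I am unsure about.

```r
m <- 1e5

s_none <- seq.int(2, to = m)                        # no 'by'
s_dbl  <- seq.int(2, to = m, by = 1)                # double 'by'
s_mix  <- seq.int(2, to = m, by = 1L)               # integer-valued 'from'/'to', integer 'by'
s_int  <- seq.int(2L, to = as.integer(m), by = 1L)  # everything of class integer

# Storage type of each result; s_mix is the case worth checking
sapply(list(none = s_none, dbl = s_dbl, mix = s_mix, int = s_int), typeof)

identical(s_none, s_int)   # TRUE
all.equal(s_none, s_dbl)   # TRUE: same values, only the type differs

# Rough timing of the step the sieve repeats; numbers vary by machine
system.time(for (k in 1:200) s_none %% 7L)
system.time(for (k in 1:200) s_dbl %% 7)
```

If the two timings differ a lot, that would point to integer versus
double %% as the source of the gap between sieve1 and sieve2.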

On Mon, May 2, 2022, 21:46 Bert Gunter <bgunter.4567 using gmail.com> wrote:

> ** Disconcerting to me, anyway; perhaps not to others**
> (Apologies if this has been discussed before. I was a bit nonplussed by
> it, but maybe I'm just clueless.) Anyway:
>
> Here are two almost identical versions of the Sieve of Eratosthenes.
> The difference between them is only in the call to seq.int() that is
> highlighted
>
> sieve1 <- function(m){
>    if(m < 2) return(NULL)
>    a <- floor(sqrt(m))
>    pr <- Recall(a)
> ####################
>    s <- seq.int(2, to = m) ## Only difference here
> ######################
>    for( i in pr) s <- s[as.logical(s %% i)]
>    c(pr,s)
> }
>
> sieve2 <- function(m){
>    if(m < 2) return(NULL)
>    a <- floor(sqrt(m))
>    pr <- Recall(a)
> ####################
>    s <- seq.int(2, to = m, by =1) ## Only difference here
> #######################
>    for( i in pr) s <- s[as.logical(s %% i)]
>    c(pr,s)
> }
>
> However, execution time is *quite* different.
>
> library(microbenchmark)
>
> > microbenchmark(l1 <- sieve1(1e5), times =50)
> Unit: milliseconds
>                 expr      min       lq     mean  median       uq      max
>  l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751
>  neval
>     50
>
> > microbenchmark(l2 <- sieve2(1e5), times =50)
> Unit: milliseconds
>                 expr      min      lq     mean   median       uq      max
>  l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464
>  neval
>     50
>
> Now note that:
> > identical(l1, l2)
> [1] FALSE
>
> ## Because:
> > str(l1)
>  int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>
> > str(l2)
>  num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
>
> I therefore assume that seq.int(), an internal generic, is dispatching
> to a method that uses integer arithmetic for sieve1 and floating point
> for sieve2. Is this correct? If not, what do I fail to understand? And
> is this indeed the source of the large difference in execution time?
>
> Further, ?seq.int says:
> "The interpretation of the unnamed arguments of seq and seq.int is not
> standard, and it is recommended always to name the arguments when
> programming."
>
> The above suggests that maybe this advice should be qualified, and/or
> adding some comments to the Help file regarding this behavior might be
> useful to naïfs like me.
>
> In case it makes a difference (and it might!):
>
> > sessionInfo()
> R version 4.2.0 (2022-04-22)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Monterey 12.3.1
>
> Matrix products: default
> LAPACK:
> /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] microbenchmark_1.4.9
>
> loaded via a namespace (and not attached):
> [1] compiler_4.2.0 tools_4.2.0
>
>
> Thanks for any enlightenment and again apologies if I am plowing old
> ground.
>
> Best to all,
>
> Bert Gunter
