[R] [External] Somewhat disconcerting behavior of seq.int()

Bert Gunter <bgunter.4567 using gmail.com>
Tue May 3 07:08:48 CEST 2022


Just confirming that the difference comes from %% on integers vs. doubles on my system:

> s1 <- seq.int(2, 1e5, by = 1) ## doubles
> s2 <- as.integer(s1)          ## integers

##  **Note units below**

> microbenchmark( v1 <- s1 %% 2, times = 50) ## floating point
Unit: milliseconds
        expr      min       lq    mean   median       uq      max neval
 v1 <- s1%%2 69.28204 69.60496 69.8957 69.81379 70.01729 71.36125    50

> microbenchmark( v2 <- s2 %% 2L, times = 50)  ## integer
Unit: microseconds
         expr     min      lq     mean   median      uq     max neval
 v2 <- s2%%2L 166.626 167.042 172.7431 170.5215 177.667 194.334    50
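
For anyone who wants to check the storage types behind these two timings, here is a minimal sketch in base R. The variable names (s_by, s_noby, s_cast) are made up for illustration, and since ?seq.int says not to rely on the returned type, the results may differ on other R versions:

```r
# Storage types of the sequences discussed in this thread.
s_by   <- seq.int(2, 1e5, by = 1)   # the 'by = 1' form used in sieve2
s_noby <- seq.int(2, to = 1e5)      # the form used in sieve1
s_cast <- as.integer(s_by)          # explicit coercion, as in s2 above

types <- c(by = typeof(s_by), noby = typeof(s_noby), cast = typeof(s_cast))
print(types)

# %% returns the type of its operands: double in, double out; integer in, integer out.
mod_types <- c(double = typeof(s_by %% 2), integer = typeof(s_cast %% 2L))
print(mod_types)

# The values themselves are the same; only the storage type differs.
print(all(s_by == s_cast))
```

This only shows which type each call produces; it does not explain why double %% is so much slower on this machine.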

I have no idea why the difference is so large, but I am pretty sure the
explanation is way beyond me. Maybe Mac gurus can figure it out. I may
post this on r-sig-mac to see.
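
In the meantime, the explicit cast to integer mentioned in my earlier reply (quoted below) can be sketched as a third variant of the sieve. The name sieve_int is hypothetical and not part of the original post; everything else follows sieve2 as quoted below, with only the seq.int() line changed:

```r
# Variant of sieve2 that coerces the candidate vector to integer,
# so that s %% i is integer arithmetic whatever seq.int() returns.
sieve_int <- function(m) {
  if (m < 2) return(NULL)
  a <- floor(sqrt(m))
  pr <- Recall(a)                              # primes up to sqrt(m)
  s <- as.integer(seq.int(2, to = m, by = 1))  # explicit integer coercion
  for (i in pr) s <- s[as.logical(s %% i)]     # drop multiples of each prime
  c(pr, s)
}

p <- sieve_int(100)
print(typeof(p))
print(length(p))
```

Timing it with microbenchmark(sieve_int(1e5), times = 50) should show whether the coercion alone removes the gap on a given system.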

Bert

On Mon, May 2, 2022 at 9:37 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
>
> Well, I'm on an M1 Mac, so that is certainly different than either of
> your systems. I installed the precompiled binary, which may also have
> something to do with it. Whether these make a difference I have no
> clue.
>
> However, the fact remains that the Help file *does* warn that the type
> of the seq.int() value is essentially indeterminate, and when I
> explicitly cast it to integer, all is well. So mea culpa.
>
> I will fool around some tomorrow with more careful profiling to see if
> I can learn anything, but the best I can say at present is: it is what it
> is. Unless, of course, someone provides an answer before then.
>
> Bert Gunter
>
>
> On Mon, May 2, 2022 at 8:53 PM <luke-tierney using uiowa.edu> wrote:
> >
> > Something is very different about your system. On my Linux system I get
> >
> > > microbenchmark(l1 <- sieve1(1e5), times =50)
> > Unit: milliseconds
> >                  expr     min       lq     mean   median       uq     max neval
> >   l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50
> > > microbenchmark(l2 <- sieve2(1e5), times =50)
> > Unit: milliseconds
> >                  expr      min       lq     mean   median      uq      max neval
> >   l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50
> >
> > Similar on an Intel Mac.
> >
> > Best,
> >
> > luke
> >
> > On Tue, 3 May 2022, Bert Gunter wrote:
> >
> > > ** Disconcerting to me, anyway; perhaps not to others**
> > > (Apologies if this has been discussed before. I was a bit nonplussed by
> > > it, but maybe I'm just clueless.) Anyway:
> > >
> > > Here are two almost identical versions of the Sieve of Eratosthenes.
> > > The difference between them is only in the call to seq.int() that is
> > > highlighted.
> > >
> > > sieve1 <- function(m){
> > >   if(m < 2) return(NULL)
> > >   a <- floor(sqrt(m))
> > >   pr <- Recall(a)
> > > ####################
> > >   s <- seq.int(2, to = m) ## Only difference here
> > > ######################
> > >   for( i in pr) s <- s[as.logical(s %% i)]
> > >   c(pr,s)
> > > }
> > >
> > > sieve2 <- function(m){
> > >   if(m < 2) return(NULL)
> > >   a <- floor(sqrt(m))
> > >   pr <- Recall(a)
> > > ####################
> > >   s <- seq.int(2, to = m, by =1) ## Only difference here
> > > #######################
> > >   for( i in pr) s <- s[as.logical(s %% i)]
> > >   c(pr,s)
> > > }
> > >
> > > However, execution time is *quite* different.
> > >
> > > library(microbenchmark)
> > >
> > >> microbenchmark(l1 <- sieve1(1e5), times =50)
> > > Unit: milliseconds
> > >                expr      min       lq     mean  median       uq      max neval
> > > l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50
> > >
> > >> microbenchmark(l2 <- sieve2(1e5), times =50)
> > > Unit: milliseconds
> > >                expr      min      lq     mean   median       uq      max neval
> > > l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50
> > >
> > > Now note that:
> > >> identical(l1, l2)
> > > [1] FALSE
> > >
> > > ## Because:
> > >> str(l1)
> > > int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> > >
> > >> str(l2)
> > > num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> > >
> > > I therefore assume that seq.int(), an internal generic, is dispatching
> > > to a method that uses integer arithmetic for sieve1 and floating point
> > > for sieve2. Is this correct? If not, what do I fail to understand? And
> > > is this indeed the source of the large difference in execution time?
> > >
> > > Further, ?seq.int says:
> > > "The interpretation of the unnamed arguments of seq and seq.int is not
> > > standard, and it is recommended always to name the arguments when
> > > programming."
> > >
> > > The above suggests that maybe this advice should be qualified, and/or
> > > adding some comments to the Help file regarding this behavior might be
> > > useful to naïfs like me.
> > >
> > > In case it makes a difference (and it might!):
> > >
> > >> sessionInfo()
> > > R version 4.2.0 (2022-04-22)
> > > Platform: x86_64-apple-darwin17.0 (64-bit)
> > > Running under: macOS Monterey 12.3.1
> > >
> > > Matrix products: default
> > > LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
> > >
> > > locale:
> > > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> > >
> > > attached base packages:
> > > [1] stats     graphics  grDevices utils     datasets  methods   base
> > >
> > > other attached packages:
> > > [1] microbenchmark_1.4.9
> > >
> > > loaded via a namespace (and not attached):
> > > [1] compiler_4.2.0 tools_4.2.0
> > >
> > >
> > > Thanks for any enlightenment and again apologies if I am plowing old ground.
> > >
> > > Best to all,
> > >
> > > Bert Gunter
> > >
> > >
> >
> > --
> > Luke Tierney
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa                  Phone:             319-335-3386
> > Department of Statistics and        Fax:               319-335-3017
> >     Actuarial Science
> > 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> > Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


