[Rd] Very hard to reproduce bug (?) in R-devel

Wed Apr 5 21:24:14 CEST 2017

Winston,
I had a similar experience to you tracking down an insanely difficult bug in my R code that "disappeared" whenever slight changes were made to the script (e.g. like adding cat() statements). In my case, it coincided with my over-eager compilation of R and its library stack, as I was also experimenting with a cutting edge version of the gcc compiler as well as what I thought were innocuous performance enhancing CFLAGS like -O3/-Ofast -march=native. After downgrading gcc and recompiling everything (R and BLAS) without the extra flags, the problem went away. Not saying this is your problem, just sharing my similar experience.

<TANGENT> And for anyone interested, I did extensive benchmarking on the effects of the added CFLAGS and cutting edge gcc compilers, and I never saw any significant performance enhancement, and frequently saw a big performance penalty with extra flags, particularly as matrix algebra problems sometimes slowed down enormously when using a custom BLAS (ATLAS) compiled with anything fancy. Though nowadays, the out-of-the-box R BLAS seems much faster than it used to be, so I don't even bother fiddling with a custom BLAS. </TANGENT>

--Robert

-----Original Message-----
From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Winston Chang
Sent: Wednesday, April 05, 2017 2:41 PM
To: Martin Maechler <maechler at stat.math.ethz.ch>
Cc: R Development <R-devel at r-project.org>
Subject: Re: [Rd] Very hard to reproduce bug (?) in R-devel

)

On Wed, Apr 5, 2017 at 2:59 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
>
> >>>>> Winston Chang <winstonchang1 at gmail.com>
> >>>>>     on Tue, 4 Apr 2017 15:29:40 -0500 writes:
>
>     > I've done some more investigation into the problem, and it is very
>     > difficult to pin down. What it looks like is happening is roughly like this:
>     > - `p` is an environment and `p$e` is also an environment.
>     > - There is a loop. In each iteration, it looks for one item in `p$e`, saves
>     > it in a variable `x`, then removes that item from `p$e`. Then it invokes
>     > `x()`. The loop runs again, until there are no more items in `p$e`.
>
>     > The problem is that `ls(p$e)` sometimes returns the wrong values -- it
>     > returns the values that it had in previous iterations of the loop. The
>     > behavior is very touchy. Almost any change to the code will slightly change
>     > the behavior; sometimes the `ls()` returns values from a different
>     > iteration of the loop, and sometimes the problem doesn't happen at all.
>
>     > I've put a  Dockerfile and instructions for reproducing the problem here:
>     > https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88
>
>     > I think that I've gotten about as far with this as I can, though I'd be
>     > happy to provide more information if anyone wants to take look at the
>     > problem.
>
> Dear Winston,
>
> While I agree this may very well be a bug in R(-devel), and hence
> also R in 3.4.0 alpha and hence quite important to be dealt with,
>
> your code still involves 3 non-trivial  packages (DBI, R6,
> testthat) some of which have their own C code and notably load
> a couple of other package's namespaces.
> We've always made a point
>       https://www.r-project.org/bugs.html
> that bugs in R should be reproducible without extra
> packages... and I think it would definitely help to pinpoint the
> issue to be seen outside of your extra packages' world.
>
> Or have you been aware of that and are just asking for help
> finding a bug in one of the extra packages involved, a bug that might only be triggered by recent changes in R ?
>
> OTOH, what you describe above  (p ; p$e ; p$e$x ...)
> should be reproducible in pure "base" R code, right?
>
> I'm sorry not to be of more help
> Martin

Of the four packages that are loaded when running the tests (pool,
DBI, R6, testthat, magrittr, crayon), only testthat contains compiled
code, and it is pretty minimal. The only compiled code in testthat
that should be executed is a function that finds a label -- but that
happens only after an error occurs.

This is the sessionInfo():
R Under development (unstable) (2017-03-23 r72389)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] pool_0.1.0     DBI_0.6-1      testthat_1.0.2

loaded via a namespace (and not attached):
[1] compiler_3.5.0 magrittr_1.5   R6_2.2.0       crayon_1.3.2

I have spent days trying to make a minimal example, and that is the
best that I have been able to do so far. I would not involve all this
complexity if I could avoid it. The problem is that the behavior is
extremely sensitive to any changes. Seemingly-innocuous things like
removing other tests, or adding a cat() statement can make the error
disappear, or in some cases it makes ls() return values from a
different (previous) iteration of the loop.

In my testing, I have also seen a case where calls to `cat( ... ,
file=stderr())` result in output that has the wrong order. I don't
know if that's due to the cat() calls being executed in the wrong
order, or if it's simply being printed or buffered in the wrong order.

This is the code in question that cat()s to stderr:
https://github.com/rstudio/pool/blob/0724ad9/R/scheduler.R#L74-L90
      while (TRUE) {
        tasks <- sort(ls(private$scheduledTasks))
        if (length(tasks) == 0) break
        t <- tasks[[1]]

        s <- stderr()
        cat(tasks, "--1--\n", file = s)
        cat(ls(private$scheduledTasks), "--2--\n", file = s)
        cat(t, "--3--\n", file = s)

        task <- private$scheduledTasks[[t]]
        rm(list = t, envir = private$scheduledTasks)

        task()
      }

Without going into too much detail, it should print lines of text that
end with --1--, --2--, --3--, and repeat. Here's what it prints
instead when running the tests:

20170405-182549.466875-18 20170405-182559.456628-17
20170405-182729.456318-16 --1--
20170405-182549.466875-18 20170405-182559.456628-17
20170405-182729.456318-16 --1--
20170405-182549.466875-18 20170405-182559.456628-17
20170405-182729.456318-16 --2--
20170405-182549.466875-18 --3--
20170405-182559.456628-17 20170405-182729.456318-16 --1--
20170405-182559.456628-17 20170405-182729.456318-16 --2--
20170405-182559.456628-17 --3--
20170405-182729.456318-16 --1--
20170405-182729.456318-16 --2--
20170405-182729.456318-16 --3--
 --2--
20170405-182549.466875-18 --3--
1. Error: pool scheduler: schedules things in the right order
(@test-scheduling.R#13)
could not find function "task"
1: naiveScheduler$protect({
       scheduleTask(1e+05, function() {
           results <<- c(results, 3L)
       })
       scheduleTask(10000, function() {
           results <<- c(results, 2L)
       })
       scheduleTask(10, function() {
           results <<- c(results, 1L)
       })
   }) at testthat/test-scheduling.R:13
2: private$refCount$release() at testthat/test-scheduling.R:13
3: private$callback()

It's almost as though, in the middle of the first iteration of the
while loop, R jumps to the next iteration of the loop, runs the loop a
couple of times to completion, and then returns to the first iteration
of the loop at the place that it left.

This can be reproduced by following the instructions in this gist:
  https://gist.github.com/wch/2596a1c9f1bcdee91bb210c782141c88

Almost any change to the code will make the error disappear, or change
to a different one.

With regard to reproducing it in "base" R: I made a simple example
using just R (no other packages) that does something similar to what
happens in the tests, but even when I run it for 100,000 iterations,
the error doesn't occur.

I think there's a good chance that this is due to a bug in R. I have
been trying to track down the cause of the problem but haven't been
able find it.

-Winston

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel