[Rd] stats::fft produces inconsistent results

Wed Oct 20 11:26:21 CEST 2021

>>>>> GILLIBERT, Andre 
>>>>>     on Wed, 20 Oct 2021 08:10:00 +0000 writes:

    > Hello,
    > That sounds like a good diagnosis!
    > Indeed, R vectors are passed "by reference" to C code, but the semantic must be "by value", i.e. the C function must NOT change the contents of the vector, except in very specific cases.

    > A good program that has to work on a vector, must first duplicate the vector, unless the only reference to the vector is the reference inside the C function.
    > This can be tested by the MAYBE_REFERENCED() macro in Rinternals.h.

    > A good example can be found in the fft() function in src/library/stats/src/fourier.c in R source code:
    > switch (TYPEOF(z)) {
    > case INTSXP:
    > case LGLSXP:
    > case REALSXP:
    > z = coerceVector(z, CPLXSXP);
    > break;
    > case CPLXSXP:
    > if (MAYBE_REFERENCED(z)) z = duplicate(z);
    > break;
    > default:
    > error(_("non-numeric argument"));
    > }
    > PROTECT(z);

    > This code coerces non-complex vectors to complex. Since this makes a copy, there is no need to duplicate.
    > Complex vectors are duplicated, unless they are not referenced by anything but the fft() function.

    > Now, the z vector can be modified "in place" without inconsistency.

    > Properly using R vectors in C code is tricky. You have to understand.
    > 1) When you are allowed or not to modify vectors
    > 2) When to PROTECT()vectors
    > 3) How the garbage collector works and when it can trigger (answer : basically, when you call any internal R function)

    > Chapter 5 of "Writing R Extensions" documentation is quite extensive:
    > https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Handling-R-objects-in-C

    > -- 
    > Sincerely
    > André GILLIBERT

Thank you, André , that's very good.

Just to state the obvious conclusion:

If Ben's suggestion is correct (and André has explained *how*
that could happen) this would mean  a
SEVERE BUG in package ravetools's  mvfftw() function.

and it would have been (yet another) case of gaining speed by
killing correctness...

... but then ravetools  is not even a CRAN package, so why
should you dare to use it for anything serious ?

... yes, being grouchy ..

    > -----Message d'origine-----
    > De : R-devel <r-devel-bounces using r-project.org> De la part de Ben Bolker
    > Envoyé : mercredi 20 octobre 2021 03:27
    > À : r-devel using r-project.org
    > Objet : Re: [Rd] stats::fft produces inconsistent results

    > This is a long shot, but here's a plausible scenario:

    > as part of its pipeline, ravetools::mvfftw computes the mean of the
    > input vector **and then centers it to a mean of zero** (intentionally or
    > accidentally?)

    > because variables are passed to compiled code by reference (someone
    > can feel free to correct my terminology), this means that the original
    > vector in R now has a mean of zero

    > the first element of fft() is mean(x)*length(x), so if mean(x) has
    > been forced to zero, that would explain your issue.

    > I don't know about the non-reproducibility part.

    > On 10/19/21 7:06 PM, Dipterix Wang wrote:
    >> Dear R-devel Team,
    >> 
    >> I'm developing a neuroscience signal pipeline package in R (https://github.com/dipterix/ravetools) and I noticed a weird issue that failed my unit test.
    >> 
    >> Basically I was trying to use `fftw3` library to implement fast multivariate fft function in C++. When I tried to compare my results with stats::fft, the test result showed the first element of **expected** (which was produced by stats::fft) was zero, which, I am pretty sure, is wrong, and I can confirm that my function produces correct results.
    >> 
    >> However, somehow I couldn’t reproduce this issue on my personal computer (osx, M1, R4.1.1), the error simply went away.
    >> 
    >> The catch is my function produced consistent and correct results but stats::fft was not. This does not mean `stats::fft` has bugs. Instead, I suspect there could be some weird interactions between my code and stats::fft at C/C++ level, but I couldn’t figure it out why.
    >> 
    >> +++ Details:
    >> 
    >> Here’s the code I used for the test:
    >> 
    >> https://github.com/dipterix/ravetools/blob/4dc35d64763304aff869d92dddad38a7f2b30637/tests/testthat/test-fftw.R#L33-L41
    >> 
    >> ————————Test code————————
    >> set.seed(1)
    >> x <- rnorm(1000)
    >> dim(x) <- c(100,10)
    >> a <- ravetools:::mvfftw_r2c(x, 0)
    >> c <- apply(x, 2, stats::fft)[1:51,]
    >> expect_equal(a, c)
    >> ————————————————————————
    >> 
    >> Here are the tests that gave me the errors:
    >> 
    >> The test logs on win-builder
    >> https://win-builder.r-project.org/07586ios8AbL/00check.log
    >> 
    >> Test logs on GitHub
    >> https://github.com/dipterix/ravetools/runs/3944874310?check_suite_focus=true
    >> 
    >> 
    >> —————————————— Failed tests ——————————————
    >> -- Failure (test-fftw.R:41:3): mvfftw_r2c --------------------------------------
    >> `a` (`actual`) not equal to `c` (`expected`).
    >> 
    >> actual vs expected
    >> [,1]                    [,2]                  [,3]                  [,4]                    ...
    >> - actual[1, ]     10.8887367+ 0.0000000i  -3.7808077+ 0.0000000i   2.967354+ 0.000000i   5.160186+ 0.000000i ...
    >> + expected[1, ]    0.0000000+ 0.0000000i  -3.7808077+ 0.0000000i   2.967354+ 0.000000i   5.160186+ 0.000000i...
    >> 
    >> ————————————————————————
    >> 
    >> The first columns are different, `actual` is the results I produced via `ravetools:::mvfftw_r2c`, and `expected` was produced by `stats::fft`
    >> 
    >> 
    >> Any help or attention is very much appreciated.
    >> Thanks,
    >> - Zhengjia