[Rd] Matrix issues when building R with znver3 architecture under GCC 11

Tue Apr 26 06:03:50 CEST 2022

Dear Tomas,

Thanks once again for your insight. I'll take all this on board.
I'll have a poke around to see what's up with Matrix, but I really don't
have time to dig deep.
However, I'm curious. Assuming I have the necessary resources, how do we
check against all CRAN contributed packages - as the dev team does? Is
there any advice, documentation, or scripts about how one goes about doing
that?

For now, I'm running some lengthy scripts that require a
large number of packages with many dependencies. With these, I hope to
check both the speed differences between the different builds and any
differences in their outputs.
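
As a rough illustration of such an output comparison (a sketch only, not
taken from R or from anything used in this thread): assuming each build's
scripts write their numeric results to a plain-text file with one value per
line that Python's float() can parse, a small helper could flag the values
that differ beyond a tolerance. The file layout, the names compare_outputs
and read_values, and the default tolerance are all hypothetical choices.

```python
import math


def compare_outputs(reference, candidate, rel_tol=1e-8, abs_tol=0.0):
    """Return the indices at which two numeric sequences disagree.

    Values match when math.isclose() accepts them under the given
    tolerances; a NaN in both sequences at the same index also matches.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    mismatches = []
    for i, (a, b) in enumerate(zip(reference, candidate)):
        if math.isnan(a) and math.isnan(b):
            continue  # both builds produced NaN at this position
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(i)
    return mismatches


def read_values(path):
    """Read one float per non-empty line (assumed, hypothetical file layout)."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Made-up values standing in for results from two builds.
    x86_64 = [1.0, 2.0, 3.0]
    znver3 = [1.0, 2.0 + 1e-12, 3.5]
    print(compare_outputs(x86_64, znver3))  # [2]
```

A relative tolerance is used because small floating-point differences
between builds can be legitimate, while a large discrepancy such as the
last value above is the kind of difference worth investigating.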

best regards,
Kieran

On Wed, Apr 13, 2022 at 8:26 PM Tomas Kalibera <tomas.kalibera using gmail.com>
wrote:

>
> On 4/13/22 11:20, Kieran Short wrote:
>
> Hi Tomas,
>
> Many thanks for your thorough response, it is very much appreciated and
> what you say makes perfect sense to me.
>
> I was relying on the in-built R compilation checks; I have been working on
> the assumption that everything on the R side is correct (including the
> Matrix package).
>
> Indeed, R 4.1.3 builds and "make check-all" passes with the more
> general -march=x86-64 architecture compiled with -O3 optimizations (in my
> hands, on the Zen3 system). So I had no underlying reason not to believe R
> or its packages were the problem when -march=znver3 was trialed. I found it
> interesting that it was only the one factorizing.R script in the Matrix
> suite that failed (out of the seemingly hundreds of remaining checks
> overall which passed). So I was more wondering if there might have been
> prior knowledge within the brains trust on this list that "oh the
> factorizing.R matrix test does ABC error when R or the package is compiled
> with GCC using XYZ flags". As you'll read ahead, you can say that now. :)
>
> Right, but something must be broken. You might get specific comments from
> the Matrix package maintainer, but it would help to at least minimize that
> failing example to some commands you can run in the R console, and to show
> the differences in outputs.
>
>
> I don't think I have the capability to determine the root trigger in R
> itself, the package, or the compiler (whichever one, or combination, it
> actually is). However, assuming R isn't the issue, what I have done is go
> through the GCC optimizations, and I have now isolated the culprit
> optimization which crashes factorizing.R.
>
> It is "-fexpensive-optimizations".
>
> If I use "-fno-expensive-optimizations" paired with -O2 or -O3
> optimizations, all "make check-all" checks pass. So I can build a fully
> checked and passed R 4.1.3 under my environment now with:
>
> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
> CXXFLAGS="-O3 -march=znver3 -fno-expensive-optimizations -flto" CFLAGS="-O3
> -march=znver3 -fno-expensive-optimizations -flto" FFLAGS="-O3 -march=znver3
> -fno-expensive-optimizations -flto" --enable-memory-profiling
> --enable-R-shlib
>
> Ok. The default optimization options used by R on selected current and
> future versions of GCC and clang also get tested via checking all of CRAN
> contributed packages. This testing sometimes finds errors not detected by
> "make check-all", including bugs in GCC. You would need a lot of resources
> to run these checks, though. In my experience it is not so rare that a bug
> (in R or GCC) only affects a very small number of packages, often even only
> one.
>
> I'm yet to benchmark whether the loss of that particular optimization flag
> negates the advantages of using znver3 as a core architecture target over an
> x86-64 target in the first place.
> So I think I've solved my own problem (at least, it appears that way based
> on the checks).
> So the remaining question is, what method or package does the development
> team use (if any) for testing the speed of various base R calculations?
>
> That depends on the developer and the calculations, and on your goals -
> what you want to measure or show. I don't have simple advice. If you are
> considering this for your own work, I'd recommend measuring some of your
> workloads. Also you can extrapolate from your workloads (from where time is
> spent in them) what would be a relevant benchmark. For example, if most
> time is spent in BLAS, then it is about finding a good optimized
> implementation (and for that checking the impact of the optimizations).
> Similarly, if it is some R package (base, recommended, or contributed), it
> may be using a computational kernel written in C or Fortran, something you
> could test separately or with a specific benchmark. I think it would be
> unlikely that CPU-specific C compiler optimizations would substantially
> speed up the R interpreter itself.
>
> For just deciding whether -fno-expensive-optimizations negates the gains,
> you might look at some general computational/other benchmarks (not R). If
> it negated it even on benchmarks used by others to present the gains, then
> it probably is not worth it.
>
> One of the things I did in the past was looking at timings of selected
> CRAN packages (longer running examples, packages with most reverse
> dependencies) and then looking into the reasons for the individual bigger
> differences. That was when looking at the impacts of the byte-code
> compiler. Unlikely worth the effort in this case. Also, primarily, I think
> the bug should be tracked down and fixed, wherever it is. Only then would
> measuring make sense.
>
> Best
> Tomas
>
>
>
> best regards,
> Kieran
>
> On Wed, Apr 13, 2022 at 4:00 PM Tomas Kalibera <tomas.kalibera using gmail.com>
> wrote:
>
>> Hi Kieran,
>>
>> On 4/12/22 02:36, Kieran Short wrote:
>> > Hello,
>> >
>> > I'm new to this list, and have subscribed particularly because I've come
>> > across an issue with building R from source with an AMD-based Zen
>> > architecture under GCC11. Please don't attack me for my Linux operating
>> > system choice, but it is Ubuntu 20.04 with Linux Kernel
>> > 5.10.102.1-microsoft-standard-WSL2. I've built GCC11 using GCC8 (the
>> > standard GCC under Ubuntu20.04 WSL release), under Windows11 with wslg.
>> > WSL2/g runs as a hypervisor with ports to all system resources including
>> > display, GPU (cuda, etc).
>> >
>> > The reason why I am posting this email is that I am trying to compile R
>> > using the AMD Zen3 platform architecture rather than x86/64, because it
>> > has processor-specific optimizations that improve performance over the
>> > standard x86/64 in benchmarks. The Zen3 architecture optimizations are
>> > not available in earlier versions of GCC (actually, they have possibly
>> > been backported to GCC10 now). Since Ubuntu 20.04 doesn't have GCC11, I
>> > compiled the GCC11 compiler using the native GCC8.
>> >
>> > The GCC11 I have built can build R 4.1.3 with a standard x86-64
>> > architecture and pass all tests with "make check-all".
>> > I configured that with:
>> >> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
>> > CXXFLAGS="-O3 -march=x86-64" CFLAGS="-O3 -march=x86-64" FFLAGS="-O3
>> > -march=x86-64" --enable-memory-profiling --enable-R-shlib
>> > and built with
>> >> make -j 32 -O
>> >> make check-all
>> > ## PASS.
>> >
>> > So I can build R in my environment with GCC11.
>> > In configure, I am using references to "gcc-11.2" "gfortran-11.2" and
>> > "g++-11.2" because I compiled GCC11 compilers with these suffixes.
>> >
>> > Now, I'm using a 32 thread (16 core) AMD Zen3 CPU (a 5950x), and want to
>> > use it to its full potential. Zen3 optimizations are available as a
>> > -march=znver3 option in GCC11. The znver3 optimizations improve
>> > performance in Phoronix Test Suite benchmarks (I'm not aware of anyone
>> > that has compiled R with them). See:
>> > https://www.phoronix.com/scan.php?page=article&item=amd-5950x-gcc11
>> >
>> > However, the R 4.1.3 build (made with "make -j 32 -O"), configured with
>> > -march=znver3, produces an R that fails "make check-all".
>> >
>> >> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
>> > CXXFLAGS="-O2 -march=znver3" CFLAGS="-O2 -march=znver3" FFLAGS="-O2
>> > -march=znver3" --enable-memory-profiling --enable-R-shlib
>> > or
>> >> ~/R/R-4.1.3/configure CC=gcc-11.2 CXX=g++-11.2 FC=gfortran-11.2
>> > CXXFLAGS="-O3 -march=znver3" CFLAGS="-O3 -march=znver3" FFLAGS="-O3
>> > -march=znver3" --enable-memory-profiling --enable-R-shlib
>> >
>> > The failure is always in the factorizing.R Matrix tests, and in
>> > particular, there are a number of errors and a fatal error.
>> > I have attached the output because I cannot really understand what is
>> > going wrong. But results returned from matrix calculations are obviously
>> > odd with -march=znver3 in GCC 11. There is another backwards-compatible
>> > architecture option "znver2" and this has EXACTLY the same result.
>> >
>> > While there are other warnings and errors (many in assert.EQ() ), the
>> > factorizing.R script continues. The fatal error (at line 2662 in the
>> > attached factorizing.Rout.fail text file) is:
>> >
>> >> ## problematic rank deficient rankMatrix() case -- only seen in large cases ??
>> >> Z. <- readRDS(system.file("external", "Z_NA_rnk.rds", package="Matrix"))
>> >> tools::assertWarning(rnkZ. <- rankMatrix(Z., method = "qr")) # gave errors
>> > Error in assertCondition(expr, classes, .exprString = d.expr) :
>> >    Failed to get warning in evaluating rnkZ. <- rankMatrix(Z., method ...
>> > Calls: <Anonymous> -> assertCondition
>> > Execution halted
>> >
>> > Can anybody shed light on what might be going on here? 'make check-all'
>> > passes all the other checks. It is just factorizing.R in Matrix that
>> > fails (other matrix tests run ok).
>> > Sorry this is a bit long-winded, but I thought details might be
>> > important.
>>
>> R gets used and tested most with the default optimizations, without use
>> of model-specific instructions and with -O2 (GCC). It happens from time to
>> time that some people try other optimization options and run into
>> problems. In principle, there are these cases (seen before):
>>
>> (1) the test in R package (or R) is wrong - it (unintentionally) expects
>> behavior which has been observed in builds with default optimizations,
>> but is not necessarily the only correct one; in case of numerical
>> tolerances set empirically, they could simply be too tight
>>
>> (2) the algorithm in R package or R has a bug - the result is really
>> wrong and it is because the algorithm is (unintentionally) not portable
>> enough, it (unintentionally) only works with default optimizations or
>> lower; in case of numerical results, this can be because it expects more
>> precision from the floating point computations than mandated by IEEE, or
>> assumes behavior not mandated
>>
>> (3) the optimization by design violates some properties the algorithm
>> knowingly depends on; with numerical computations, this can be a sort of
>> "fast" (and similarly referred to) mode which violates IEEE floating
>> point standard by design, in the aim of better performance; due to the
>> nature of the algorithm depending on IEEE, and poor luck, the results
>> end up completely wrong
>>
>> (4) there is a bug in the C or Fortran compiler (GCC as we use GCC) that
>> only shows up with the unusual optimizations; the compiler produces
>> wrong code
>>
>> So, when you run into a problem like this and want to get that fixed,
>> the first thing is to identify which case of the above it is, in case of
>> 1 and 2 also differentiate between base R and a package (and which
>> concrete package). Different people maintain these things and you would
>> ideally narrow down the problem to a very small, isolated, reproducible
>> example to support your claim where the bug is. If you do this right,
>> the problem can often get fixed very fast.
>>
>> Such an example for (1) could be: a few lines of standalone R code using
>> Matrix that produce correct results, but the test is not happy, with
>> pointers to the real check in the tests that is wrong, and an
>> explanation why the result is wrong.
>>
>> For (2)-(4) it would be a minimal standalone C/Fortran example including
>> only the critical function/part of algorithm that is not correct/not
>> portable/not compiled correctly, with results obtained with
>> optimizations where it works and where it doesn't. Unless you find an
>> obvious bug in R easy to explain (2), when the example would not have to
>> be standalone. With such standalone C example, you could easily test the
>> results with different optimizations and compilers, it is easier to
>> analyze, and easier to produce a bug report for GCC. What would make it
>> harder in this case is that it needs special hardware, but you could
>> still try with the example, and worry about that later (one option is
>> running in an emulator, and again a standalone example really helps
>> here). In principle, as it needs special hardware, the chances someone
>> else would do this work is smaller. Indeed, if it turns out to be (3),
>> it is unlikely to get resolved, but at least would get isolated (you
>> would know what not to run).
>>
>> As a user, you may run into a problem like this and not want to get it
>> fixed, but just work around it somehow. First, that may be dangerous:
>> one could get incorrect results from computations, though this may be
>> tolerable, say, in applications where results are verified externally. You
>> could try disabling individual optimizations until the tests pass. You
>> could try later versions of gcc-11 (even unreleased) or gcc-12. Still, a
>> lot of this is easier with a small example, too. You could ignore the
>> failing test. And it may not be worth it - it may be that you could get
>> your speedups in a different, but more reliable way.
>>
>> Using wsl2 on its own should not necessarily be a problem and the way
>> you built gcc from the description should be ok, but at some point it
>> would be worth checking under Linux and running natively - because even
>> if these are numerical differences, they could be in principle caused by
>> running on Windows (or in wsl2), at least in the past such differences
>> were seen (related to (2) above). I would recommend checking on Linux
>> natively once you have at least a standalone R example.
>>
>> Best
>> Tomas
>>
>>
>> >
>> > best regards,
>> > Kieran
>> > ______________________________________________
>> > R-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
