[Rd] R 4.0.2 64-bit Windows hangs

Tomas Kalibera tomas.kalibera using gmail.com
Wed Sep 16 17:27:26 CEST 2020


On 8/27/20 8:38 PM, Jeroen Ooms wrote:
> On Wed, Aug 26, 2020 at 7:54 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>> On 8/25/20 6:14 PM, Tomas Kalibera wrote:
>>> On 8/22/20 9:33 PM, Jeroen Ooms wrote:
>>>> On Sat, Aug 22, 2020 at 9:10 PM Tomas Kalibera
>>>> <tomas.kalibera using gmail.com> wrote:
>>>>> On 8/22/20 8:26 PM, Tomas Kalibera wrote:
>>>>>> On 8/22/20 7:58 PM, Jeroen Ooms wrote:
>>>>>>> On Sat, Aug 22, 2020 at 8:39 AM Tomas Kalibera
>>>>>>> <tomas.kalibera using gmail.com> wrote:
>>>>>>>> On 8/21/20 11:45 PM, m19tdn+9alxwj7d2bmk--- via R-devel wrote:
>>>>>>>>> Ah yes, this is related. I reported v2010 below, but it looks like
>>>>>>>>> I was updated to this Insider Build overnight without my knowledge,
>>>>>>>>> and conflated it with the new installation R v4 this morning.
>>>>>>>>>
>>>>>>>>> I will continue to look into the issue with the methods Tomas
>>>>>>>>> mentioned.
>>>>>>>> It is interesting that a rare 5-year-old problem would re-appear on
>>>>>>>> current Insider builds. Which build of Windows are you running
>>>>>>>> exactly?
>>>>>>>> I've seen another report about a crash on 20190.1000. It'd be nice
>>>>>>>> to know if it is also present in newer builds, i.e. in 20197.
>>>>>>> I installed the latest 20197 build in a VM, and I can indeed
>>>>>>> reproduce this problem.
>>>>>>>
>>>>>>> What seems to be happening is that R triggers an infinite recursion
>>>>>>> in the Windows unwinding mechanism, and eventually dies with a stack
>>>>>>> overflow. Attached is a backtrace of the initial 100 frames of the
>>>>>>> main thread (the pattern in the top ~30 frames continues forever).
>>>>>>>
>>>>>>> The Microsoft blog doesn't mention that anything related to
>>>>>>> exception handling has changed in recent versions:
>>>>>>> https://docs.microsoft.com/en-us/windows-insider/at-home/active-dev-branch
>>>>>>>
>>>>>>>
>>>>>> Thanks, unfortunately that does not ring any bells (except below), I
>>>>>> can't guess from this what is the underlying cause of the problem.
>>>>>> There may be something wrong in how we use setjmp/longjmp or how
>>>>>> setjmp/longjmp works on Windows.
>>>>>>
>>>>>> It reminds me of a problem I was debugging a few days ago, where the
>>>>>> longjmp implementation segfaults on Windows 10 (recent but not
>>>>>> Insider build), probably soon after unwinding the stack, but only with
>>>>>> GCC 10 / MinGW 7, only in one of the no-segfault tests, and only
>>>>>> with -O3 (not with -O2, and not with -O3 -fno-split-loops).
>>>>>> Interestingly, the problem was sensitive to these optimization options
>>>>>> at the call site of the long jump (do_abs), even when it was not an
>>>>>> immediate caller of longjmp. I've not tracked this down yet; it will
>>>>>> require looking at the assembly level, and I was suspecting a compiler
>>>>>> error causing the compiler to generate code that messes with the stack
>>>>>> or registers in a way that impacts the upcoming jump. But now that we
>>>>>> have this other problem with setjmp/longjmp, the compiler may not be
>>>>>> the top suspect anymore.
>>>>>>
>>>>>> I may not be able to work on this in the next few days or a week, so
>>>>>> if anyone gets there first, please let me know what you find out.
>>>>> Btw, could you please check whether the UCRT build of R also crashes
>>>>> in the Insider Windows build?
>>>> Yes, it hangs in exactly the same way, except that the backtrace shows
>>>>
>>>>    ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll
>>>>
>>>> instead of msvcrt!_setjmpex (as expected, of course).
>>> Thanks. I found what is causing the problem I observed with
>>> GCC10/stock Windows 10, I expect this is the same one as in the
>>> Insider build.
>>> I will investigate further,
>>>
>>> Tomas
>>>
>> It seems the problem is between MinGW-W64 and Windows, and it really
>> does cause both the reported crashes in an Insider build (I tested in
>> 20197) and the crash in my GCC 10 builds in a single "no-segfault" test.
>> setjmp is implemented using the Windows call _setjmpex, which has a
>> second argument that MinGW sets differently depending on the GCC version.
>> When I set this argument as MinGW-W64 did with earlier versions of GCC,
>> to mingw_getsp(), it fixes/hides the problems on my systems. Perl5 uses a
>> similar workaround, but otherwise there is no solid basis (documentation,
>> specification, etc.) I am aware of for this change, so this may take some
>> more time to be properly fixed. Still, if anyone experiments with this
>> workaround and finds a problem, please let me know. In particular, I am
>> curious whether it works on earlier versions of Windows (at least with
>> check-all, including recommended packages).
> FYI, the problem has disappeared on Windows dev build 20201 (released
> yesterday), so it may have been a Windows bug. That is not to say
> there is no bug on the R/mingw side, but at least the current and past
> releases of R are working again on the latest versions of Windows,
> which is a big relief.

I've added a workaround, for now only to R-devel, which fixes both issues:

- infinite recursion on startup in 20197 (and some other pre-releases, 
as reported by others)
- segfault during longjmp with GCC 10 in multiple versions of Windows 
10, including 20211

The workaround uses NULL as the second argument to _setjmpex, which 
effectively disables SEH in internal R code for jump targets created 
using R's setjmp. This provides the same behavior as we have on Linux, 
potentially improves performance, and most importantly makes the problem 
go away. I've tested on CRAN/BIOC packages and did not find any issues, 
but potentially this could uncover bugs related to improper use of C++ 
with R (relying on C++ destructors being run on R errors/long jumps). 
Such bugs should, however, have already been found on Linux, where 
destructors were never run on long jumps.
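
To illustrate the last point, here is a minimal C sketch of how a long jump skips the cleanup code of the frames it passes over. This is the C analogue of a C++ destructor not being run. It uses only the standard setjmp/longjmp, not the Windows-specific _setjmpex, and every name in it (toplevel, raise_error, worker, run_protected, cleanups_run) is hypothetical and not taken from the R sources.

```c
#include <setjmp.h>

/* Hypothetical names throughout; this is not R's source code. */
static jmp_buf toplevel;       /* jump target, standing in for an R context */
static int cleanup_count = 0;  /* number of cleanups that actually ran */

/* Stands in for an R error: jumps straight to the target. */
static void raise_error(void)
{
    longjmp(toplevel, 1);
}

/* Stands in for package code that must release a resource. */
static void worker(int fail)
{
    /* (a resource would be acquired here) */
    if (fail)
        raise_error();  /* control never returns to this frame */
    cleanup_count++;    /* the "release" step: skipped when fail != 0 */
}

/* Returns 0 on normal completion, 1 if an error jumped back here. */
int run_protected(int fail)
{
    if (setjmp(toplevel) != 0)
        return 1;
    worker(fail);
    return 0;
}

/* Reports how many times the cleanup step has run so far. */
int cleanups_run(void)
{
    return cleanup_count;
}
```

Calling run_protected(0) lets worker reach its cleanup step, while run_protected(1) returns through the jump target and leaves the cleanup undone. Code that depends on such cleanup would need to register it with R rather than rely on stack unwinding.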

Tomas


