[Rd] Infrequent but steady NULL-pointer caused segfault in as.POSIXlt.POSIXct (R 3.4.4)

Tomas Kalibera tom@@@k@||ber@ @end|ng |rom gm@||@com
Fri Aug 2 10:59:02 CEST 2019


In an optimized build, debug info is just an approximation. It might 
help to debug in a build of R and packages without compiler 
optimizations (-O0), where the debug information is accurate. However, 
first I would try to modify the example to trigger more often, or try to 
find external ways to make it trigger more often (e.g. via gctorture). 
Then I would try to make the example smaller (not call gc() explicitly, 
not call any external code - e.g. the jobs, etc) - any time the example 
is reduced but still triggers the errors, the reasoning is made easier. 
Once you have a repeatable situation in a build with reliable debug 
symbols, debugging is easier too, e.g. sometimes a watchpoint helps to 
find memory corruption. Please feel free to ask more when you have more 
information/updates. If this ends up being a bug in R, please report 
(and with a reproducible example, if it is not obvious from the source 
code).

Best
Tomas


On 8/2/19 10:23 AM, Sun Yijiang wrote:
> The R script I run daily for hours looks like this:
>
> while (!finish) {
>      Sys.sleep(0.1)
>      time = as.integer(format(Sys.time(), "%H%M")) # always crash here
>      if (new.data.timestamp() <= time)
>          next
>      # ... do some jobs for about 2 minutes ...
>      gc()
> }
>
> Basically it waits for new data, which comes in every 10 minutes, and
> do some jobs, then gc(), then loop again.  It works great most of the
> time, but crashes strangely once a month or so.  Although infrequent,
> it always crashes at the same place and gives the same error info,
> like this:
>
>   *** caught segfault ***
> address (nil), cause 'memory not mapped'
>
> Traceback:
>   1: as.POSIXlt.POSIXct(x, tz)
>   2: as.POSIXlt(x, tz)
>   3: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...)
>   4: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
>    names = names(x))
>   5: format.POSIXct(Sys.time(), format = "%H%M")
>   6: format(Sys.time(), format = "%H%M")
>   7: format(Sys.time(), format = "%H%M")
> … …
>
> I looked into the dumped core with gdb, and found something very strange:
>
> gdb /usr/lib64/R/bin/exec/R ~/core.30387
> (gdb) bt 5
> #0  0x00007f1dca844ff1 in __strlen_sse2_pminub () from /lib64/libc.so.6
> #1  0x00007f1dcb20e8f9 in Rf_mkChar (name=0x0) at envir.c:3725
> #2  0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
> op=<optimized out>, args=<optimized out>,
>      env=<optimized out>) at datetime.c:705
> #3  0x00007f1dcb22197f in bcEval (body=body using entry=0x4064b28,
> rho=rho using entry=0xc449d38, useCache=useCache using entry=TRUE)
>      at eval.c:6473
> #4  0x00007f1dcb230370 in Rf_eval (e=0x4064b28,
> rho=rho using entry=0xc449d38) at eval.c:624
> (More stack frames follow…)
>
> Tracing into src/main/datetime.c:705, it’s a simple string-making code:
> SET_STRING_ELT(tzone, 1, mkChar(R_tzname[0]));
>
> mkChar function is defined in envir.c:3725:
> 3723  SEXP mkChar(const char *name)
> 3724  {
> 3725      size_t len =  strlen(name);
> … …
>
> gdb shows that the string pointer (name=0x0) mkChar received is NULL,
> and subsequently strlen(NULL) caused the segfault.  But quite
> contradictorily, gdb shows the value passed to mkChar in the caller is
> valid:
>
> (gdb) frame 2
> #2  0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
> op=<optimized out>, args=<optimized out>,
>      env=<optimized out>) at datetime.c:705
> 705 datetime.c: No such file or directory.
> (gdb) p tzname[0]
> $1 = 0x4cf39c0 “CST”
>
> R_tzname is an alias of tzname. (#define R_tzname tzname in the same file.)
>
> At first, I suspect that some library may have messed up the memory
> and accidentally zeroed tzname (a global variable).  But with this gdb
> trace, it shows that tzname is good, only that the pointer passed to
> mkChar magically changed to zero.  Like this:
>
> mkChar(tzname[0])  // tzname[0] is “CST”, address 0x4cf39c
> … …
> SEXP mkChar(const char *name)  // name should be 0x4cf39c, but gdb shows 0x0
> {
>      size_t len =  strlen(name);  // segfault, as name is NULL
> … …
>
> The only theory I can think of so far is that, on calling mkChar, the
> parameter passed on stack somehow got wiped out to zero by some buggy
> code in R or library.  At a higher level, what I see is this:  If you
> run format(Sys.time(), "%H%M”) a million times a day (together with
> other codes of course), once in a month or so this simple line can
> segfault.
>
> I’m lost in this confusion, could someone please help me find the
> right direction to further look into this problem?
>
> Regards,
> Steve
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list