[Rd] Infrequent but steady NULL-pointer caused segfault in as.POSIXlt.POSIXct (R 3.4.4)

Sun Yijiang
Fri Aug 2 10:23:00 CEST 2019


The R script I run daily for hours looks like this:

while (!finish) {
    Sys.sleep(0.1)
    time = as.integer(format(Sys.time(), "%H%M")) # always crash here
    if (new.data.timestamp() <= time)
        next
    # ... do some jobs for about 2 minutes ...
    gc()
}

Basically, it waits for new data, which arrives every 10 minutes,
does some jobs, calls gc(), and loops again.  It works well most of the
time, but crashes about once a month.  Although the crash is infrequent,
it always happens at the same place and produces the same error output,
like this:

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: as.POSIXlt.POSIXct(x, tz)
 2: as.POSIXlt(x, tz)
 3: format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...)
 4: structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
  names = names(x))
 5: format.POSIXct(Sys.time(), format = "%H%M")
 6: format(Sys.time(), format = "%H%M")
 7: format(Sys.time(), format = "%H%M")
… …

I looked into the dumped core with gdb, and found something very strange:

gdb /usr/lib64/R/bin/exec/R ~/core.30387
(gdb) bt 5
#0  0x00007f1dca844ff1 in __strlen_sse2_pminub () from /lib64/libc.so.6
#1  0x00007f1dcb20e8f9 in Rf_mkChar (name=0x0) at envir.c:3725
#2  0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
op=<optimized out>, args=<optimized out>,
    env=<optimized out>) at datetime.c:705
#3  0x00007f1dcb22197f in bcEval (body=body@entry=0x4064b28,
rho=rho@entry=0xc449d38, useCache=useCache@entry=TRUE)
    at eval.c:6473
#4  0x00007f1dcb230370 in Rf_eval (e=0x4064b28,
rho=rho@entry=0xc449d38) at eval.c:624
(More stack frames follow…)

Line 705 of src/main/datetime.c is a simple string-creation statement:
SET_STRING_ELT(tzone, 1, mkChar(R_tzname[0]));

The mkChar function is defined in envir.c; line 3725 is its strlen call:
3723  SEXP mkChar(const char *name)
3724  {
3725      size_t len =  strlen(name);
… …

gdb shows that the string pointer mkChar received is NULL (name=0x0),
so strlen(NULL) caused the segfault.  Yet, contradictorily, gdb also
shows that the value passed to mkChar looks valid in the caller's
frame:

(gdb) frame 2
#2  0x00007f1dcb1dc225 in do_asPOSIXlt (call=<optimized out>,
op=<optimized out>, args=<optimized out>,
    env=<optimized out>) at datetime.c:705
705 datetime.c: No such file or directory.
(gdb) p tzname[0]
$1 = 0x4cf39c0 "CST"

R_tzname is an alias of tzname. (#define R_tzname tzname in the same file.)
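
To make the failure mode concrete: strlen() dereferences its argument, so
a NULL pointer faults immediately.  Below is a minimal C sketch of a
guarded read.  The helper names zone_name_or_fallback and
guarded_zone_len, and the empty-string fallback, are hypothetical and
chosen only for illustration; they are not part of R or of datetime.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of R: return `name` unless it is NULL,
 * in which case return `fallback`, so that strlen() never sees NULL. */
static const char *zone_name_or_fallback(const char *name, const char *fallback)
{
    assert(fallback != NULL);   /* the fallback itself must be usable */
    return name != NULL ? name : fallback;
}

/* Length of the zone name that would be handed to a mkChar-like function. */
static size_t guarded_zone_len(const char *name)
{
    return strlen(zone_name_or_fallback(name, ""));
}
```

A guard like this would only mask the symptom; it says nothing about why
the pointer was NULL at the moment of the call.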

At first, I suspected that some library had corrupted memory and
accidentally zeroed tzname (a global variable).  But this gdb trace
shows that tzname is intact in the core dump; only the pointer passed
to mkChar had somehow become zero.  Like this:

mkChar(tzname[0])  // tzname[0] is "CST", address 0x4cf39c0
… …
SEXP mkChar(const char *name)  // name should be 0x4cf39c0, but gdb shows 0x0
{
    size_t len =  strlen(name);  // segfault, as name is NULL
… …
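
One point worth keeping in mind when comparing the two gdb results: a
function argument is a copy taken at call time, while `p tzname[0]` on a
core file shows the global as it was when the process died.  The sketch
below only illustrates that difference.  It uses a hypothetical stand-in
array zone_name instead of the real tzname, and it is not evidence that
this is what happened inside R.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the global tzname array. */
static const char *zone_name[2] = { "CST", "CDT" };

/* Records the argument exactly as the callee received it. */
static const char *seen_by_callee = NULL;

static void callee(const char *name)
{
    seen_by_callee = name;      /* copy of the pointer made at call time */
}

/* Simulates: the global is briefly NULL, the call happens, and the global
 * is restored.  Returns 1 if the callee saw NULL while the global looks
 * valid afterwards. */
static int argument_differs_from_later_global(void)
{
    const char *saved = zone_name[0];
    assert(saved != NULL);

    zone_name[0] = NULL;        /* transient state, e.g. during an update */
    callee(zone_name[0]);       /* callee receives NULL by value */
    zone_name[0] = saved;       /* global is valid again afterwards */

    return seen_by_callee == NULL && zone_name[0] != NULL;
}
```

In this simulation the rewrite happens in the same thread; in a real
process it would have to come from somewhere else, such as another thread
or a signal handler, which is an assumption and not something the core
dump above confirms.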

The only theory I can think of so far is that, when mkChar was called,
the argument was somehow overwritten with zero by buggy code in R or in
a library.  At a higher level, what I see is this:  if you run
format(Sys.time(), "%H%M") a million times a day (together with other
code, of course), about once a month this simple line can segfault.

I’m lost.  Could someone please point me in the right direction for
investigating this problem further?

Regards,
Steve


