[Rd] Crash/bug when calling match on latin1 strings

Martin Maechler maechler at stat.math.ethz.ch
Mon Oct 11 11:31:53 CEST 2021


>>>>> Rui Barradas 
>>>>>     on Mon, 11 Oct 2021 07:41:51 +0100 writes:

    > Hello,

    > R 4.1.1 on Ubuntu 20.04.

    > I can reproduce this error, but not ~90% of the time: it happens only
    > the first time I run the script.
    > If I run other (terminal) commands before rerunning the R script, it
    > sometimes segfaults again, but still far less than 90% of the time.


    > rui@rui:~/tmp$ R -q -f rhelp.R
    >> sessionInfo()
    > R version 4.1.1 (2021-08-10)
    > Platform: x86_64-pc-linux-gnu (64-bit)
    > Running under: Ubuntu 20.04.3 LTS

    > Matrix products: default
    > BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
    > LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

    > locale:
    > [1] LC_CTYPE=pt_PT.UTF-8       LC_NUMERIC=C
    > [3] LC_TIME=pt_PT.UTF-8        LC_COLLATE=pt_PT.UTF-8
    > [5] LC_MONETARY=pt_PT.UTF-8    LC_MESSAGES=pt_PT.UTF-8
    > [7] LC_PAPER=pt_PT.UTF-8       LC_NAME=C
    > [9] LC_ADDRESS=C               LC_TELEPHONE=C
    > [11] LC_MEASUREMENT=pt_PT.UTF-8 LC_IDENTIFICATION=C

    > attached base packages:
    > [1] stats     graphics  grDevices utils     datasets  methods   base

    > loaded via a namespace (and not attached):
    > [1] compiler_4.1.1
    >> 
    >> # A bunch of words in UTF8; replace *'s
    >> words <- readLines("h****://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
    >> words2 <- iconv(words, "utf-8", "latin1")
    >> gctorture(TRUE)
    >> y <- match(words2, words2)

    > *** caught segfault ***
    > address 0x10, cause 'memory not mapped'
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation
    > *** recursive gc invocation

    > Traceback:
    > 1: match(words2, words2)
    > An irrecoverable exception occurred. R is aborting now ...
    > Falta de segmentação (núcleo despejado)



    > This last line is Portuguese for

    > Segmentation fault (core dumped)

    > Hope this helps,

Yes, it does, thank you!

I can confirm the problem:  Only in R 4.1.0 and newer, and
including current "R-patched" and "R-devel" versions.

I've now turned this into a formal R bug report on R's bugzilla,
and (slightly) extended your (Travers') example into a
self-contained (no internet access needed) R script.

Bugzilla PR#18211 :    " match(<latin1>) memory corruption "

  https://bugs.r-project.org/show_bug.cgi?id=18211

  with attachment 2929
  --> https://bugs.r-project.org/attachment.cgi?id=2929&action=edit
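
For readers without access to the pastebin list, a self-contained variant of
the reproduction might look roughly like the sketch below. This is NOT the
script in attachment 2929: the five-word list is made up for illustration, and
whether such a small input is enough to trigger the crash on an affected R
version (4.1.0 or 4.1.1) has not been verified here.

```r
# Hypothetical stand-in for the pastebin word list (an assumption, not the
# original data). \u escapes keep the script independent of the file encoding.
words <- c("caf\u00e9", "na\u00efve", "gar\u00e7on", "\u00fcber", "plain")
words <- rep(words, 20)

# Convert to latin1; non-ASCII elements end up marked as "latin1",
# pure-ASCII ones stay "unknown".
words2 <- iconv(words, "UTF-8", "latin1")
print(table(Encoding(words2)))

gctorture(TRUE)                  # provoke a garbage collection on (almost) every allocation
y <- match(words2, words2)       # the call reported to segfault in the thread
gctorture(FALSE)

# Sanity check for R versions where the call survives:
# the latin1 result should agree with the UTF-8 one.
stopifnot(identical(y, match(words, words)))
```

As in the original report, it can be run in a fresh session, e.g. with
`R -q -f <script>.R`, so that a crash does not take down interactive work.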

==> If possible, please follow up on bugzilla.

Thanks again to you both!
Martin Maechler


    > Rui Barradas

    > At 06:05 on 11/10/21, Travers Ching wrote:
    >> Here's a brief example:
    >> 
    >> # A bunch of words in UTF8; replace *'s
    >> words <- readLines("h****://pastebin.c**/raw/MFCQfhpY", encoding = "UTF-8")
    >> words2 <- iconv(words, "utf-8", "latin1")
    >> gctorture(TRUE)
    >> y <- match(words2, words2)
    >> <The program crashes / segfaults ~90% of the time>
    >> 
    >> I searched bugzilla but didn't see anything. Apologies if this is already
    >> reported.
    >> 
    >> The bug appears in both R-devel and the release, but doesn't seem to affect
    >> R 4.0.5.


