[Rd] memory leak in sub("[range]",...)

Martin Maechler maechler at stat.math.ethz.ch
Tue Jul 15 10:11:06 CEST 2008


>>>>> "BD" == Bill Dunlap <bill at insightful.com>
>>>>>     on Wed, 9 Jul 2008 11:26:50 -0700 (PDT) writes:

    BD> There is a 2-block memory leak in the sub() (or any other regex-related
    BD> function, probably) when the pattern argument involves a range
    BD> expression, e.g., '[0-9]'.

    BD> % R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
    BD> ==14519== Memcheck, a memory error detector.
    BD> ==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
    BD> ==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
    BD> ==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
    BD> ==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
    BD> ==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
    BD> ==14519== For more details, rerun with: -v
    BD> ==14519==

    BD> R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
    BD> ...
    >> for(i in 1:1000)sub("[a-c]","+","0abcd")
    >> q()
    BD> ==32503==
    BD> ==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
    BD> ==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
    BD> ==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
    BD> ==32503== For counts of detected errors, rerun with: -v
    BD> ==32503== searching for pointers to 7,915 not-freed blocks.
    BD> ==32503== checked 12,616,568 bytes.
    BD> ==32503==
    BD> ==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
    BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
    BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
    BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
    BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
    BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
    BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
    BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
    BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
    BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
    BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
    BD> ==32503==
    BD> ... ignore 85 byte/4 block leak in readline ...
    BD> ==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
    BD> ==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
    BD> ==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
    BD> ==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
    BD> ==32503==    by 0x80A614F: parse_branch (regex.c:4707)
    BD> ==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
    BD> ==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
    BD> ==32503==    by 0x8110CB4: do_gsub (character.c:1355)
    BD> ==32503==    by 0x80654A4: do_internal (names.c:1135)
    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
    BD> ==32503==    by 0x8160DA7: do_begin (eval.c:1174)
    BD> ==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
    BD> ==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)

    BD> The leaked blocks are allocated in the internal function build_range_exp() at
    BD> 5200             /* Use realloc since mbcset->range_starts and mbcset->range_ends
    BD> 5201                are NULL if *range_alloc == 0.  */
    BD> 5202             new_array_start = re_realloc (mbcset->range_starts, wchar_t,
    BD> 5203                                           new_nranges);
    BD> 5204             new_array_end = re_realloc (mbcset->range_ends, wchar_t,
    BD> 5205                                         new_nranges);
    BD> ...
    BD> 5210             mbcset->range_starts = new_array_start;
    BD> 5211             mbcset->range_ends = new_array_end;
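The excerpt relies on realloc(NULL, n) behaving like malloc(n), so the first range in a bracket expression silently creates two heap blocks (range_starts and range_ends) that whoever releases the charset must free. A minimal sketch of that pattern follows; charset_sketch and add_range() are hypothetical names, not regex.c code, and the grow-to-2n+1 rule is an assumption for illustration. With a single range such as [a-c], each array holds one wchar_t (4 bytes on typical Linux builds), which is consistent with the 7,980 bytes in 1,995 blocks, i.e. 4 bytes per block, reported above.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical, cut-down stand-in for the multibyte charset in regex.c:
   only the fields touched by range expressions are kept. */
typedef struct {
    wchar_t *range_starts;   /* NULL until the first range is added */
    wchar_t *range_ends;
    size_t nranges;          /* ranges stored */
    size_t range_alloc;      /* capacity of both arrays */
} charset_sketch;

/* Append the range [start, end].  Because realloc(NULL, n) acts like
   malloc(n), the very first call needs no special case; the price is that
   the arrays now exist and must be freed when the charset is released.
   Returns 0 on success, -1 if memory is exhausted. */
static int add_range(charset_sketch *cs, wchar_t start, wchar_t end)
{
    assert(start <= end);    /* caller is assumed to pass a valid range */
    if (cs->nranges == cs->range_alloc) {
        size_t new_alloc = 2 * cs->range_alloc + 1;   /* +1 handles 0 */
        wchar_t *s = realloc(cs->range_starts, new_alloc * sizeof *s);
        if (s == NULL)
            return -1;
        cs->range_starts = s;  /* store at once so no block is orphaned */
        wchar_t *e = realloc(cs->range_ends, new_alloc * sizeof *e);
        if (e == NULL)
            return -1;
        cs->range_ends = e;
        cs->range_alloc = new_alloc;
    }
    cs->range_starts[cs->nranges] = start;
    cs->range_ends[cs->nranges] = end;
    cs->nranges++;
    return 0;
}
```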

    BD> This file, src/main/regex.c, contains a complicated mess of #ifdef's

((note that these were not

    BD> but range_starts and range_ends are defined and appear to be used
    BD> whether or not _LIBC is defined.  However, they are only freed if _LIBC
    BD> is defined.  In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
    BD> they don't get freed.
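The mismatch between the unconditional allocation and the _LIBC-guarded free can be reproduced in miniature. In this hypothetical sketch, SKETCH_LIBC stands in for _LIBC (identifiers beginning with an underscore and a capital letter are reserved), and counting_realloc()/counting_free() stand in for re_realloc()/re_free(); the counter tracks only the two range arrays. With SKETCH_LIBC undefined, every compile/free cycle leaves two blocks behind, which is the 2-block leak described at the top of the report.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical counting wrappers standing in for re_realloc()/re_free();
   live_blocks counts only the range arrays, not the struct itself. */
static long live_blocks = 0;

static void *counting_realloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (p == NULL && q != NULL)
        live_blocks++;              /* realloc(NULL, n) made a new block */
    return q;
}

static void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

typedef struct {
    wchar_t *range_starts;
    wchar_t *range_ends;
} cset_sketch;

/* Allocation side, unconditional as in build_range_exp(): simulates
   compiling a pattern with one range, e.g. "[a-c]". */
static cset_sketch *compile_one_range(void)
{
    cset_sketch *cs = malloc(sizeof *cs);
    if (cs == NULL)
        return NULL;
    cs->range_starts = counting_realloc(NULL, sizeof(wchar_t));
    cs->range_ends = counting_realloc(NULL, sizeof(wchar_t));
    return cs;
}

/* Release side with the ORIGINAL layout: the range arrays are freed only
   when SKETCH_LIBC is defined, so by default they leak. */
static void free_cset_original(cset_sketch *cs)
{
    assert(cs != NULL);
#ifdef SKETCH_LIBC
    counting_free(cs->range_starts);
    counting_free(cs->range_ends);
#endif
    free(cs);
}
```

Compiling the same code with -DSKETCH_LIBC makes the leak disappear, mirroring why the leak depends on whether _LIBC is defined.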

OK, this all makes sense; I've seen the same in the source.

Interestingly, my newer setup (Linux, gcc 4.2.x ...) does not show the
memory leak; I haven't checked whether that is because _LIBC is defined
there or for some other reason.

I'm applying your patch ---  thank you, Bill.
Martin

    BD> After the following change in free_charset() only the 85 byte/4 block
    BD> leak in readline remains.

    BD> Index: regex.c
    BD> ===================================================================
    BD> --- regex.c     (revision 46046)
    BD> +++ regex.c     (working copy)
    BD> @@ -6240,9 +6240,9 @@
    BD> # ifdef _LIBC
    BD> re_free (cset->coll_syms);
    BD> re_free (cset->equiv_classes);
    BD> +# endif
    BD> re_free (cset->range_starts);
    BD> re_free (cset->range_ends);
    BD> -# endif
    BD> re_free (cset->char_classes);
    BD> re_free (cset);
    BD> }
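The effect of the patch can be checked with a small counting harness. This sketch uses hypothetical names (cset_sketch, counting_realloc(), counting_free(), SKETCH_LIBC as a stand-in for _LIBC) rather than the real regex.c code; the range arrays are freed outside the guard. It also relies on free(NULL) being a no-op, assumed here to carry over to re_free(), which is what makes the unconditional free safe for a charset that never contained a range expression.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical counting wrappers standing in for re_realloc()/re_free();
   live_blocks counts only the range arrays. */
static long live_blocks = 0;

static void *counting_realloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (p == NULL && q != NULL)
        live_blocks++;              /* realloc(NULL, n) made a new block */
    return q;
}

static void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);                        /* free(NULL) is a no-op */
}

typedef struct {
    wchar_t *range_starts;
    wchar_t *range_ends;
} cset_sketch;

/* Simulates compiling a pattern with one range, e.g. "[a-c]": the arrays
   start out NULL, so realloc(NULL, ...) allocates them. */
static cset_sketch *compile_one_range(void)
{
    cset_sketch *cs = malloc(sizeof *cs);
    if (cs == NULL)
        return NULL;
    cs->range_starts = counting_realloc(NULL, sizeof(wchar_t));
    cs->range_ends = counting_realloc(NULL, sizeof(wchar_t));
    return cs;
}

/* PATCHED layout: only the genuinely _LIBC-specific members stay inside
   the guard; the range arrays are always released. */
static void free_cset_patched(cset_sketch *cs)
{
    assert(cs != NULL);
#ifdef SKETCH_LIBC
    /* coll_syms and equiv_classes would be released here */
#endif
    counting_free(cs->range_starts);
    counting_free(cs->range_ends);
    free(cs);
}
```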

    BD> [This report may be a duplicate: I tried submitting it via the form in
    BD> http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

Neither can I.
The machine running the repository had some downtime (announced by Peter
Dalgaard) a couple of days ago, so this may be related.


    BD> ----------------------------------------------------------------------------
    BD> Bill Dunlap
    BD> Insightful Corporation
    BD> bill at insightful dot com


