[Rd] memory leak in sub("[range]",...)

Bill Dunlap bill at insightful.com
Wed Jul 9 20:26:50 CEST 2008


There is a 2-block memory leak in sub() (and probably in any other
regex-related function) when the pattern argument involves a range
expression, e.g., '[0-9]'.

% R --debugger=valgrind --debugger-args=--leak-check=full --vanilla
==14519== Memcheck, a memory error detector.
==14519== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==14519== Using LibVEX rev 1658, a library for dynamic binary translation.
==14519== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==14519== Using valgrind-3.2.1, a dynamic binary instrumentation framework.
==14519== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==14519== For more details, rerun with: -v
==14519==

R version 2.8.0 Under development (unstable) (2008-07-07 r46046)
...
> for(i in 1:1000)sub("[a-c]","+","0abcd")
> q()
==32503==
==32503== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 40 from 2)
==32503== malloc/free: in use at exit: 12,603,409 bytes in 7,915 blocks.
==32503== malloc/free: 61,973 allocs, 54,058 frees, 54,494,371 bytes allocated.
==32503== For counts of detected errors, rerun with: -v
==32503== searching for pointers to 7,915 not-freed blocks.
==32503== checked 12,616,568 bytes.
==32503==
==32503== 4 bytes in 1 blocks are possibly lost in loss record 1 of 45
==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
==32503==    by 0x80A614F: parse_branch (regex.c:4707)
==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503==    by 0x8110CB4: do_gsub (character.c:1355)
==32503==    by 0x80654A4: do_internal (names.c:1135)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8160DA7: do_begin (eval.c:1174)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)
==32503==
... ignore 85 byte/4 block leak in readline ...
==32503== 7,980 bytes in 1,995 blocks are definitely lost in loss record 36 of 45
==32503==    at 0x40046EE: malloc (vg_replace_malloc.c:149)
==32503==    by 0x4005B9A: realloc (vg_replace_malloc.c:306)
==32503==    by 0x80A5F92: parse_expression (regex.c:5202)
==32503==    by 0x80A614F: parse_branch (regex.c:4707)
==32503==    by 0x80A621A: parse_reg_exp (regex.c:4666)
==32503==    by 0x80A6618: Rf_regcomp (regex.c:4635)
==32503==    by 0x8110CB4: do_gsub (character.c:1355)
==32503==    by 0x80654A4: do_internal (names.c:1135)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8160DA7: do_begin (eval.c:1174)
==32503==    by 0x815F0EB: Rf_eval (eval.c:461)
==32503==    by 0x8162210: Rf_applyClosure (eval.c:667)

The leaked blocks are allocated in internal_function build_range_exp() at
   5200             /* Use realloc since mbcset->range_starts and mbcset->range_ends
   5201                are NULL if *range_alloc == 0.  */
   5202             new_array_start = re_realloc (mbcset->range_starts, wchar_t,
   5203                                           new_nranges);
   5204             new_array_end = re_realloc (mbcset->range_ends, wchar_t,
   5205                                         new_nranges);
...
   5210             mbcset->range_starts = new_array_start;
   5211             mbcset->range_ends = new_array_end;
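
For readers without the source at hand, the growth pattern in that excerpt
can be sketched roughly as follows. This is only an illustration, not the
code in regex.c: the names range_set and add_range are made up here, plain
realloc() stands in for the re_realloc macro, and the growth factor is an
assumption. The point is that realloc() on a NULL pointer behaves like
malloc(), so the first range added creates both arrays, and both must be
freed later.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical stand-in for the range fields of the character-set struct. */
typedef struct {
    wchar_t *range_starts;  /* NULL until the first range is added */
    wchar_t *range_ends;    /* NULL until the first range is added */
    int nranges;            /* ranges currently stored */
    int range_alloc;        /* slots allocated in each array */
} range_set;

/* Append one range, growing both arrays when they are full.
   Returns 0 on success, -1 if an allocation fails. */
static int add_range(range_set *rs, wchar_t start, wchar_t end)
{
    assert(rs != NULL);
    if (rs->nranges == rs->range_alloc) {
        /* Assumed growth policy: 0 -> 1 -> 3 -> 7 ... */
        int new_alloc = 2 * rs->range_alloc + 1;

        /* realloc(NULL, n) acts like malloc(n), so no special first case. */
        wchar_t *new_starts = realloc(rs->range_starts,
                                      (size_t)new_alloc * sizeof(wchar_t));
        if (new_starts == NULL)
            return -1;
        rs->range_starts = new_starts;

        wchar_t *new_ends = realloc(rs->range_ends,
                                    (size_t)new_alloc * sizeof(wchar_t));
        if (new_ends == NULL)
            return -1;
        rs->range_ends = new_ends;

        rs->range_alloc = new_alloc;
    }
    rs->range_starts[rs->nranges] = start;
    rs->range_ends[rs->nranges] = end;
    rs->nranges++;
    return 0;
}
```

A caller would start from a zeroed range_set, call add_range() once per
range in the bracket expression, and eventually free() both arrays.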

This file, src/main/regex.c, contains a complicated mess of #ifdef's
but range_starts and range_ends are defined and appear to be used
whether or not _LIBC is defined.  However, they are only freed if _LIBC
is defined.  In my setup (Linux, gcc 3.4.5) _LIBC is not defined so
they don't get freed.
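
The shape of the problem can be reproduced in a small toy program. The
sketch below is not taken from regex.c: toy_charset, counted_alloc,
counted_free and leaked_after are invented names, and a runtime flag
stands in for the "# ifdef _LIBC" guard so that both variants can be
compared in one build. The arrays are always allocated, but two of them
are released only when the guard is set.

```c
#include <assert.h>
#include <stdlib.h>

static int live_blocks = 0;  /* allocations made and not yet freed */

/* malloc/free wrappers that keep a count of outstanding blocks. */
static void *counted_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counted_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Hypothetical miniature of the character-set struct. */
typedef struct {
    int *range_starts;
    int *range_ends;
    int *char_classes;
} toy_charset;

/* All three arrays are allocated unconditionally. */
static toy_charset *toy_charset_new(void)
{
    toy_charset *cs = counted_alloc(sizeof *cs);
    if (cs == NULL)
        return NULL;
    cs->range_starts = counted_alloc(sizeof(int));
    cs->range_ends = counted_alloc(sizeof(int));
    cs->char_classes = counted_alloc(sizeof(int));
    return cs;
}

/* guard_defined plays the role of _LIBC being defined: the range
   arrays are freed only in that case, as in the unpatched code. */
static void toy_charset_free(toy_charset *cs, int guard_defined)
{
    assert(cs != NULL);
    if (guard_defined) {
        counted_free(cs->range_starts);
        counted_free(cs->range_ends);
    }
    counted_free(cs->char_classes);
    counted_free(cs);
}

/* Build and destroy ncalls charsets; return the blocks left behind. */
static int leaked_after(int guard_defined, int ncalls)
{
    live_blocks = 0;
    for (int i = 0; i < ncalls; i++) {
        toy_charset *cs = toy_charset_new();
        if (cs != NULL)
            toy_charset_free(cs, guard_defined);
    }
    return live_blocks;
}
```

Assuming every allocation succeeds, leaked_after(0, 1000) leaves two
blocks behind per iteration, while leaked_after(1, 1000) leaves none.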

After the following change in free_charset() only the 85 byte/4 block
leak in readline remains.

Index: regex.c
===================================================================
--- regex.c     (revision 46046)
+++ regex.c     (working copy)
@@ -6240,9 +6240,9 @@
 # ifdef _LIBC
   re_free (cset->coll_syms);
   re_free (cset->equiv_classes);
+# endif
   re_free (cset->range_starts);
   re_free (cset->range_ends);
-# endif
   re_free (cset->char_classes);
   re_free (cset);
 }

[This report may be a duplicate: I tried submitting it via the form in
http://bugs.r-project.org/cgi-bin/R, but I cannot find it there now.]

----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com

 "All statements in this message represent the opinions of the author and do
 not necessarily reflect Insightful Corporation policy or position."


