can't find array overruns (was: help debugging segfaults)

Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
18 Jun 2002 16:03:39 +0200


"Liaw, Andy" <andy_liaw@merck.com> writes:

> Dear R-devel,
> 
> Last week I got several responses to my question about debugging segfaults
> in my code (original post below).  After I changed the S_alloc() calls to
> Calloc()/Free(), the symptom was gone, but I was told to keep looking.  So I
> did:
> 
> o  Switched to Calloc/Free.  Electric Fence did not find any problem.
> 
> o  Put assert(index < bound); assert(index >= 0); everywhere in the C routine
> where arrays are accessed.  Everything ran fine.  (I did not do the same for
> the Fortran subroutines called by the C function, mostly Breiman's original
> code, since I don't really know an easy way to.)
> 
> o Changed to malloc()/free().  Still didn't find anything with Electric
> Fence.
> 
> Can someone suggest how to proceed?  Is it still not safe to assume the bug
> is gone?
> 
> Regards,
> Andy
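
The asserts only cover the C side. One way to check the arrays handed to the
Fortran subroutines is to pad each buffer with guard zones filled with a known
pattern and verify the pattern right after the call returns. A minimal sketch
follows. `guarded_alloc`, `guards_intact`, `guarded_free`, the guard width and
the sentinel value are all hypothetical choices (not part of R's API), and
`buggy_fill` is only a stand-in for a Fortran subroutine with an off-by-one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical guard-zone allocator: GUARD extra doubles on each side. */
#define GUARD 16                  /* guard width in doubles on each side */
#define SENTINEL (-987654321.0)   /* exactly representable, unlikely as data */

/* Returns a pointer to n zeroed doubles surrounded by guard zones,
   or NULL if the allocation fails. */
static double *guarded_alloc(size_t n)
{
    size_t i, total = n + 2 * GUARD;
    double *raw = malloc(total * sizeof *raw);

    if (raw == NULL)
        return NULL;
    for (i = 0; i < total; i++)   /* guards get the sentinel, user area 0 */
        raw[i] = (i >= GUARD && i < GUARD + n) ? 0.0 : SENTINEL;
    return raw + GUARD;           /* caller only sees the middle part */
}

/* Returns 1 if both guard zones still hold the sentinel, 0 otherwise. */
static int guards_intact(const double *p, size_t n)
{
    const double *below, *above;
    size_t i;

    assert(p != NULL);
    below = p - GUARD;            /* lower zone: catches underruns */
    above = p + n;                /* upper zone: catches overruns */
    for (i = 0; i < GUARD; i++)
        if (below[i] != SENTINEL || above[i] != SENTINEL)
            return 0;
    return 1;
}

static void guarded_free(double *p)
{
    if (p != NULL)
        free(p - GUARD);          /* free the block malloc actually returned */
}

/* Stand-in for a Fortran routine meant to fill x(1..n). */
static void buggy_fill(double *x, int n)
{
    int i;
    for (i = 0; i <= n; i++)      /* bug: should be i < n */
        x[i] = (double) i;
}
```

In rf.c, each array passed to the Fortran code would be allocated this way and
checked right after the call returns, so a failed check points at one specific
call.  It only catches writes that land in a guard zone and change the
sentinel; a wild write far from the array goes unnoticed.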

The hardcore way is to go back to the original code (the version that
crashed) and backtrack until you find the source of the memory
corruption. In your trace below, for example, it seems that "s" got
corrupted, so that NEXT_NODE(s) triggers the segfault. So:

1. Find the exact memory location with the corrupted value. 
2. Set a hardware watchpoint on that location.
3. Rerun the program with well-defined input and check each time the
   value at the watched location changes.
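
In gdb, step 2 is the `watch` command applied to an expression that
dereferences the address found in step 1. Where a hardware watchpoint is not
available, a crude software analogue of steps 1-3 is to record the address and
value of the suspect word and compare it at checkpoints placed through the
code. The sketch below is hypothetical instrumentation, not an R facility:
`watch_t`, `watch_init` and `watch_check` are invented names, and the labels
stand for places in the program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical software watchpoint on a single long-sized word. */
typedef struct {
    const volatile long *addr; /* location found in step 1 */
    long last;                 /* value seen at the previous checkpoint */
    const char *last_change;   /* label of the latest checkpoint that saw a change */
    int nchanges;              /* number of changes seen so far */
} watch_t;

static void watch_init(watch_t *w, const volatile long *addr)
{
    assert(w != NULL && addr != NULL);
    w->addr = addr;
    w->last = *addr;
    w->last_change = NULL;
    w->nchanges = 0;
}

/* Call at each checkpoint; reports and records any change since the
   previous checkpoint.  Returns 1 if the value changed, 0 otherwise. */
static int watch_check(watch_t *w, const char *label)
{
    long now = *w->addr;       /* volatile read: always reloads memory */

    if (now == w->last)
        return 0;
    fprintf(stderr, "watch: %ld -> %ld noticed at %s\n", w->last, now, label);
    w->last = now;
    w->last_change = label;    /* the culprit is likely the last change */
    w->nchanges++;
    return 1;
}
```

Unlike a hardware watchpoint, which stops at the instruction that did the
write, a checkpoint only says the change happened somewhere since the previous
checkpoint, so the checkpoints have to be moved closer together until the
culprit is pinned down.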

Very likely, the culprit will be the last change prior to the crash,
so you'd have to check the program logic carefully around that point.
If it happens at an assignment to something seemingly unrelated,
chances are that you have an array overrun. If the value at that
location changes frequently, it can be useful to make the watchpoint
conditional (the number of garbage collections so far is a handy
condition for this).
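
The "seemingly unrelated assignment" case can be mimicked without undefined
behaviour by carving two objects out of one pool, much as an allocator packs
neighbouring objects together: an off-by-one loop over the first object then
silently rewrites the second. The sketch also gates the check on a counter, in
the spirit of conditionalizing on the number of garbage collections; in gdb
itself, the `condition` command attaches such a test to an existing
watchpoint. `pool`, `ncollections`, `clear_array` and `check_if` are
hypothetical names, not R internals.

```c
#include <assert.h>
#include <stdio.h>

#define ALEN 5

/* One block holding an array a[0..ALEN-1] followed by a word *s.
   Indexing stays inside pool, so the "overrun" below is well defined. */
static long pool[ALEN + 1];
static long *const a = pool;          /* the array the loop means to touch */
static long *const s = pool + ALEN;   /* the neighbour that gets clobbered */

static int ncollections = 0;          /* hypothetical stand-in for a GC counter */

/* Meant to clear x[0..n-1]; the i <= n bug also writes x[n]. */
static void clear_array(long *x, int n)
{
    int i;
    for (i = 0; i <= n; i++)          /* bug: should be i < n */
        x[i] = 0;
}

/* Conditional check: tracks every change, but reports only once the
   counter has reached min_count.  Returns 1 on a reported change. */
static int check_if(const long *addr, long *last, int min_count,
                    const char *label)
{
    long now = *addr;
    int changed = (now != *last);

    *last = now;                      /* always remember the latest value */
    if (!changed || ncollections < min_count)
        return 0;                     /* no change, or too early to care */
    fprintf(stderr, "watch: change noticed at %s (collection %d)\n",
            label, ncollections);
    return 1;
}
```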

The precise way to do this kind of thing is described in your friendly
gdb manual (sorry, but it would take all day to flesh out the details).

> > The randomForest package mainly consists of two things: rf.c contains rf(),
> > a C wrapper function that calls the Fortran subroutines in rfsub.f that do
> > most of the work (slightly altered from Breiman's original code).  All
> > memory allocations are done in rf.c, using S_alloc().  When I ran random
> > forest with the data and settings mentioned above, it finished growing the
> > 7000 trees, but segfaulted when returning from rf() to R.  GDB gave the
> > following (gdb prompts removed):
> > 
> > do_dotCode (call=0x873aff4, op=0x8a5f620, args=0x8a5d010, env=0x86fd0a4)
> >     at dotcode.c:1413
> > 1413            break;
> > 1845        PROTECT(ans = allocVector(VECSXP, nargs));
> > 1846        havenames = 0;
> > 1847        if (dup) {
> > 1849            info.cargs = cargs;
> > 1850            info.allArgs = args;
> > 1851            info.nargs = nargs;
> > 1852            info.functionName = buf;
> > 1853            nargs = 0;
> > 1854            for (pargs = args ; pargs != R_NilValue ; pargs = CDR(pargs)) {
> > 1855                if(argConverters[nargs]) {
> > 1864                    PROTECT(s = CPtrToRObj(cargs[nargs], CAR(pargs), which));
> > 
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x080ddc6a in RunGenCollect (size_needed=1515400) at memory.c:1133
> > 1133                    SEXP next = NEXT_NODE(s);


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._