[Rd] Addressing Problems: R with Fortran and OpenMP

Mon Aug 8 18:44:51 CEST 2011

Hello,

I am writing an R program that calls nested Fortran routines for the
calculations and uses OpenMP for parallelization. On 64-bit systems I
get varying errors that point to memory addressing problems; on 32-bit
systems the application runs without problems. The errors on 64-bit
range from null-pointer failures, through segmentation faults and
stack imbalances (with varying differences, and I am not using C/C++),
to runs that finish without an exception but return wrong values.
Sometimes it even works correctly on 64-bit, mostly when executed a
second time within the same R session. Sometimes an endless loop of
"Error: bad target context--should NEVER happen; please bug.report()
[R_run_onexits]" appears.

The problem seems to be independent of the operating system. I have
tried Windows 7, Windows Vista and openSUSE 11.3 (x86-64). Evaluation
with valgrind reveals a large possible memory leak, though the leak
appears on 32-bit systems as well, just without any errors. I am using
the gfortran 4.5.0 x86-64 compiler and R version 2.12.

valgrind log extract:
==22989== 25,559,200 bytes in 4 blocks are possibly lost in loss record 5,678 of 5,678
==22989==    at 0x4C26C3A: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==22989==    by 0x4F39907: Rf_allocVector (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4EDAF96: duplicate1 (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4FC204E: R_subassign3_dflt (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4FC24A2: do_subassign3 (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F0346F: do_set (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F02EB1: applydefine (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F035EB: do_begin (in /usr/lib64/R/lib/libR.so)
==22989==    by 0x4F00B4A: Rf_eval (in /usr/lib64/R/lib/libR.so)
==22989==
==22989== LEAK SUMMARY:
==22989==    definitely lost: 82 bytes in 1 blocks
==22989==    indirectly lost: 0 bytes in 0 blocks
==22989==      possibly lost: 109,720,966 bytes in 26,330 blocks
==22989==    still reachable: 23,101,045 bytes in 5,105 blocks
==22989==         suppressed: 0 bytes in 0 blocks

All arguments in Fortran are explicitly declared, using integer*4 for
integers and real*8 for doubles.

I am really lost in this, because I just don't know where to start and
stop looking. It is obvious to me that there is some kind of memory
addressing problem related to the 64-bit architecture, but since I
don't know whether it is related to R, Fortran, OpenMP or a
combination of those, it is very hard to find. Also, the program is
part of a library with 40+ files which interact, so it would be really
hard and time consuming to cut the program down to a size where the
error is still reproduced and the code is still manageable.

Any help, ideas, suggestions as to what to do, where to look and what
to try would be very welcome. I have been trying to solve this problem
for nearly two weeks and read everything I could find regarding
x86-64, R, Fortran, OpenMP and memory issues. I could post more, and
more specific, information about the errors, but then the description
would get even longer. So if I need to supply more information, please
tell me and I will do so.

Regards
Lars

Below are the code snippets for the Fortran call and the entry point
of the Fortran routine with the OpenMP definition. If the program
fails with a statement about where it failed (e.g. a segmentation
fault), it gives this call as the location. But since I only get R
errors and not Fortran errors, the error might actually occur anywhere
in the Fortran code.

 z <- .Fortran("nlrdtirg",
                as.integer(si),
                as.integer(ngrad),
                as.integer(ddim[1]),
                as.integer(ddim[2]),
                as.integer(ddim[3]),
                as.logical(mask),
                as.double(object@btb),
                as.double(sdcoef),
                th0=as.double(s0),
                D=double(6*prod(ddim)),
                as.integer(200),
                as.double(1e-6),
                res=double(ngrad*prod(ddim)),
                rss=double(prod(ddim)),
                double(ngrad*num_threads),
                as.integer(num_threads),
                PACKAGE="dti",DUP=TRUE)
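
One way to rule out a mismatch between the R arguments and the Fortran
declarations is to check the storage mode and length of every argument
before the call. The following is only a minimal sketch: the helper
check_arg is hypothetical and not part of the dti package, and the
dimensions are made-up illustrative values, not the real data.

```r
# Hypothetical helper (not part of the dti package): stop if an argument
# does not have the storage mode and length the Fortran side expects.
check_arg <- function(x, mode, len) {
  stopifnot(storage.mode(x) == mode, length(x) == len)
  invisible(TRUE)
}

# Illustrative dimensions only; the real values come from the data object.
ngrad <- 3L
ddim <- c(2L, 2L, 2L)
num_threads <- 2L
nvox <- prod(ddim)

si     <- integer(ngrad * nvox)        # integer*4 s(nb,n1,n2,n3)
mask   <- logical(nvox)                # logical mask(n1,n2,n3)
btb    <- double(6 * ngrad)            # real*8 b(6,nb)
varinv <- double(ngrad * num_threads)  # real*8 varinv(nt*nb)

check_arg(si, "integer", ngrad * nvox)
check_arg(mask, "logical", nvox)
check_arg(btb, "double", 6 * ngrad)
check_arg(varinv, "double", ngrad * num_threads)
```

A check like this only catches type and size mismatches on the R side;
it says nothing about out-of-bounds writes inside the Fortran code.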

      subroutine nlrdtirg(s,nb,n1,n2,n3,mask,b,sdcoef,th0,D,niter,eps,
     1                    res,rss,varinv,nt)

      use omp_lib
      implicit logical*4 (a-z)
      integer*4 nb,n1,n2,n3,s(nb,n1,n2,n3),niter,nt,tid
      logical mask(n1,n2,n3)
      real*8 D(6,n1,n2,n3),b(6,nb),res(nb,n1,n2,n3),
     1    th0(n1,n2,n3),eps,rss(n1,n2,n3),sdcoef(4),varinv(nt*nb)
      integer*4 i1,i2,i3,j

      DO i3=1,n3
         DO i2=1,n2
C$OMP PARALLEL DEFAULT(NONE)
C$OMP& SHARED(mask,s,b,sdcoef,th0,D,res,rss,varinv,nb,niter,eps)
C$OMP& FIRSTPRIVATE(i2,i3,n1)
C$OMP& PRIVATE(i1,j,tid)
C$OMP DO SCHEDULE(DYNAMIC,1)