[Rd] reference counting bug related to break and next in loops

William Dunlap wdunlap at tibco.com
Wed Jun 3 05:17:49 CEST 2009


One of our R users here just showed me the following problem while
investigating the return value of a while loop.  I added some information
on a similar bug in for loops.  I think he was using 2.9.0, but I see the
same problem on today's development version of 2.10.0 (svn 48703).

Should the semantics of while and for loops be changed slightly to avoid
the memory buildup that fixing this to match the current docs would entail?
S+'s loops return nothing useful; that change was made long ago to avoid
the memory buildup that resulted from semantics akin to R's present ones.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

-------------------- Forwarded (and edited) message below --------------------

I think I have found another reference counting bug.

If you type the following into R, you get what I think is the wrong
result:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i = i + 1; y}; q
 [1] 42 42 42 42 42 42 42 42  9 10

I had expected  [1] 42 42 42 42 42 42 42  8  9 10 which is what you get
if you add 0 to y in the last statement in the while loop:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i = i + 1; y + 0}; q
 [1] 42 42 42 42 42 42 42  8  9 10

Also, 

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y}; q
Completing iteration 2
Completing iteration 3
 [1] 42 42 42 42 42 42 42 42  9 10

but if the last statement in the while loop is y+0L instead of y, I get
the expected result:

> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break }; i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", i, "\n"); y+0L}; q
Completing iteration 2
Completing iteration 3
 [1] 42 42  3  4  5  6  7  8  9 10

Some background: in R, a while loop returns the value of its last
iteration.  There is an exception, however, when an iteration is
terminated by a break or a next: the value is then that of the most recent
iteration that completed without executing a break or next.  Thus, in an
extreme case, the value of the while loop may be the value of the very
first iteration even though the loop ran a million iterations.
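
To make the intended semantics concrete, here is a minimal sketch that
tracks the value of the last completed iteration by hand instead of
relying on the value of the loop itself.  The variable name last_completed
is made up for illustration; it is not part of R.

```r
i <- 1
y <- 1:10
last_completed <- NULL          # hypothetical name, for illustration only

while (TRUE) {
  y[i] <- 42
  if (i == 8) break             # terminated iteration: value not recorded
  i <- i + 1
  if (i <= 8 && i > 3) next     # terminated iteration: value not recorded
  last_completed <- y           # iteration ran to its end: record its value
}

print(last_completed)
# [1] 42 42  3  4  5  6  7  8  9 10
```

Only the first two iterations reach the end of the loop body, so the
recorded value is the state of y after the second one.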

Thus, to implement that correctly, one needs to keep a reference to the
value of the last non-terminated iteration.  The current R implementation
seems to do that, but without incrementing the reference counter, so later
in-place modifications of y presumably also change the saved value.  That
would explain the odd behavior.
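
The difference between holding an unprotected reference and holding a
snapshot can be imitated at the R level with an environment, which has
reference semantics.  This is only an analogy, not how the interpreter
handles loop values internally, and the names cell, alias and snapshot are
invented for the example.

```r
cell <- new.env()
cell$y <- 1:10

alias    <- cell        # second handle on the same object, no copy made
snapshot <- cell$y      # ordinary vector value, independent of later changes

cell$y[8] <- 42L        # modify the shared object after both were taken

print(alias$y[8])       # 42: the alias sees the later modification
print(snapshot[8])      # 8:  the snapshot does not
```

The values reported for q above behave like alias; the documented
semantics call for something that behaves like snapshot.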

The for loop example is

> z<-{ tmp<-rep(pi,10);for(i in 1:10){ tmp[i]<-i^2;if(i==9)break ; if (i<9&&i>3)next ; tmp } }
> z
 [1]  1.000000  4.000000  9.000000 16.000000 25.000000 36.000000 49.000000
 [8] 64.000000 81.000000  3.141593
> z<-{ tmp<-rep(pi,10);for(i in 1:10){ tmp[i]<-i^2;if(i==9)break ; if (i<9&&i>3)next ; tmp+0 } }
> z
 [1] 1.000000 4.000000 9.000000 3.141593 3.141593 3.141593 3.141593 3.141593
 [9] 3.141593 3.141593

I can think of a couple of ways to solve this.

1. Increment the reference counter.  This solves the bug but may have
   serious performance implications: in the while example above, y would
   have to be copied in every iteration.

2. Change the semantics of while loops by getting rid of the exception
   described above.  When a loop is terminated with a break, the value of
   the loop would be NULL, so there would be no need to keep a reference
   to the value of the last non-terminated iteration.
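
If option 2 were adopted, code that wants the value of the last completed
iteration would have to save it explicitly.  A sketch of the for loop
example written that way follows; last_done is a hypothetical name.  The
assumption here is R's usual copy-on-modify behavior: the assignment shares
the vector, and a copy is made only when tmp is next modified, so loops
that never ask for the value would not pay for one.

```r
tmp <- rep(pi, 10)
last_done <- NULL               # hypothetical name, for illustration only

for (i in 1:10) {
  tmp[i] <- i^2
  if (i == 9) break
  if (i < 9 && i > 3) next
  last_done <- tmp              # reached only by iterations 1, 2 and 3
}

print(last_done)
```

This gives the same vector as the tmp+0 variant above: 1, 4, 9 followed by
seven copies of pi.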

Any opinions?


