[Rd] reference counting bug related to break and next in loops

William Dunlap wdunlap at tibco.com
Fri Jun 12 21:07:13 CEST 2009


> -----Original Message-----
> From: William Dunlap 
> Sent: Wednesday, June 10, 2009 2:35 PM
> To: 'luke at stat.uiowa.edu'
> Cc: r-devel at r-project.org
> Subject: RE: [Rd] reference counting bug related to break and 
> next in loops
> 
> Thanks Luke.
> 
> I used codetools::walkCode to look for functions that returned the
> value of a loop and found a surprising number in base R.
> However, it looks like non of these functions were written to return
> anything useful (the side effects were important), so the change
> to loop-returns-NULL should let such functions use less
> memory (and time) and make them correspond better to their
> documentation.

I also use walkCode to look for assignents whose right hand sides
were loops.  The only *.R file in a directory containing a current R
build that had such a thing was 
   ./library/base/R-ex/print.R
(from help(print)) and related files in ./tests.  It had
   rr <- for(i in 1:3) print(1:i)
   rr
That last line used to print
   [1] 1 2 3 
and now prints
   NULL
I didn't notice a corresponding *.Rout.save file so the test won't
notice the change in behavior.

I've attached the code I used.  It is pretty crude.
functionReturnsLoopValue("file")
and functionAssignsLoopValue("file") parse "file" and then return a list
of functions
that return the value of a loop and a list of assignments involving the
value of a loop,
respectively.  functionAssignsLoopValue looks for ordinary assignments
(x<-while(...))
and for default argument values (function(x=while(...)){}).  It does not
yet look for
things passed as arguments in function calls,  c(1, while(x>0)x<-x-1,
3).  The last
might pick up some false positives involving functions that don't
evaluate their
arguments.
 
> 
> E.g., fixup.libraries.URLs is not documented to return anything
> and it returns the value of for loop:
> 
> $`src/library/utils/R/windows/linkhtml.R`[[2]]
> function(lib.loc = .libPaths())
> {
>     for (lib in lib.loc) {
>         if(lib == .Library) next
>         pg <- sort(.packages(all.available = TRUE, lib.loc = lib))
>         for(pkg in pg)
>             try(fixup.package.URLs(file.path(lib,pkg)), silent = TRUE)
>     }
> }
> 
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com  
> 
> > -----Original Message-----
> > From: luke at stat.uiowa.edu [mailto:luke at stat.uiowa.edu] 
> > Sent: Wednesday, June 10, 2009 1:05 PM
> > To: William Dunlap
> > Cc: r-devel at r-project.org
> > Subject: Re: [Rd] reference counting bug related to break and 
> > next in loops
> > 
> > Thanks for the report.
> > 
> > It turns out that a similar issue arises in while() loops without
> > break/next being involved because the test expression is evaluated
> > after the final body evaluation.  After some discussion we 
> decided it
> > was simplest both for implementation and documentation to have the
> > value of a loop expression always be NULL.  This is now 
> implemented in
> > R-devel.
> > 
> > luke
> > 
> > On Tue, 2 Jun 2009, William Dunlap wrote:
> > 
> > > One of our R users here just showed me the following problem while
> > > investigating the return value of a while loop.  I added some
> > > information
> > > on a similar bug in for loops.  I think he was using 2.9.0
> > > but I see the same problem on today's development version 
> of 2.10.0
> > > (svn 48703).
> > >
> > > Should the semantics of while and for loops be changed 
> > slightly to avoid
> > > the memory
> > > buildup that fixing this to reflect the current docs would 
> > entail?  S+'s
> > > loops return nothing useful - that change was made long 
> ago to avoid
> > > memory buildup resulting from semantics akin the R's 
> > present semantics.
> > >
> > > Bill Dunlap
> > > TIBCO Software Inc - Spotfire Division
> > > wdunlap tibco.com
> > >
> > > --------------------Forwarded (and edited) message
> > > 
> > below---------------------------------------------------------
> > ----------
> > > ----------
> > >
> > > I think I have found another reference counting bug.
> > >
> > > If you type in the following in R you get what I think is 
> the wrong
> > > result.
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { 
> > break }; i =
> > > i + 1; y}; q
> > > [1] 42 42 42 42 42 42 42 42  9 10
> > >
> > > I had expected  [1] 42 42 42 42 42 42 42  8  9 10 which is 
> > what you get
> > > if you add 0 to y in the last statement in the while loop:
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { 
> > break }; i =
> > > i + 1; y + 0}; q
> > > [1] 42 42 42 42 42 42 42  8  9 10
> > >
> > > Also,
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break };
> > > i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", 
> > i, "\n"); y};
> > > q
> > > Completing iteration 2
> > > Completing iteration 3
> > > [1] 42 42 42 42 42 42 42 42  9 10
> > >
> > > but if the last statement in the while loop is y+0 instead 
> > of y I get
> > > the
> > > expected result:
> > >
> > >> i = 1; y = 1:10; q = while(T) { y[i] = 42; if (i == 8) { break };
> > > i<-i+1 ; if (i<=8&&i>3)next ; cat("Completing iteration", 
> i, "\n");
> > > y+0L}; q
> > > Completing iteration 2
> > > Completing iteration 3
> > > [1] 42 42  3  4  5  6  7  8  9 10
> > >
> > > A background to the problem is that in R a while-loop 
> > returns the value
> > > of the last iteration. However there is an exception if an 
> > iteration is
> > > terminated by a break or a next. Then the value is the 
> value of the
> > > previously completed iteration that did not execute a 
> break or next.
> > > Thus in an extreme case the value of the while may be the 
> > value of the
> > > very first iteration even though it executed a million iterations.
> > >
> > > Thus to implement that correctly one needs to keep a 
> > reference to the
> > > value of the last non-terminated iteration. It seems as if 
> > the current R
> > > implementation does that but does not increase the 
> reference counter
> > > which explains the odd behavior.
> > >
> > > The for loop example is
> > >
> > >> z<-{ tmp<-rep(pi,10);for(i in 1:10){ 
> tmp[i]<-i^2;if(i==9)break ; if
> > > (i<9&&i>3)next ; tmp } }
> > >> z
> > > [1]  1.000000  4.000000  9.000000 16.000000 25.000000 36.000000
> > > 49.000000
> > > [8] 64.000000 81.000000  3.141593
> > >> z<-{ tmp<-rep(pi,10);for(i in 1:10){ 
> tmp[i]<-i^2;if(i==9)break ; if
> > > (i<9&&i>3)next ; tmp+0 } }
> > >> z
> > > [1] 1.000000 4.000000 9.000000 3.141593 3.141593 3.141593 3.141593
> > > 3.141593
> > > [9] 3.141593 3.141593
> > >
> > > I can think of a couple of ways to solve this.
> > >
> > > 1.       Increment the reference counter. This solves the 
> > bug but may
> > > have serious performance implications. In the while 
> example above it
> > > needs to copy y in every iteration.
> > >
> > > 2.       Change the semantics of while loops by getting rid of the
> > > exception described above. When a loop is terminated with a 
> > break the
> > > value of the loop would be NULL. Thus there is no need to keep a
> > > reference to the value of the last non-terminated iteration.
> > >
> > > Any opinions?
> > >
> > > ______________________________________________
> > > R-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> > 
> > -- 
> > Luke Tierney
> > Chair, Statistics and Actuarial Science
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa                  Phone:             319-335-3386
> > Department of Statistics and        Fax:               319-335-3017
> >     Actuarial Science
> > 241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
> > Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> > 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: walk.R.txt
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090612/b7ac8136/attachment.txt>


More information about the R-devel mailing list