[Rd] parse_Rd and/or lazyload problem

Mark.Bravington at csiro.au Mark.Bravington at csiro.au
Wed Nov 4 03:51:28 CET 2009


> Sorry.  What I thought you said was that you had spent several hours
> on it and didn't want to spend more time on it.  I've told you I
> don't want to work on it either.  
> 
> If there is no way to trigger this bug without using internals, then
> it has not been demonstrated to be a bug in R.  It might be one, or
> it might be a bug in your code.  Often I'll work on things that are
> demonstrated bugs, but I won't commit several hours to debugging your
> code.    
> 
> Duncan Murdoch
> 

I sympathize with not wanting to spend hours on other people's code-- and I appreciate that you have spent a lot of time off-list trying to help me with 'parse_Rd' recently.

But in this case: 

(i) there were only 3 lines of code in the first example! If I've done something wrong in those 3 lines, it shouldn't take several hours to diagnose...

(ii) the real problem may well not be in 'parse_Rd' but in 'lazyLoad' etc, as the subject line says. Presumably you picked up the original thread because you're the 'parse_Rd' author. If you're sure it's not 'parse_Rd', or if you don't want to look at the code for other reasons, perhaps you could alert the author of the lazyloading routines (Luke Tierney?) to see if he's willing to look into it.

(iii) I deliberately haven't submitted a formal bug report, because my reproducible examples need to call 'makeLazyLoadDB'. (Though Henrik B is able to trigger the same problem without it.) As you say, by R's definition of a bug (which certainly isn't the same as mine) I cannot demonstrate that this is a "bug". So the R-bug lens may not be the correct filter for you to apply here.

Further to the problem itself: Henrik Bengtsson's report seems symptomatic of the same thing. I've generally hit the bug (damn!) only on the second or subsequent time in a session that I've lazyloaded, which is one reason it's hard to make reproducible. If you want a reproducible example to help track the bug down, then my original 3-liner would be easier to work with. However, while that one does reliably trigger an error on my laptop with 2GB R-usable memory, it doesn't on my 4GB-usable desktop. For that machine, a reproducible sequence with the only internal function being 'makeLazyLoadDB' is: 

library( tools) # 'parse_Rd' lives in 'tools'

file.copy( 'd:/temp/Rdiff.Rd', 'd:/temp/scrunge.Rd') # Rdiff.Rd from 'tools' package source

# First round: build the lazy-load DB from scrunge.Rd and reload it
eglist <- list( scrunge=parse_Rd( 'd:/temp/scrunge.Rd'))
tools:::makeLazyLoadDB( eglist, 'd:/temp/ll')
e <- new.env()
lazyLoad( 'd:/temp/ll', e)
as.list( e) # force; OK

# Second round: same steps and same DB path, this time from Rdiff.Rd
eglist1 <- list( scrunge=parse_Rd( 'd:/temp/Rdiff.Rd'))
tools:::makeLazyLoadDB( eglist1, 'd:/temp/ll')
e <- new.env()
lazyLoad( 'd:/temp/ll', e)
as.list( e) # Splat

It doesn't make any difference which file I process first; the error comes the second time round.
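
For anyone without those 'd:/temp' paths, the same order of calls can be sketched in a path-independent way. This is only a sketch under assumptions: the tiny Rd text and the helper 'round_trip' are hypothetical stand-ins (not Rdiff.Rd, and not anything in 'tools'), and nothing here is claimed to trigger the error on any particular machine. It still relies on the internal 'tools:::makeLazyLoadDB'.

```r
library(tools)  # provides parse_Rd

# Hypothetical minimal Rd source, standing in for Rdiff.Rd
rd_text <- c(
  "\\name{scrunge}",
  "\\alias{scrunge}",
  "\\title{Placeholder help page}",
  "\\description{Stand-in for a real Rd file.}"
)
rd_file <- file.path(tempdir(), "scrunge.Rd")
writeLines(rd_text, rd_file)

# Base name of the lazy-load DB (the .rdx and .rdb files are created from it)
filebase <- file.path(tempdir(), "ll")

# Hypothetical helper: parse, write the DB, reload it, and force the promises
round_trip <- function(rd_path, filebase) {
  eglist <- list(scrunge = parse_Rd(rd_path))
  tools:::makeLazyLoadDB(eglist, filebase)  # internal function, as in the example above
  e <- new.env()
  lazyLoad(filebase, e)
  as.list(e)  # force evaluation of the lazy-loaded objects
}

first <- round_trip(rd_file, filebase)

# Second round in the same session, overwriting the same DB files;
# any error is captured as a condition object rather than stopping the script
second <- tryCatch(round_trip(rd_file, filebase), error = function(err) err)
```

If 'second' inherits from "error", the second round failed in that session; otherwise it is a list like 'first'.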


Mark


-- 
Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001
TAS

ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623

Duncan Murdoch wrote:
> On 01/11/2009 3:12 PM, Mark.Bravington at csiro.au wrote:
>>> Okay, then we both agree we should drop it.
>>> Duncan Murdoch
>> 
>> 
>> No we don't. I can't provide a functioning mvbutils, or debug, until
>> this is resolved. 
>> 
>> I am trying to be a good citizen and prepare reproducible bug
>> reports-- e.g. the 3 line example. It would be quicker for me to
>> write some ugly hack that modifies base R and gets round the problem
>> *for me*, but that doesn't seem the best outcome for R. A culture
>> which discourages careful bug reporting is an unhealthy culture.
> 
> Sorry.  What I thought you said was that you had spent several hours
> on it and didn't want to spend more time on it.  I've told you I
> don't want to work on it either.  
> 
> If there is no way to trigger this bug without using internals, then
> it has not been demonstrated to be a bug in R.  It might be one, or
> it might be a bug in your code.  Often I'll work on things that are
> demonstrated bugs, but I won't commit several hours to debugging your
> code.    
> 
> Duncan Murdoch
> 
>> Mark Bravington
>> 
>> 
>> ________________________________________
>> From: Duncan Murdoch [murdoch at stats.uwo.ca]
>> Sent: 02 November 2009 01:08
>> To: Bravington, Mark (CMIS, Hobart)
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] parse_Rd and/or lazyload problem
>> 
>> On 31/10/2009 10:18 PM, Mark.Bravington at csiro.au wrote:
>>>> Does this happen in R-patched?  I've seen similar errors in 2.10.0,
>>>> but not in a current build.
>>> Yes, still there in R-patched.
>>> 
>>>> (Still haven't got to your code, this was in
>>>> mine.  I'm reluctant to spend time on code that is messing with
>>>> internals, because you might be using things in a way not intended
>>>> by the author.  Now, if you can show me some code that demonstrates
>>>> the problem without using internals directly, I'll follow up.)
>>> I did try, but it's not completely possible, because
>>> 'makeLazyLoadDB' is internal and there is no public alternative (a
>>> pity-- it's useful). However, the problem(s) can be demonstrated
>>> without directly calling 'parse_Rd', and with 'lazyLoad' (public)
>>> instead of 'fetchRdDB' (private), as per "pointer 1" below. If you
>>> have a look at 'tools:::.install_package_Rd_objects', you'll see
>>> that my use of 'makeLazyLoadDB' is quite standard.      
>>> 
>>> The problem is not easy to reproduce. It took 4-5 hours work to get
>>> the 3-line reproducible example that I posted, plus another couple
>>> since, so I'm also reluctant to spend more time...  
>> 
>> Okay, then we both agree we should drop it.
>> Duncan Murdoch
>> 
>> 
>>> The examples in my previous post still apply-- the first one
>>> involves just 3 statements-- but here are some more pointers I've
>>> unearthed since:  
>>> 
>>> 
>>> 1. Sometimes 'fetchRdDB' or 'lazyLoad' called directly from the
>>> prompt doesn't work, but public 'Rd_db' (which directly calls
>>> 'fetchRdDB') does. I've experimented with copying the installed
>>> 'tools' package into a new library "d:/temp/fakelib", then stuff
>>> like this:    
>>> 
>>> test> e <- new.env()
>>> test> lazyLoad( 'd:/temp/fakelib/tools/help/tools', e) # original files tools.rdx, tools.rdb
>>> test> e <- as.list( e) # force evaluation
>>> test> tools:::makeLazyLoadDB( e, 'd:/temp/fakelib/tools/help/tools') # modify tools.rd*
>>> test> e1 <- new.env()
>>> test> lazyLoad( 'd:/temp/fakelib/tools/help/tools', e1)
>>> test> as.list( e1) # try to force evaluation...
>>> Error in as.list.environment(e1) :
>>>   cannot allocate memory block of size 2.7 Gb
>>> test>
>>> test> Rd_db( 'tools', 'd:/temp/fakelib') # no probs !?
>>> 
>>> 
>>> 2. Sometimes 'fetchRdDB' or 'lazyLoad' will fail in one R session,
>>> but will work in a fresh session on exactly the same files. For
>>> example, after restarting R, the previous commands involving 'e1'
>>> work fine.   
>>> 
>>> Mark
>>> 
>>>> Duncan Murdoch
>>>> 
>>>> Mark
>>>> 
>>>> ________________________________________
>>>> From: Duncan Murdoch [murdoch at stats.uwo.ca]
>>>> Sent: 31 October 2009 08:59
>>>> To: Bravington, Mark (CMIS, Hobart)
>>>> Cc: r-devel at r-project.org
>>>> Subject: Re: [Rd] parse_Rd and/or lazyload problem
>>>> 
>>>> On 30/10/2009 12:50 AM, Mark.Bravington at csiro.au wrote:
>>>>> I'm encountering problems when making lazy-loadable databases of
>>>>> the output from 'parse_Rd'. The lazy-load database is of
>>>>> seemingly limitless size when I try to reload it... Admittedly I
>>>>> am using functions that I'm not really supposed to use, which is
>>>>> why this isn't a bug report, but there does seem to be something
>>>>> strange going on; my code is very similar to code that lives
>>>>> inside 'tools:::.install_package_Rd_objects'. The problems occur
>>>>> with just-released R2.10.0 on Windows.       
>>>> object.size() has problems when working on Rd objects, because it
>>>> counts every environment separately, even though they may all be
>>>> references to the same one. I haven't looked at your code, but
>>>> that could be a problem. 
>>>> 
>>>> Duncan Murdoch
>>>> 
>>>>> The examples below use files which can be found at
>>>>> ftp://ftp.csiro.au/MarkBravington, but you'll obviously need to
>>>>> modify the paths. The file "scrunge.Rd" is just "Rdiff.Rd" from
>>>>> the 'tools' package. The file "fakepack.7z" should unzip to
>>>>> create a fake package with a DESCRIPTION file and a "man"
>>>>> directory that contains two Rd files.     
>>>>> 
>>>>> Transcript of first example:
>>>>> eglist <- list( scrunge=tools:::prepare_Rd(  'd:/temp/scrunge.Rd',
>>>>>     defines=.Platform$OS.type, stages='install', warningCalls=FALSE))
>>>>> tools:::makeLazyLoadDB( eglist, 'd:/temp/ll', compress=FALSE)
>>>>> tools:::fetchRdDB( 'd:/temp/ll')
>>>>> # Error: cannot allocate vector of size 1.4 Gb
>>>>> 
>>>>> The second example triggers an error with
>>>>> 'tools:::.install_package_Rd_objects' itself. It doesn't happen
>>>>> the first time, but frequently does after the Rd files have been
>>>>> changed. Transcript:   
>>>>> 
>>>>> # Make sure d:/temp/help/ is empty, then
>>>>> test> tools:::.install_package_Rd_objects( 'd:/temp/fakepack', 'd:/temp')
>>>>> test> tools:::fetchRdDB( 'd:/temp/help/temp')
>>>>> # All good. Next, I *removed* one of the two Rd files in "d:/temp/fakepack/man", ...
>>>>> # ...deleted "d:/temp/help/temp*", and tried again
>>>>> test> tools:::.install_package_Rd_objects( 'd:/temp/fakepack', 'd:/temp')
>>>>> test> tools:::fetchRdDB( 'd:/temp/help/temp')
>>>>> Warning: Reached total allocation of 1535Mb: see help(memory.size)
>>>>> Warning: Reached total allocation of 1535Mb: see help(memory.size)
>>>>> Warning: Reached total allocation of 1535Mb: see help(memory.size)
>>>>> Warning: Reached total allocation of 1535Mb: see help(memory.size)
>>>>> Error: cannot allocate vector of size 1.9 Gb
>>>>> # Or on other occasions I get
>>>>> Error: internal error -3 in R_decompress1
>>>>> 
>>>>> Mark Bravington
>>>>> CSIRO CMIS
>>>>> Hobart
>>>>> Australia
>>>>> 
>>>>> --please do not edit the information below--
>>>>> Version:
>>>>>  platform = i386-pc-mingw32
>>>>>  arch = i386
>>>>>  os = mingw32
>>>>>  system = i386, mingw32
>>>>>  status =
>>>>>  major = 2
>>>>>  minor = 10.0
>>>>>  year = 2009
>>>>>  month = 10
>>>>>  day = 26
>>>>>  svn rev = 50208
>>>>>  language = R
>>>>>  version.string = R version 2.10.0 (2009-10-26)
>>>>> Windows XP (build 2600) Service Pack 2
>>>>> Locale:
>>>>> LC_COLLATE=English_Australia.1252;LC_CTYPE=English_Australia.1252;LC_MONETARY=English_Australia.1252;LC_NUMERIC=C;LC_TIME=English_Australia.1252
>>>>> Search Path:
>>>>>  .GlobalEnv, ROOT, package:grDevices, package:ad, package:chstuff,
>>>>> package:handy2, package:tweedie, package:statmod, package:handy,
>>>>> package:debug, package:mvbutils, mvb.session.info, package:tools,
>>>>> package:tcltk, package:stats, package:graphics, package:utils,
>>>>> package:methods, Autoloads, package:base
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel

