[Rd] Problems with checking documentation vs data, and a proposal

Ross Boylan ross at biostat.ucsf.edu
Tue Jan 16 23:03:00 CET 2007


I have a single data file inputs.RData that contains 3 objects.  I
generated an Rd page for each object using prompt().
When I run R CMD check I get
* checking for code/documentation mismatches ... WARNING
Warning in utils::data(list = al, envir = data_env) : 
	 data set 'gold' not found
(gold is one of the objects).

This appears to be coming from the codocData function defined in
src/library/tools/R/QC.R (this is in the Debianised source 2.4.1, so the
path might be a little different).

According to the help on this function, it will only attempt a match
when there is a single alias in the documentation file, although I'm not
sure that's what the code does (it seems to check only if there is more
than one format section).  At any rate, the central logic appears to
gather up names of data objects and then to load them with
            ## Try loading the data set into data_env.
            utils::data(list = al, envir = data_env)
            if(exists(al, envir = data_env, mode = "list",
                      inherits = FALSE)) {
                al <- get(al, envir = data_env, mode = "list")
            }
Since there is no gold.RData, this is failing.
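
The lookup failure can be reproduced outside R CMD check. What follows is a minimal sketch, not the QC.R code itself: it assumes a scratch directory laid out like a package's data/ folder, and the object names other than gold (silver, bronze) are hypothetical placeholders. It relies on data() also searching the data/ subdirectory of the working directory.

```r
# Scratch directory that mimics a package's data/ folder (hypothetical layout).
work <- file.path(tempdir(), "codoc-demo")
dir.create(file.path(work, "data"), recursive = TRUE, showWarnings = FALSE)

gold   <- data.frame(x = 1:3)   # 'gold' is the name from the post
silver <- data.frame(y = 4:6)   # hypothetical second object
bronze <- data.frame(z = 7:9)   # hypothetical third object
save(gold, silver, bronze, file = file.path(work, "data", "inputs.RData"))

old_wd <- setwd(work)           # data() also looks in ./data of the working dir
data_env <- new.env()

# Lookup by object name: there is no data/gold.* file, so data() warns.
by_object <- tryCatch(
  { utils::data(list = "gold", envir = data_env); "loaded" },
  warning = function(w) conditionMessage(w)
)

# Lookup by file name: inputs.RData exists, so all three objects are loaded.
utils::data(list = "inputs", envir = data_env)
setwd(old_wd)

print(by_object)     # the "data set ... not found" warning text
print(ls(data_env))  # the objects that arrived via 'inputs'
```

The point of the sketch is that data() resolves names against file names in data/, not against the objects stored inside those files.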

This leads to 2 issues: what should I do now, and how might this work
better in the future.

Taking the future first, how about having the code load all the data
files it finds, somewhere near the beginning of the function?  If it
did so, the code
        ## Try finding the variable or data set given by the alias.
        al <- aliases[i]
        if(exists(al, envir = code_env, mode = "list",
                  inherits = FALSE)) {
which precedes the earlier snippet, would find that the symbol was
defined and be happy.  I suppose the data could be loaded into code_env,
although doing so risks deciding that a data symbol is defined when the
name actually refers to a code object.

I'm not sure whether the code should still try to load data objects
individually under this scenario when the symbol is not already
present.
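
The proposal above could look roughly like the following. This is a sketch under assumptions, not a patch to codocData: preload_data is a hypothetical helper, it handles only .RData/.rda files (data() also accepts other formats), and it loads into a separate data environment rather than code_env to avoid the name-clash risk just mentioned.

```r
# Hypothetical helper: load every .RData/.rda file in a data directory
# into one environment before any alias is checked.
preload_data <- function(data_dir, envir = new.env()) {
  files <- list.files(data_dir, pattern = "\\.(RData|rda)$",
                      full.names = TRUE, ignore.case = TRUE)
  for (f in files) load(f, envir = envir)
  envir
}

# Demo setup: one file holding several objects (names beyond 'gold' are made up).
demo_dir <- file.path(tempdir(), "preload-demo")
dir.create(demo_dir, showWarnings = FALSE)
gold   <- data.frame(x = 1:3)
silver <- data.frame(y = 4:6)
save(gold, silver, file = file.path(demo_dir, "inputs.RData"))

data_env <- preload_data(demo_dir)

# The same kind of existence test the quoted snippet uses, run against
# the preloaded environment for each documented alias.
aliases <- c("gold", "silver", "missing_obj")
found <- vapply(aliases, exists, logical(1),
                envir = data_env, mode = "list", inherits = FALSE)
print(found)
```

With the data preloaded, an alias such as gold is found by name even though no file called gold.RData exists; an alias that is still missing could then fall back to the per-name data() call.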

What can I do in the short run, particularly since I would like the
package to pass R CMD check with versions of R that don't include this
possible enhancement?  I see several options, none of them beautiful:
1) Delete inputs.RData and create 3 separate data objects.  However, I
have code that relies on inputs being present, and the 3 data items go
together naturally.
2) Make a single document describing inputs.RData.  First problem: the
page would be awkward combining all 3 things.  Second, it looks as if
codocData might still try loading the individual data objects, since it
tries to pull data names out of the documentation, even out of
individual items inside \describe.
3) Attempt to disable the checks by adding multiple aliases or something
else to be revealed by closer inspection of the code.  This is a hack
that bypasses the checking altogether (unless it turns out I still get a
complaint about missing documentation).
4) Create gold.RData and others as symlinks to inputs.RData.  Fragile
across operating systems, version control systems, and versions of tar.
Might get errors about multiple data definitions.
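
For option 1, the split could be scripted rather than done by hand. This is only a sketch: split_rdata is a hypothetical helper, and the demo object names other than gold are invented. It writes real copies, so it avoids the symlink fragility of option 4, but if inputs.RData also stays in the data directory each object is then defined in two files.

```r
# Hypothetical helper: write each object in a combined .RData file to its
# own <name>.RData file, so that a lookup by object name finds a file.
split_rdata <- function(rdata_file, out_dir = dirname(rdata_file)) {
  env <- new.env()
  obj_names <- load(rdata_file, envir = env)   # load() returns the object names
  for (nm in obj_names) {
    save(list = nm, envir = env,
         file = file.path(out_dir, paste0(nm, ".RData")))
  }
  invisible(obj_names)
}

# Demo in a scratch directory.
split_dir <- file.path(tempdir(), "split-demo")
dir.create(split_dir, showWarnings = FALSE)
gold   <- data.frame(x = 1:3)
silver <- data.frame(y = 4:6)
combined <- file.path(split_dir, "inputs.RData")
save(gold, silver, file = combined)

written <- split_rdata(combined)

# Check that gold.RData now exists and contains exactly 'gold'.
check_env <- new.env()
gold_contents <- load(file.path(split_dir, "gold.RData"), envir = check_env)
print(written)
print(gold_contents)
```

Code that depends on inputs being present could keep loading the combined file from somewhere outside data/, though whether that suits the package layout is a separate question.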

Usual caveats: this is all based on my imperfect understanding of the
code.

So, any comments on the possible modification to codocData or the
work-arounds?
-- 
Ross Boylan                                      wk:  (415) 514-8146
185 Berry St #5700                               ross at biostat.ucsf.edu
Dept of Epidemiology and Biostatistics           fax: (415) 514-8150
University of California, San Francisco
San Francisco, CA 94107-1739                     hm:  (415) 550-1062


