[Rd] [R] cacheSweave / pgfSweave driver for package vignette

Claudia Beleites cbeleites at units.it
Fri Aug 13 12:29:28 CEST 2010


Dear all,

Maybe we should move the discussion to r-devel? Please excuse the 
cross-posting; it is meant to tell people on r-help where to find the rest of 
the discussion (in case you agree with me).


I've been wondering about that, too.

Gabor, I also use "fake" vignettes along the lines you describe.
To provide meaningful examples, I need both bulky data and bulky calculations 
(at least too slow to make running R CMD check frequently any fun).

As I do not want to burden my package with lots (> 60 MB) of raw data in various 
file formats, two vignettes do their real work outside the package (their source 
is available as a separate download).

So for development work it would be good to have caching as a speed-up.
For the testing done by R CMD check, however, everything needs to be 
recalculated: AFAIK the caching mechanism checks for changes in the respective 
chunks, which is great for data-analysis work. In a package development 
scenario, however, the changes are expected in the package itself rather than in 
the chunks, and I suspect that the caching cannot detect those. Thus a cached 
vignette does greatly reduce the calculation time, but it also knocks out part 
of the testing.
This would be of no concern if the package were well behaved, i.e. did its 
testing in the tests and had the vignettes as manuals. I have to admit, though, 
that my package is not (yet) at this point.
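The caching concern can be made concrete with a minimal sketch. It assumes a cache keyed only on a hash of each chunk's source text; cacheSweave's real bookkeeping may differ. Under that assumption, changing a function inside the package leaves the chunk text, and hence the key, unchanged, so the stale result is silently reused. The `package_digest` argument is a hypothetical extension showing one way to fold the package state into the key.

```python
import hashlib


def chunk_key(chunk_source, package_digest=""):
    """Cache key for one code chunk.

    Assumption: by default the key depends on the chunk text only.
    package_digest (hypothetical) lets the package state enter the key.
    """
    h = hashlib.sha256()
    h.update(chunk_source.encode("utf-8"))
    h.update(b"\x00")  # separator so chunk text and digest cannot run together
    h.update(package_digest.encode("utf-8"))
    return h.hexdigest()


class ChunkCache:
    """Tiny in-memory cache that re-evaluates a chunk only when its key changes."""

    def __init__(self):
        self._store = {}
        self.evaluations = 0

    def run(self, chunk_source, evaluate, package_digest=""):
        key = chunk_key(chunk_source, package_digest)
        if key not in self._store:
            self.evaluations += 1
            self._store[key] = evaluate()
        return self._store[key]


if __name__ == "__main__":
    chunk = "x <- my_fun(my_data); summary(x)"  # hypothetical vignette chunk
    cache = ChunkCache()

    # First run evaluates the chunk against "package version 1".
    r1 = cache.run(chunk, lambda: "result from package v1")
    # The package changed but the chunk text did not: cache hit,
    # so the new package code is never exercised.
    r2 = cache.run(chunk, lambda: "result from package v2")
    print(r1, "|", r2, "| evaluations:", cache.evaluations)

    # Folding a (hypothetical) package digest into the key forces re-evaluation.
    r3 = cache.run(chunk, lambda: "result from package v2", package_digest="v2")
    print(r3, "| evaluations:", cache.evaluations)
```

The point of the sketch is only the key: whatever does not enter the key cannot invalidate the cache, which is exactly why package changes can slip past a cached vignette.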

So I personally find myself with a shell script that automatically builds all 
vignettes first, transfers some files into the package (the data sets that come 
with the package are constructed in the vignettes), and then checks and builds 
the package.
In the end, this dependency of the package on the results of its vignettes 
requires much more computation: about 10-15 min for the whole process (i.e. 
5-7 min for one check cycle). This is awkward for development, but I think it is 
fine for something that is done occasionally, e.g. in a nightly check on the 
server.
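The workflow described above (run the vignettes first, copy their data into the package, then check and build) might be sketched roughly as follows. This is a hypothetical Python stand-in for such a shell script: the directory names, vignette files and generated data files are placeholders, and whether plain `R CMD Sweave` is enough (or you also need LaTeX runs) depends on your setup. It only prints the commands unless `dry_run=False`.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical layout: adjust to your own package.
PKG_DIR = Path("mypkg")
VIGNETTE_DIR = Path("vignette-src")  # bulky vignettes kept outside the package
VIGNETTES = ["analysis1.Rnw", "analysis2.Rnw"]
GENERATED_DATA = ["dataset1.rda", "dataset2.rda"]  # files the vignettes write


def plan():
    """Return the ordered steps as (kind, payload) tuples."""
    steps = []
    # 1. Run every vignette first; they create the package's data sets.
    for v in VIGNETTES:
        steps.append(("cmd", (["R", "CMD", "Sweave", v], VIGNETTE_DIR)))
    # 2. Copy the generated data sets into the package.
    for f in GENERATED_DATA:
        steps.append(("copy", (VIGNETTE_DIR / f, PKG_DIR / "data" / f)))
    # 3. Check, then build, the package.
    steps.append(("cmd", (["R", "CMD", "check", str(PKG_DIR)], Path("."))))
    steps.append(("cmd", (["R", "CMD", "build", str(PKG_DIR)], Path("."))))
    return steps


def execute(steps, dry_run=True):
    """Print each step; actually run it only when dry_run is False."""
    for kind, payload in steps:
        if kind == "cmd":
            args, cwd = payload
            print(f"[{cwd}] $ {' '.join(args)}")
            if not dry_run:
                # check=True stops the pipeline at the first failing step
                subprocess.run(args, cwd=cwd, check=True)
        else:
            src, dst = payload
            print(f"copy {src} -> {dst}")
            if not dry_run:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)


if __name__ == "__main__":
    execute(plan(), dry_run=True)
```

Keeping the plan separate from its execution makes it easy to inspect what would run before spending the 10-15 minutes on a real pass.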

My conclusion is that a cached Sweave driver should only be specified in 
certain situations: it would be very helpful for development at "home", but I'm 
afraid it is not the best idea to use it to reduce the work of checking the 
package in general (e.g. during nightly checks).
I also say this because I have run into trouble with the nightly build on 
R-Forge (due to some LaTeX packages that I thought were fairly standard, but 
which weren't).
Another error I "like" to make is forgetting to add a new source file to version 
control. Both kinds of error are only found by the checks during the nightly 
build on the server, and there may be other mistakes that caching would mask.

Of course, it is also not nice to keep the servers calculating examples for 
hours. I presume, however, that this case is quite rare (compared to situations 
where the regular build and check simply takes too long for a fluent development 
cycle), and I'd say that in that case Gabor's procedure is OK.

For my work it would be much more helpful if R CMD check also had "positive" 
flags (e.g. "--tests" as an abbreviation for "--no-codoc --no-examples 
--no-install --no-vignettes --no-latex").
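Until something like that exists, a small wrapper could emulate it. The sketch below is a hypothetical script (call it `rcheck.py`) that expands the proposed `--tests` shorthand into the negative flags listed above before calling `R CMD check`. Note that `--tests` is not an actual `R CMD check` option, and the set of `--no-*` flags may differ between R versions, so compare against `R CMD check --help` for yours.

```python
import subprocess
import sys

# Hypothetical "positive" shorthands, expanded into R CMD check flags.
# The flag list is the one proposed in the message; your R version may differ.
SHORTHANDS = {
    "--tests": ["--no-codoc", "--no-examples", "--no-install",
                "--no-vignettes", "--no-latex"],
}


def expand_args(args):
    """Replace each shorthand by its flags; pass other arguments through."""
    out = []
    for a in args:
        out.extend(SHORTHANDS.get(a, [a]))
    return out


def build_check_command(args):
    return ["R", "CMD", "check"] + expand_args(args)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python rcheck.py [--tests] [R CMD check options] PKGDIR")
    else:
        cmd = build_check_command(sys.argv[1:])
        print("$", " ".join(cmd))
        sys.exit(subprocess.run(cmd).returncode)
```

For example, `python rcheck.py --tests mypkg` would run `R CMD check --no-codoc --no-examples --no-install --no-vignettes --no-latex mypkg`. A table of shorthands keeps further "positive" flags (e.g. one for documentation-only checks) a one-line addition.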

I know hardly anything about makefiles and have never written one myself. I 
think they could be helpful here to switch between "development checks" and a 
complete build & check, so I'd be very curious to see some makefiles.

HTH,

Claudia




-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it


