[Rd] Creating a vignette which depends on a non-distributable file

Martin Morgan mtmorgan at fredhutch.org
Fri May 15 01:50:45 CEST 2015

On 05/14/2015 04:33 PM, Henrik Bengtsson wrote:
> On May 14, 2015 15:04, "January Weiner" <january.weiner at gmail.com> wrote:
>> Dear all,
>> I am writing a vignette that requires a file which I am not allowed to
>> distribute, but which the user can easily download manually. Moreover, it
>> is not possible to download this file automatically from R: downloading
>> requires a (free) registration that seems to work only through a browser.
>> (I'm talking here about the MSigDB from the Broad Institute,
>> http://www.broadinstitute.org/gsea/msigdb/index.jsp).
>> In the vignette, I tell the user to download the file and then show how it
>> can be parsed and used in R. Thus, I can compile the vignette only if this
>> file is present in the vignettes/ directory of the package. However, it
>> would then get included in the package -- which I am not allowed to do.
>> What should I do?
>> (1) finding an alternative to MSigDB is not a solution -- there simply is
>> no alternative.
>> (2) I could enter the code (and the results) in a verbatim environment
>> instead of using Sweave. This has obvious drawbacks (for one thing, it
>> would look inconsistent).

Use the chunk option eval=FALSE instead of placing the code in a verbatim
environment. See ?RweaveLatex if you're compiling a PDF vignette from Rnw, or
the knitr documentation for Rmd vignettes processed to HTML (much nicer for
users of your vignette, in my opinion).
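
As a minimal sketch for an Rnw vignette, a chunk that is displayed but never
run might look like the following. The chunk label, the file name, and the
read_msigdb() function are hypothetical placeholders, not taken from any
particular package:

```rnw
<<parse-msigdb, eval=FALSE>>=
## Shown to the reader but not evaluated when the vignette is built,
## so the downloaded file does not need to be present.
msig <- read_msigdb("msigdb_v5.0.xml")  # hypothetical function and file name
@
```

In an Rmd vignette the same eval=FALSE option goes in the knitr chunk header.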

A common pattern is to process chunks 1, 2, 3, 4 as usual, then take a 'leap of
faith' in chunk 5 (with eval=FALSE), followed by a second chunk (maybe with
echo=FALSE, eval=TRUE) that reads the _result_ that chunk 5 would have produced
from a serialized instance into the R session, for processing in chunks 6, 7, 8...
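
A rough sketch of the serialize-and-restore step, using base R's saveRDS() and
readRDS(). The object contents and file name here are made up for illustration,
and tempdir() stands in for wherever the package would actually ship the file
(for example under inst/extdata/, located at run time with system.file()):

```r
# Stand-in for the object that chunk 5 would have produced from the
# non-distributable file (contents are invented for illustration).
result <- data.frame(
  gene_set = c("SET_A", "SET_B"),
  n_genes  = c(12L, 30L),
  stringsAsFactors = FALSE
)

# Maintainer side, run once outside the vignette with the real file present:
# serialize the result so that it can be shipped instead of the raw file.
rds_path <- file.path(tempdir(), "chunk5_result.rds")  # hypothetical file name
saveRDS(result, rds_path)

# Vignette side, in the hidden chunk (echo=FALSE, eval=TRUE):
# restore the object so that later chunks can use it.
restored <- readRDS(rds_path)
```

Whether shipping such a derived object is permitted depends on the licence of
the original data, so that is worth checking first.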

Also, while it often makes sense to analyse an entire data set as part of a
typical workflow, a much smaller subset or simulated data may be enough for
illustrative purposes. Again, one strategy is to illustrate the problematic
steps with simulated data, and then resume the narrative with the analyzed full
data.
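
For instance, the parsing step could be illustrated on a tiny simulated file.
The sketch below assumes a GMT-like tab-delimited layout (set name, description,
then gene identifiers); that layout, the gene names, and the set names are
assumptions for illustration and not the real MSigDB content:

```r
set.seed(1)

# Simulated gene sets (all names are invented).
genes <- paste0("GENE", 1:20)
sim_sets <- list(
  SIM_SET_A = sample(genes, 5),
  SIM_SET_B = sample(genes, 8)
)

# Write them out in an assumed tab-delimited layout: name, description, genes.
gmt_lines <- vapply(names(sim_sets), function(nm) {
  paste(c(nm, "simulated", sim_sets[[nm]]), collapse = "\t")
}, character(1))
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)

# Parse the file back: the first field is the name, the second the
# description, and the remaining fields are the genes in the set.
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)
parsed <- lapply(fields, function(x) x[-(1:2)])
names(parsed) <- vapply(fields, `[`, character(1), 1)
```

The vignette can run this for real, and the reader can then swap in the path
to the file they downloaded themselves.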

A secondary consideration may be that if your package _requires_ MSigDB to 
function, then it can't be automatically tested by repository build machines -- 
you'll want to have unit tests or other approaches to ensure that 'bit rot' does 
not set in without you being aware of it.
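
One possible approach, sketched with base R only: checks that need the real
file run when a local copy is available and are skipped otherwise. The
MSIGDB_PATH environment variable and the run_msigdb_checks() function are
hypothetical names introduced for this example:

```r
# Run the MSigDB-dependent checks only when the file is actually available.
# Returns TRUE (invisibly) if the checks ran, FALSE if they were skipped.
run_msigdb_checks <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    message("MSigDB file not available; skipping MSigDB-dependent checks")
    return(invisible(FALSE))
  }
  # Checks that need the real file would go here, for example parsing it
  # and comparing the result against expectations.
  invisible(TRUE)
}

# On a developer machine, MSIGDB_PATH (hypothetical) points at a private copy;
# on a repository build machine it is unset, so the checks are skipped.
ran <- run_msigdb_checks(Sys.getenv("MSIGDB_PATH", unset = ""))
```

This only catches problems if someone with the file runs the checks regularly.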

If this is a Bioconductor package, then it's appropriate to ask on the 
Bioconductor devel mailing list.


http://bioconductor.org/packages/BiocStyle/ might be your friend for producing 
stylish vignettes.


>> (3) I could build vignette outside of the package and put it into the
>> inst/doc directory. This also has obvious drawbacks.
>> (4) Leaving this example out defies the purpose of my package.
>> I am tending towards solution (2). What do you think?
> Not clear how big of a static piece you're talking about, but maybe you
> could set it up such that you use (2) as a fallback, i.e. have the vignette
> include a static/pre-generated piece (which is clearly marked as such) only
> if the external dependency is not available.
> Just a thought
> Henrik
>> Kind regards,
>> j.
>> --
>> -------- January Weiner --------------------------------------
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
