[Rd] R CMD check for the R code from vignettes

Fri May 30 19:34:07 CEST 2014

Sorry, it should be Yihui and nothing else. /Henrik

On Fri, May 30, 2014 at 10:15 AM, Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> I think there are several aspects to Yihue's post and some simple
> workarounds/long solutions to the issues:
>
> 1. For the reasons argued, I would agree that 'R CMD check'
> incorrectly assumes that tangled code script should be able to run
> without errors.  Instead I think it should only check the syntax, i.e.
> that it can be parsed without errors.  If not, then Sweave may have to
> be redefined to clarify that \Sexpr{}/"inline" expressions must not
> have "side effects".
>
> 2. For other (=non-Sweave) vignette builder packages, you can already
> today define engines that do not tangle, think
> %\VignetteEngine{knitr::knitr_no_tangle}.
>
> 3. Extending on this, I'd like to propose %\VignetteTangle{no} (and/or
> false, FALSE, ...), which would tell the engine to not generate the
> "tangle" script file.  Then it is up to the vignette engine to
> acknowledge this or not, but at least we will have a standard across
> engines rather than each of us coming up with our own markup for this.
>  You can also imagine that one supports other types of settings, e.g.
> %\VignetteTangle{all} to include also \Sexpr{} in the tangled output.
>
> /Henrik
>
> On Fri, May 30, 2014 at 9:29 AM, Carl Boettiger <cboettig at gmail.com> wrote:
>> Hi Yihui,
>>
>> I agree with you (and your comments in [knitr issue 784]) that it seems
>> wrong for R CMD check to be using tangle (purl, etc) as a way to check R
>> code in a vignette, when the standard and expected way to check the
>> vignette is already to knit / Sweave the vignette.
>>
>> I also agree with the perspective that the tangle function no longer plays
>> the crucial role it did when we were using noweb and C programs that
>> couldn't be compiled without tangle.
>>
>> However, I would be hesitant to see tangle removed entirely, as it is
>> occasionally a convenient way to create an R script from a dynamic
>> document.  Pure R scripts are still much more widely recognized than
>> dynamic documents, and I sometimes will just tangle out the R code because
>> a collaborator would have no idea what to do with a .Rmd file (Though
>> RStudio is certainly improving this situation).  Tangle-like functions also
>> provide a nice complement to "stitch" and friends that make dynamic
>> documents from the ubiquitous R scripts.
>>
>> [knitr issue 784]: https://github.com/yihui/knitr/issues/784
>>
>>
>> - Carl
>>
>>
>>
>> On Fri, May 30, 2014 at 6:21 AM, Kevin Coombes <kevin.r.coombes at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Unless someone is planning to change Stangle to include inline expressions
>>> (which I am *not* advocating), I think that relying on side-effects within
>>> an \Sexpr construction is a bad idea. So, my own coding style is to
>>> restrict my use of \Sexpr to calls of the form
>>> \Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less
>>> believe that having R CMD check use Stangle and report an error is probably
>>> a good thing.
>>>
>>> There is a completely separate question about the relationship between
>>> Sweave/Stangle or knit/purl and literate programming that is linked to your
>>> question about whether to use Stangle on vignettes. The underlying model(s)
>>> in R have drifted away from Knuth's original conception, for some good
>>> reasons.
>>>
>>> The original goal of literate programming was to be able to explain the
>>> algorithms and data structures in the code to humans.  For that purpose, it
>>> was important to have named code chunks that you could move around, which
>>> would allow you to describe the algorithm starting from a high level
>>> overview and then drilling down into the details. From this perspective,
>>> "tangle" was critical to being able to reconstruct a program that would
>>> compile and run correctly.
>>>
>>> The vast majority of applications of Sweave/Stangle or knit/purl in modern
>>> R have a completely different goal: to produce some sort of document that
>>> describes the results of an analysis to a non-programmer or
>>> non-statistician.  For this goal, "weave" is much more important than
>>> "tangle", because the most important aspect is the ability to integrate the
>>> results (figures, tables, etc) of running the code into the document that
>>> gets passed off to the person for whom the analysis was prepared. As a
>>> result, the number of times in my daily work that I need to
>>> invoke Stangle (or purl) explicitly is many orders of magnitude smaller
>>> than the number of times that I invoke Sweave (or knit).
>>>
>>>   -- Kevin
>>>
>>>
>>>
>>> On 5/30/2014 1:04 AM, Yihui Xie wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently I saw a couple of cases in which the package vignettes were
>>>> somewhat complicated so that Stangle() (or knitr::purl() or other
>>>> tangling functions) can fail to produce the exact R code that is
>>>> executed by the weaving function Sweave() (or knitr::knit(), ...). For
>>>> example, this is a valid document that can pass the weaving process
>>>> but cannot generate a valid R script to be source()d:
>>>>
>>>> \documentclass{article}
>>>> \begin{document}
>>>> Assign 1 to x: \Sexpr{x <- 1}
>>>> <<>>=
>>>> x + 1
>>>> @
>>>> \end{document}
>>>>
>>>> That is because the inline R code is not written to the R script
>>>> during the tangling process. When an R package vignette contains
>>>> inline R code expressions that have significant side effects, R CMD
>>>> check can fail because the tangled output is not correct. What I
>>>> showed here is only a trivial example, and I have seen two packages
>>>> that have more complicated scenarios than this. Anyway, the key thing
>>>> that I want to discuss here is, since the R code in the vignette has
>>>> been executed once during the weaving process, does it make much sense
>>>> to execute the code generated from the tangle function? In other
>>>> words, if the weaving process has succeeded, is it necessary to
>>>> source() the R script again?
>>>>
>>>> The two options here are:
>>>>
>>>> 1. Do not check the R code from vignettes;
>>>> 2. Or fix the tangle function so that it produces exactly what was
>>>> executed in the weaving process. If this is done, I'm back to my
>>>> previous question: does it make sense to run the code twice?
>>>>
>>>> To push this a little further, personally I do not quite appreciate
>>>> literate programming in R as two separate steps, namely weave and
>>>> tangle. In particular, I do not see the value of tangle, considering
>>>> Sweave() (or knitr::knit()) as the new "source()". Therefore
>>>> eventually I tend to just drop tangle, but perhaps I missed something
>>>> here, and I'd like to hear what other people think about it.
>>>>
>>>> Regards,
>>>> Yihui
>>>> --
>>>> Yihui Xie <xieyihui at gmail.com>
>>>> Web: http://yihui.name
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>
>>
>>
>>
>> --
>> Carl Boettiger
>> UC Santa Cruz
>> http://carlboettiger.info/
>>
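
A minimal R sketch of the \Sexpr{} example discussed in this thread,
assuming only base R and the utils package. It writes the small Sweave
document from Yihui's message to a temporary file, tangles it with
Stangle(), and then compares parsing the tangled script with running it.
The variable names (rnw, script, tangled, has_assignment, parses, runs)
are made up for illustration, and the parse-only step is just one reading
of the "only check the syntax" suggestion, not a description of what
R CMD check does.

```r
# Write the example vignette from the thread to a temporary .Rnw file.
rnw <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
), rnw)

# Tangle it: Stangle() extracts the code chunks into an R script.
script <- tempfile(fileext = ".R")
utils::Stangle(rnw, output = script, quiet = TRUE)
tangled <- readLines(script)

# Is the inline assignment from \Sexpr{} present in the tangled script?
has_assignment <- any(grepl("x <- 1", tangled, fixed = TRUE))

# Syntax-only check: can the tangled script be parsed?
parses <- !inherits(try(parse(file = script), silent = TRUE), "try-error")

# Full check: can the tangled script be run in an environment that has
# no 'x' defined? (baseenv() as parent avoids picking up a global 'x'.)
env <- new.env(parent = baseenv())
runs <- !inherits(try(source(script, local = env), silent = TRUE), "try-error")

print(c(has_assignment = has_assignment, parses = parses, runs = runs))
```

If this behaves as described in the thread, the tangled script lacks the
assignment from \Sexpr{}, so it parses but cannot be sourced on its own.
The exact error text may depend on the R version and locale.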