[Rd] Sweave driver extension

Yihui Xie xie at yihui.name
Tue Jan 31 03:41:50 CET 2012


OK, I did not realize the overhead problem is so overwhelming in your
situation. Therefore I re-implemented the chunk reference in the knitr
package in another way. In Sweave we use

<<a>>=
# code in chunk a
@

<<b>>=
# use code in a
<<a>>
@

And in knitr, we can use real R code:

<<a>>=
# code in chunk a
@

<<b>>=
# use code in a
run_chunk('a')
@

This also allows arbitrary levels of recursion, e.g. I add another
chunk called 'c':

<<c>>=
run_chunk('b')
@

Because b uses a, when c calls b it will in turn call a as well.

The function run_chunk() will not bring overhead problems, because it
simply extracts the code from other chunks and evaluates it here. It
is not a function call in the usual sense; the code of the other chunk
runs in place. This feature is still in the development version (well,
I did it this afternoon):
https://github.com/yihui/knitr.
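
To make the "extract and evaluate here" idea concrete, below is a
minimal sketch in plain R. It is not the actual knitr code; the list
`chunks`, the function name `run_chunk_sketch` and the environment
`env` are made up for illustration, and the real implementation may
differ. The chunk code is looked up by label and evaluated in the
caller's environment, so no large objects are passed as arguments.

```r
# Hypothetical chunk store: label -> character vector of code lines.
chunks <- list(
  a = "x <- 1:10",
  b = c("run_chunk_sketch('a')", "y <- sum(x)"),
  c = c("run_chunk_sketch('b')", "z <- y * 2")
)

# Look up a chunk by label and evaluate its code in the caller's
# environment, so objects are created there and not in a
# function-local scope.
run_chunk_sketch <- function(label, envir = parent.frame()) {
  force(envir)  # resolve the caller's environment before evaluating
  code <- chunks[[label]]
  if (is.null(code)) stop("no chunk labelled '", label, "'")
  eval(parse(text = code), envir = envir)
  invisible(NULL)
}

# Running 'c' runs 'b', which in turn runs 'a'; x, y and z all end up
# in the same environment.
env <- new.env()
run_chunk_sketch("c", envir = env)
env$z  # 110
```

The nested calls inside chunks b and c pick up `env` through
parent.frame(), which is why x, y and z all land in one place.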

--------------

Talking about Knuth's original idea, I do not know as much as you, but
under knitr's design, you can arrange code freely, since the code is
stored in a named list after the input document is parsed. You can
define code before using it, or use it before defining it (later); it
is indexed by the chunk label. Top-down or bottom-up, in whatever
order you want. And you are right; it requires a major rewrite, and
that is exactly what I tried to do. I appreciate your feedback because
I know you have very rich experience in reproducible research.
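
A rough sketch of the "named list indexed by chunk label" idea, again
in plain R and not taken from knitr: the toy document `doc` and the
helpers `parse_chunks` and `expand_chunk` are hypothetical names. The
first pass collects all chunks by label; the second expands
<<label>> references, so a chunk can be used before it is defined.
There is no check for circular references in this sketch.

```r
# Toy input: chunk 'main' refers to 'helper' before 'helper' is defined.
doc <- c(
  "Some text.",
  "<<main>>=",
  "<<helper>>",
  "result <- square(4)",
  "@",
  "More text.",
  "<<helper>>=",
  "square <- function(v) v^2",
  "@"
)

# Pass 1: collect every chunk into a named list, keyed by label.
parse_chunks <- function(lines) {
  chunks <- list()
  label <- NULL
  for (line in lines) {
    if (grepl("^<<.*>>=\\s*$", line)) {
      # chunk header: the label is everything before the first comma
      label <- sub("^<<([^,>]*).*>>=\\s*$", "\\1", line)
      chunks[[label]] <- character(0)
    } else if (grepl("^@\\s*$", line)) {
      label <- NULL  # end of chunk
    } else if (!is.null(label)) {
      chunks[[label]] <- c(chunks[[label]], line)
    }
  }
  chunks
}

# Pass 2: replace each <<label>> reference with that chunk's code,
# recursively, now that all labels are known.
expand_chunk <- function(label, chunks) {
  out <- character(0)
  for (line in chunks[[label]]) {
    if (grepl("^\\s*<<[^>]+>>\\s*$", line)) {
      ref <- gsub("^\\s*<<|>>\\s*$", "", line)
      out <- c(out, expand_chunk(ref, chunks))
    } else {
      out <- c(out, line)
    }
  }
  out
}

chunks <- parse_chunks(doc)
code <- expand_chunk("main", chunks)
env <- new.env()
eval(parse(text = code), envir = env)
env$result  # 16
```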

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
<kevin.r.coombes at gmail.com> wrote:
> I prefer the code chunks myself.
>
> Function calls have overhead. In a bioinformatics world with large datasets
> and an R default that uses call-by-value rather than call-by-reference, the
> function calls may have a _lot_ of overhead.  Writing the functions to make
> sure they use call-by-reference for the large objects instead has a
> different kind of overhead in the stress it puts on the writers and
> maintainers of code.
>
> But then, I'm old enough to have looked at some of Knuth's source code for
> TeX and read his book on Literate Programming, where the ideas of "weave"
> and "tangle" were created for exactly the kind of application that Terry
> asked about.  Knuth's fundamental idea here is that the documentation
> (mainly the stuff processed through "weave") is created for humans, while
> the executable code (in Knuth's view, the stuff created by "tangle") is
> intended for computers.  If you want people to understand the code, then you
> often want to use a top-down approach that outlines the structure -- code
> chunks with forward references work perfectly for this purpose.
>
> One of the difficulties in mapping Knuth's idea over to R and Sweave is that
> the operations of weave and tangle have gotten, well, tangled.  Sweave does
> not just prepare the documentation; it also executes the code in order to
> put the results of the computation into the documentation.  In order to get
> the forward references to work with Sweave, you would have to make two
> passes through the file: one to make sure you know where each named chunk is
> and build a cross-reference table, and one to actually execute the code in
> the correct order.  That would presumably also require a major rewrite of
> Sweave.
>
> The solution I use is to cheat and hide the chunks initially and reveal them
> later to get the output that you want. This comes down to combining eval, echo,
> keep.source, and expand in the right combinations. Something like:
>
> %%%%%%%%
> % set up a prologue that contains the code chunks. Do not evaluate or
> % display them.
> <<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
> # do something sensible. If multiple steps, define them above here
> # using the same idea.
> @
> % also define the other code chunks here
>
> \section{Start the First Section}
>
> The \texttt{coxme} function is defined as follows:
> <<coxme,keep.source=TRUE,expand=FALSE>>=
>
> coxme <- function(formula, data, subset, blah blah  ){
> <<coxme-check-arguments>>
> <<coxme-build>>
> <<coxme-compute>>
> <<coxme-finish>>
> }
> @
>
> Argument checking is important:
> <<name-does-not-matter-since-not-reused,eval=FALSE,expand=TRUE>>=
> <<coxme-check-arguments>>
> @
> % Describe the other chunks here
>
> %%%%%%%%
>
>
>    Kevin
>


