[Rd] Sweave driver extension

Tue Jan 31 14:18:10 CET 2012

Three things -
   My original question to R-help was "who do I talk to".  That was
answered by Brian R, and the discussion of how to change Sweave moved
offline.  FYI, I have a recode in hand that allows arbitrary reordering
of chunks; but changes to code used by hundreds need to be approached
cautiously.  Like the witch says in Wizard of Oz: "... But that's not
what's worrying me, it's how to do it.  These things must be done
delicately, or you hurt the spell."

   A few emails have made me aware of others who use noweb.  Most of
them, as I do, use the original Unix utility.  But since survival is
so interwoven with R I am trying to implement that functionality
entirely in R to make the code self-contained.  I am still working out
how best to do so.

  Yihui: with respect to the note below, I don't see why you want to add
new syntax.  Why add "run_chunk(a)" when it is a synonym for <<a>>?    

Terry T.

On Mon, 2012-01-30 at 20:41 -0600, Yihui Xie wrote:
> OK, I did not realize the overhead problem is so overwhelming in your
> situation. Therefore I re-implemented the chunk reference in the knitr
> package in another way. In Sweave we use
> 
> <<a>>=
> # code in chunk a
> @
> 
> <<b>>=
> # use code in a
> <<a>>
> @
> 
> And in knitr, we can use real R code:
> 
> <<a>>=
> # code in chunk a
> @
> 
> <<b>>=
> # use code in a
> run_chunk('a')
> @
> 
> This also allows arbitrary levels of recursion, e.g. I add another
> chunk called 'c':
> 
> <<c>>=
> run_chunk('b')
> @
> 
> Because b uses a, when c calls b it will consequently call a as well.
> 
> The function run_chunk() does not cause overhead problems, because it
> simply extracts the code from other chunks and evaluates it here. It
> is not a function call. This feature is still in the development
> version (well, I did it this afternoon):
> https://github.com/yihui/knitr.
> 
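
A minimal sketch of "extract the code from another chunk and evaluate
it here", for readers who want to see the idea in plain R. This is NOT
knitr's implementation; run_chunk_sketch() and chunk_code are
hypothetical names made up for the illustration, and the chunk bodies
are toy code. The assumption is that the chunks have already been
parsed into a named list of code lines.

```r
# Hypothetical stand-in for the idea; not knitr's actual code.
# Chunk bodies stored as text, indexed by chunk label.
chunk_code <- list(
  a = "x <- 1:10",
  b = c("run_chunk_sketch('a')", "s <- sum(x)"),
  c = c("run_chunk_sketch('b')", "m <- s / length(x)")
)

run_chunk_sketch <- function(label, envir = parent.frame()) {
  # Parse the stored text and evaluate it in the caller's environment,
  # so assignments land there and not in a function-local frame.
  eval(parse(text = chunk_code[[label]]), envir = envir)
  invisible(NULL)
}

# Chunk c runs b, which in turn runs a; all three share one environment.
env <- new.env()
run_chunk_sketch("c", envir = env)
ls(env)   # the objects created by chunks a, b and c
```

Because the code is evaluated in the calling environment, x, s and m
all end up side by side in env, much as if the text of the chunks had
been pasted in place.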
> --------------
> 
> Talking about Knuth's original idea, I do not know as much as you, but
> under knitr's design, you can arrange code freely, since the code is
> stored in a named list after the input document is parsed. You can
> define code before using it, or use it before defining it (later); it
> is indexed by the chunk label. Top-down or bottom-up, in whatever
> order you want. And you are right; it requires a major rewrite, and
> that is exactly what I tried to do. I appreciate your feedback because
> I know you have very rich experience in reproducible research.
> 
> Regards,
> Yihui
> --
> Yihui Xie <xieyihui at gmail.com>
> Phone: 515-294-2465 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 2215 Snedecor Hall, Ames, IA
> 
> 
> 
> On Mon, Jan 30, 2012 at 12:07 PM, Kevin R. Coombes
> <kevin.r.coombes at gmail.com> wrote:
> > I prefer the code chunks myself.
> >
> > Function calls have overhead. In a bioinformatics world with large datasets
> > and an R default that uses call-by-value rather than call-by-reference, the
> > function calls may have a _lot_ of overhead.  Writing the functions to make
> > sure they use call-by-reference for the large objects instead has a
> > different kind of overhead in the stress it puts on the writers and
> > maintainers of code.
> >
> > But then, I'm old enough to have looked at some of Knuth's source code for
> > TeX and read his book on Literate Programming, where the ideas of "weave"
> > and "tangle" were created for exactly the kind of application that Terry
> > asked about.  Knuth's fundamental idea here is that the documentation
> > (mainly the stuff processed through "weave") is created for humans, while
> > the executable code (in Knuth's view, the stuff created by "tangle") is
> > intended for computers.  If you want people to understand the code, then you
> > often want to use a top-down approach that outlines the structure -- code
> > chunks with forward references work perfectly for this purpose.
> >
> > One of the difficulties in mapping Knuth's idea over to R and Sweave is that
> > the operations of weave and tangle have gotten, well, tangled.  Sweave does
> > not just prepare the documentation; it also executes the code in order to
> > put the results of the computation into the documentation.  In order to get
> > the forward references to work with Sweave, you would have to make two
> > passes through the file: one to make sure you know where each named chunk is
> > and build a cross-reference table, and one to actually execute the code in
> > the correct order.  That would presumably also require a major rewrite of
> > Sweave.
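
The two passes described above can be sketched in a few lines of R.
This is only an illustration under simplifying assumptions (one chunk
header per line, references alone on their line, options ignored); it
is not how Sweave or noweb work internally, and parse_chunks() and
expand_chunk() are hypothetical names.

```r
# Pass 1: scan the lines and build a table (named list) of chunk bodies.
parse_chunks <- function(lines) {
  chunks <- list()
  label <- NULL                     # NULL while in documentation text
  for (ln in lines) {
    if (grepl("^<<.*>>=[[:space:]]*$", ln)) {
      # Chunk header: keep the label, drop options after the first comma
      label <- sub("^<<([^,>]*).*>>=[[:space:]]*$", "\\1", ln)
      chunks[[label]] <- character(0)
    } else if (grepl("^@", ln)) {
      label <- NULL                 # "@" ends the current chunk
    } else if (!is.null(label)) {
      chunks[[label]] <- c(chunks[[label]], ln)
    }
  }
  chunks
}

# Pass 2: expand <<label>> references recursively, in execution order.
expand_chunk <- function(label, chunks, seen = character(0)) {
  if (!(label %in% names(chunks))) stop("undefined chunk: ", label)
  if (label %in% seen) stop("circular reference to chunk: ", label)
  out <- character(0)
  for (ln in chunks[[label]]) {
    if (grepl("^[[:space:]]*<<[^>]+>>[[:space:]]*$", ln)) {
      ref <- sub("^[[:space:]]*<<([^>]+)>>[[:space:]]*$", "\\1", ln)
      out <- c(out, expand_chunk(ref, chunks, c(seen, label)))
    } else {
      out <- c(out, ln)
    }
  }
  out
}

# "main" refers to "setup" before "setup" is defined (a forward reference).
doc <- c(
  "Top-level outline first:",
  "<<main>>=",
  "<<setup>>",
  "y <- x * 2",
  "@",
  "The details come later in the document:",
  "<<setup>>=",
  "x <- 21",
  "@"
)
chunks <- parse_chunks(doc)
code <- expand_chunk("main", chunks)

env <- new.env()
eval(parse(text = code), envir = env)   # runs setup's code, then main's own
```

Since the table is complete before anything is expanded, the order in
which the chunks appear in the document no longer matters.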
> >
> > The solution I use is to cheat and hide the chunks initially and reveal them
> > later to get the output that you want. This comes down to combining eval, echo,
> > keep.source, and expand in the right combinations. Something like:
> >
> > %%%%%%%%
> > % set up a prologue that contains the code chunks. Do not evaluate or
> > % display them.
> > <<coxme-check-arguments,echo=FALSE,eval=FALSE>>=
> > # do something sensible. If multiple steps, define them above here
> > # using the same idea.
> > @
> > % also define the other code chunks here
> >
> > \section{Start the First Section}
> >
> > The \texttt{coxme} function is defined as follows:
> > <<coxme,keep.source=TRUE,expand=FALSE>>=
> >
> > coxme <- function(formula, data, subset, blah blah  ){
> > <<coxme-check-arguments>>
> > <<coxme-build>>
> > <<coxme-compute>>
> > <<coxme-finish>>
> > }
> > @
> >
> > Argument checking is important:
> > <<name-does-not-matter-since-not-reused,eval=FALSE,expand=TRUE>>=
> > <<coxme-check-arguments>>
> > @
> > % Describe the other chunks here
> >
> > %%%%%%%%
> >
> >
> >    Kevin
> >