[R] R-code in html help pages: syntax highlighting

Tue Mar 17 07:47:58 CET 2009

Duncan Murdoch wrote:
> On 16/03/2009 5:06 PM, Romain Francois wrote:
>> hadley wickham wrote:
>>>> It would be pretty easy to use the output from the R parser (which is
>>>> never wrong, is it?), and dump some markup out of it. For example the
>>>> showTree function in codetools dumps an R expression as Lisp, this is
>>>> not too far from generating html, or any other markup.
>>>>
>>>> As this sounds like fun, I'll volunteer to do something about this.
>>>> Another advantage is that we can imagine to plug hyperlinks in R code
>>>> that lives in html help pages.
>>>>
>>> This also sounds like a good idea for a google summer of code project
>>> - that way you might be able to get a student to give you a hand as
>>> well.
>>>
>>> Hadley
>>>   
>> That did cross my mind earlier this evening, it just seems a bit too 
>> easy to last all summer, but maybe I am missing something difficult. 
>> I will start to play with this over the next few days, and make up my 
>> mind.
>
> It depends on your standards.  You said you want R to parse the code 
> in the Rd file.  That's going to be hard, because Rd files contain 
> something that is only "R-like", as far as the parser is concerned. 
> You'll need to convert it into R code before you can pass it to the R 
> parser.

I would assume this would be outsourced to the experimental parse_Rd 
function.
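
A minimal sketch of that route, assuming tools::parse_Rd is used to pull the \examples section out of an Rd file before handing it to the R parser. The Rd content and the temporary file are made up for illustration, and real example sections can contain nested Rd markup (such as \dontrun) that this sketch does not handle:

```r
# Hypothetical, minimal Rd file written to a temporary location.
rd_text <- c(
  "\\name{demo}",
  "\\title{Demo}",
  "\\description{A demo.}",
  "\\examples{",
  "mean(1:10)",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# Parse the Rd file; each top-level element carries an "Rd_tag" attribute.
rd <- tools::parse_Rd(rd_file)
tags <- vapply(rd, function(x) {
  tag <- attr(x, "Rd_tag")
  if (is.null(tag)) "" else tag
}, character(1))

# Collect the text of the \examples section and pass it to the R parser.
examples <- rd[[which(tags == "\\examples")[1]]]
code <- paste(unlist(examples), collapse = "")
exprs <- parse(text = code)
print(exprs)
```

Flattening the section with unlist() is the crude part: it only works as long as the examples are plain R code.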

> And then there's the question of scoping, which gets into the 
> evaluator, not just the parser.  (The parser only recognizes "mean" as 
> an identifier; it's the evaluator that decides whether it's the 
> function in the base package or a local variable.)

That is an issue. I guess I will fall back on what the parser says and 
infer the scoping from there. In the lines below, mean would refer to 
something different each time:

mean( 1:10 )
lapply( 1:10, mean)
mean <- (1+4) / 2
lapply( list( mean, median), function( f ) f( 1:10) )
{ mean <- median; mean( 1:10 ) }
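
To see what "what the parser says" amounts to for these lines, here is a small sketch. It assumes utils::getParseData, which is available in later versions of R, and simply lists the token type the parser assigns to each occurrence of mean:

```r
# The five lines from above, parsed with source references kept so that
# token-level parse data is available.
code <- c(
  "mean( 1:10 )",
  "lapply( 1:10, mean)",
  "mean <- (1+4) / 2",
  "lapply( list( mean, median), function( f ) f( 1:10) )",
  "{ mean <- median; mean( 1:10 ) }"
)
p <- parse(text = code, keep.source = TRUE)
pd <- utils::getParseData(p)

# Keep only the tokens whose text is "mean", in source order.
mean_tokens <- pd[pd$text == "mean", c("line1", "col1", "token", "text")]
mean_tokens <- mean_tokens[order(mean_tokens$line1, mean_tokens$col1), ]
print(mean_tokens)
```

The parser only separates a name that is followed by an opening parenthesis (SYMBOL_FUNCTION_CALL) from any other use of the name (SYMBOL). Whether mean is the base function or a local variable has to be inferred on top of that.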

> So if you've got high standards, it's probably quite hard.  On the 
> other hand, if you're willing to accept the usual sort of errors that 
> syntax highlighters make, it's not so bad, but not trivial.

There is probably some middle ground between the job a highlighter 
would do and the way the R evaluator would eventually interpret the 
expression. Given that this is more of a nice-to-have feature, I guess 
we can accept some errors. checkUsage is wrong sometimes, but it is 
still a good tool.
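
As a rough illustration of the "parser only" end of that middle ground, the sketch below turns parser tokens into html spans. It assumes utils::getParseData (available in later versions of R). The function name highlight_html and the css class names are made up for the example, and whitespace and comment layout are not preserved:

```r
# Hypothetical helper: wrap each terminal token in a <span> whose class
# depends only on the token type reported by the parser.
highlight_html <- function(code) {
  p <- parse(text = code, keep.source = TRUE)
  pd <- utils::getParseData(p)
  tok <- pd[pd$terminal, ]
  tok <- tok[order(tok$line1, tok$col1), ]

  # Escape the characters that are special in html.
  escape <- function(x) {
    x <- gsub("&", "&amp;", x, fixed = TRUE)
    x <- gsub("<", "&lt;", x, fixed = TRUE)
    gsub(">", "&gt;", x, fixed = TRUE)
  }

  # Made-up class names: "call" for called names, "symbol" for other
  # names, "other" for everything else.
  css <- ifelse(tok$token == "SYMBOL_FUNCTION_CALL", "call",
         ifelse(tok$token == "SYMBOL", "symbol", "other"))

  paste(sprintf('<span class="%s">%s</span>', css, escape(tok$text)),
        collapse = "")
}

out <- highlight_html("mean(x)")
cat(out, "\n")
```

This makes exactly the kind of error discussed above: any name that is called is marked as a function call, with no attempt to decide which function it is.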

>>
>> One of the problems I might run into is performance. If we want this 
>> to treat all Rd files, we are going to want something very efficient, 
>> and it might not be enough to build on top of codetools (which uses 
>> recursion at the R level), but it could make sense to provide a C 
>> level implementation.
>
> Remember what Knuth said about premature optimization.  Write it first 
> in R, and only optimize it if it's not fast enough.  

Deal

> (I'd guess it'll be fast enough: Brian Ripley reported that all the R 
> code he wrote for conversions in R-devel was faster than the Perl code 
> it was replacing.)

That is good news

>
>> This could lead to interesting things as:
>> - syntax highlighting in sweave (or decumar)
>> - pretty printing in the console (using ansi characters)
>> - syntax highlighting in R help files, potentially with hyperlinks
>>
>> I have requested creation of a project on r-forge. Anyone else want 
>> to play with this ?
>
> I'll sign up once it's going.
>
> Duncan Murdoch

-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr