[Rd] R vs. C

Spencer Graves spencer.graves at structuremonitoring.com
Mon Jan 17 21:57:54 CET 2011


       For me, a major strength of R is the package development 
process.  I've found this so valuable that I created a Wikipedia entry 
by that name and made additions to a Wikipedia entry on "software 
repository", noting that this process encourages good software 
development practices that I have not seen standardized for other 
languages.  I encourage people to review this material and make 
additions or corrections as they like (or send me suggestions so I can 
make appropriate changes).


       While R has other capabilities for unit and regression testing, I 
often include unit tests in the "examples" section of documentation 
files.  To keep from cluttering the examples with unnecessary material, 
I often include something like the following:


A1 <- myfunc() # to test myfunc

A0 <- ("manual generation of the correct  answer for A1")

\dontshow{stopifnot(} # so the user doesn't see "stopifnot("
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on "stopifnot(".


       This may not be as good in some ways as a full suite of unit 
tests, which could be provided separately.  However, this has the 
distinct advantage of including unit tests with the documentation in a 
way that should help users understand "myfunc".  (Unit tests too 
detailed to show users could be completely enclosed in "\dontshow".)
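
       As a rough sketch of what R ends up running once the "\dontshow" 
pieces are put back in, the example above amounts to something like the 
following.  The body of "myfunc" and the hand-computed answer are 
made-up placeholders for illustration, not taken from any real package:

```r
# Hypothetical stand-in for "myfunc"; an assumption made for this sketch only.
myfunc <- function(x = 1:10) cumsum(x)

A1 <- myfunc()                                  # output under test
A0 <- c(1, 3, 6, 10, 15, 21, 28, 36, 45, 55)    # answer worked out by hand

# all.equal() returns TRUE when the values agree within numerical tolerance;
# otherwise it returns a character vector describing the differences.
# stopifnot() raises an error for anything that is not TRUE, so a mismatch
# stops the example and the check of the package fails.
stopifnot(all.equal(A1, A0))

# A slightly more explicit spelling of the same check:
stopifnot(isTRUE(all.equal(A1, A0)))
```

       The user reading the help page sees only the "all.equal(A1, A0)" 
line, while running the examples evaluates the full "stopifnot(...)" 
call.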


       Spencer


On 1/17/2011 11:38 AM, Dominick Samperi wrote:
> On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves<
> spencer.graves at structuremonitoring.com>  wrote:
>
>>       Another point I have not yet seen mentioned:  If your code is
>> painfully slow, that can often be fixed without leaving R by experimenting
>> with different ways of doing the same thing -- often after profiling
>> your code to find the slowest part as described in chapter 3 of "Writing R
>> Extensions".
>>
>>
>>       If I'm given code already written in C (or some other language),
>> unless it's really simple, I may link to it rather than recode it in R.
>>   However, the problems with portability, maintainability, transparency to
>> others who may not be very facile with C, etc., all suggest that it's well
>> worth some effort experimenting with alternate ways of doing the same thing
>> in R before jumping to C or something else.
>>
>>       Hope this helps.
>>       Spencer
>>
>>
>>
>> On 1/17/2011 10:57 AM, David Henderson wrote:
>>
>>> I think we're also forgetting something, namely testing.  If you write your
>>> routine in C, you have placed additional burden upon yourself to test your C
>>> code through unit tests, etc.  If you write your code in R, you still need the
>>> unit tests, but you can rely on the well tested nature of R to allow you to
>>> reduce the number of tests of your algorithm.  I routinely tell people at Sage
>>> Bionetworks where I am working now that your new C code needs to experience at
>>> least one order of magnitude increase in performance to warrant the effort of
>>> moving from R to C.
>>>
>>> But, then again, I am working with scientists who are not primarily, or even
>>> secondarily, coders...
>>>
>>> Dave H
>>>
>>>
> This makes sense, but I have seen some very transparent algorithms turned
> into vectorized R code that is difficult to read (and thus to maintain or
> to change). These chunks of optimized R code are like embedded assembly, in
> the sense that nobody is likely to want to mess with them. This could be
> addressed by including pseudo code for the original (more transparent)
> algorithm as a comment, but I have never seen this done in practice (perhaps
> it could be enforced by R CMD check?!).
>
> On the other hand, in principle a well-documented piece of C/C++ code could
> be much easier to understand, without paying a performance penalty...but
> "coders" are not likely to place this high on their list of priorities.
>
> The bottom line is that R is an adaptor ("glue") language like Lisp that
> makes it easy to mix and match functions (using classes and generic
> functions), many of which are written in C (or C++ or Fortran) for
> performance reasons. Like any object-based system there can be a lot of
> object copying, and like any functional programming system, there can be a
> lot of function calls, resulting in poor performance for some applications.
>
> If you can vectorize your R code then you have effectively found a way to
> benefit from somebody else's C code, thus saving yourself some time. For
> operations other than pure vector calculations you will have to do the C/C++
> programming yourself (or call a library that somebody else has written).
>
> Dominick
>
>
>
>>> ----- Original Message ----
>>> From: Dirk Eddelbuettel<edd at debian.org>
>>> To: Patrick Leyshock<ngkbr8es at gmail.com>
>>> Cc: r-devel at r-project.org
>>> Sent: Mon, January 17, 2011 10:13:36 AM
>>> Subject: Re: [Rd] R vs. C
>>>
>>>
>>> On 17 January 2011 at 09:13, Patrick Leyshock wrote:
>>> | A question, please, about development of R packages:
>>> |
>>> | Are there any guidelines or best practices for deciding when and why to
>>> | implement an operation in R, vs. implementing it in C?  The "Writing R
>>> | Extensions" recommends "working in interpreted R code . . . this is normally
>>> | the best option."  But we do write C-functions and access them in R - the
>>> | question is, when/why is this justified, and when/why is it NOT justified?
>>> |
>>> | While I have identified helpful documents on R coding standards, I have not
>>> | seen notes/discussions on when/why to implement in R, vs. when to implement
>>> | in C.
>>>
>>> The (still fairly recent) book 'Software for Data Analysis: Programming with
>>> R' by John Chambers (Springer, 2008) has a lot to say about this.  John also
>>> gave a talk in November which stressed 'multilanguage' approaches; see e.g.
>>>
>>> http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html
>>>
>>>
>>> In short, it all depends, and it is unlikely that you will get a coherent
>>> answer that is valid for all circumstances.  We all love R for how expressive
>>> and powerful it is, yet there are times when something else is called for.
>>> Exactly when that time is depends on a great many things and you have not
>>> mentioned a single metric in your question.  So I'd start with John's book.
>>>
>>> Hope this helps, Dirk
>>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel


