[Rd] R vs. C

Claudia Beleites cbeleites at units.it
Tue Jan 18 12:36:48 CET 2011


On 01/18/2011 10:53 AM, Patrick Burns wrote:
> I'm not at all a fan of thinking
> of the examples as being tests.
>
> Examples should clarify the thinking
> of potential users. Tests should
> clarify the space in which the code
> is correct. These two goals are
> generally at odds.

Patrick, I completely agree with you that
- Tests should not clutter the documentation; they belong in their proper place.
- Examples are there for the user's benefit - and must be written accordingly.
- Often, tests should cover far more situations than good examples.

Yet it seems to me that (part of the) examples are justly considered a (small) 
subset of the tests:
As a potential user, I request two things from good examples that have an 
implicit testing message/side effect:
- I like the examples to roughly outline the space in which the code works: they 
should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of the 
correctness for some example calculation.
(I don't want to see all further tests - I can look them up if I feel the need)

The fact that the very same line of example code serves a testing (side) purpose 
doesn't mean that it should be copied into the tests, does it?

Thus, I think of the "public" part (the "preface") of the tests living in the 
examples.
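
A minimal sketch of such a "public" test, assuming a made-up function 
running_mean() (hypothetical, defined here only for illustration and not taken 
from any package): the same few lines show what input is expected and 
demonstrate correctness against an answer worked out by hand.

```r
# Hypothetical function, defined here only so the sketch is self-contained.
running_mean <- function(x) cumsum(x) / seq_along(x)

# Usage: outlines what kind of input the function is meant for.
x <- c(2, 4, 6, 8)
result <- running_mean(x)

# Correctness demonstration: compare with values computed by hand.
expected <- c(2, 3, 4, 5)
stopifnot(isTRUE(all.equal(result, expected)))
```

Further checks (empty input, NA handling, and so on) would go into the tests/ 
directory rather than the examples.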

My 2 ct,
Best regards,

Claudia



>
> On 17/01/2011 22:15, Spencer Graves wrote:
>> Hi, Paul:
>>
>>
>> The "Writing R Extensions" manual says that *.R code in a "tests"
>> directory is run during "R CMD check". I suspect that many R programmers
>> do this routinely. I probably should do that also. However, for me, it's
>> simpler to have everything in the "examples" section of *.Rd files. I
>> think the examples with independently developed answers provide useful
>> documentation.
>>
>>
>> Spencer
>>
>>
>> On 1/17/2011 1:52 PM, Paul Gilbert wrote:
>>> Spencer
>>>
>>> Would it not be easier to include this kind of test in a small file in
>>> the tests/ directory?
>>>
>>> Paul
>>>
>>> -----Original Message-----
>>> From: r-devel-bounces at r-project.org
>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Spencer Graves
>>> Sent: January 17, 2011 3:58 PM
>>> To: Dominick Samperi
>>> Cc: Patrick Leyshock; r-devel at r-project.org; Dirk Eddelbuettel
>>> Subject: Re: [Rd] R vs. C
>>>
>>>
>>> For me, a major strength of R is the package development
>>> process. I've found this so valuable that I created a Wikipedia entry
>>> by that name and made additions to a Wikipedia entry on "software
>>> repository", noting that this process encourages good software
>>> development practices that I have not seen standardized for other
>>> languages. I encourage people to review this material and make
>>> additions or corrections as they like (or send me suggestions for me to
>>> make appropriate changes).
>>>
>>>
>>> While R has other capabilities for unit and regression testing, I
>>> often include unit tests in the "examples" section of documentation
>>> files. To keep from cluttering the examples with unnecessary material,
>>> I often include something like the following:
>>>
>>>
>>> A1<- myfunc() # to test myfunc
>>>
>>> A0<- ("manual generation of the correct answer for A1")
>>>
>>> \dontshow{stopifnot(} # so the user doesn't see "stopifnot("
>>> all.equal(A1, A0) # compare myfunc output with the correct answer
>>> \dontshow{)} # close paren on "stopifnot(".
>>>
>>>
>>> This may not be as good in some ways as a full suite of unit
>>> tests, which could be provided separately. However, this has the
>>> distinct advantage of including unit tests with the documentation in a
>>> way that should help users understand "myfunc". (Unit tests too
>>> detailed to show users could be completely enclosed in "\dontshow".)
>>>
>>>
>>> Spencer
>>>
>>>
>>> On 1/17/2011 11:38 AM, Dominick Samperi wrote:
>>>> On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves<
>>>> spencer.graves at structuremonitoring.com> wrote:
>>>>
>>>>> Another point I have not yet seen mentioned: If your code is
>>>>> painfully slow, that can often be fixed without leaving R by
>>>>> experimenting with different ways of doing the same thing -- often
>>>>> after profiling your code to find the slowest part, as described in
>>>>> chapter 3 of "Writing R Extensions".
>>>>>
>>>>>
>>>>> If I'm given code already written in C (or some other language),
>>>>> unless it's really simple, I may link to it rather than recode it in R.
>>>>> However, the problems with portability, maintainability,
>>>>> transparency to others who may not be very facile with C, etc., all
>>>>> suggest that it's well worth some effort experimenting with
>>>>> alternate ways of doing the same thing in R before jumping to C or
>>>>> something else.
>>>>>
>>>>> Hope this helps.
>>>>> Spencer
>>>>>
>>>>>
>>>>>
>>>>> On 1/17/2011 10:57 AM, David Henderson wrote:
>>>>>
>>>>>> I think we're also forgetting something, namely testing. If you write
>>>>>> your routine in C, you have placed additional burden upon yourself to
>>>>>> test your C code through unit tests, etc. If you write your code in R,
>>>>>> you still need the unit tests, but you can rely on the well tested
>>>>>> nature of R to allow you to reduce the number of tests of your
>>>>>> algorithm. I routinely tell people at Sage Bionetworks where I am
>>>>>> working now that your new C code needs to experience at least one
>>>>>> order of magnitude increase in performance to warrant the effort of
>>>>>> moving from R to C.
>>>>>>
>>>>>> But, then again, I am working with scientists who are not primarily,
>>>>>> or even secondarily, coders...
>>>>>>
>>>>>> Dave H
>>>>>>
>>>>>>
>>>> This makes sense, but I have seen some very transparent algorithms
>>>> turned into vectorized R code that is difficult to read (and thus to
>>>> maintain or to change). These chunks of optimized R code are like
>>>> embedded assembly, in the sense that nobody is likely to want to mess
>>>> with them. This could be addressed by including pseudo code for the
>>>> original (more transparent) algorithm as a comment, but I have never
>>>> seen this done in practice (perhaps it could be enforced by R CMD
>>>> check?!).
>>>>
>>>> On the other hand, in principle a well-documented piece of C/C++ code
>>>> could be much easier to understand, without paying a performance
>>>> penalty...but "coders" are not likely to place this high on their
>>>> list of priorities.
>>>>
>>>> The bottom line is that R is an adaptor ("glue") language like Lisp
>>>> that makes it easy to mix and match functions (using classes and
>>>> generic functions), many of which are written in C (or C++ or
>>>> Fortran) for performance reasons. Like any object-based system there
>>>> can be a lot of object copying, and like any functional programming
>>>> system, there can be a lot of function calls, resulting in poor
>>>> performance for some applications.
>>>>
>>>> If you can vectorize your R code then you have effectively found a
>>>> way to benefit from somebody else's C code, thus saving yourself some
>>>> time. For operations other than pure vector calculations you will
>>>> have to do the C/C++ programming yourself (or call a library that
>>>> somebody else has written).
>>>>
>>>> Dominick
>>>>
>>>>
>>>>
>>>>>> ----- Original Message ----
>>>>>> From: Dirk Eddelbuettel<edd at debian.org>
>>>>>> To: Patrick Leyshock<ngkbr8es at gmail.com>
>>>>>> Cc: r-devel at r-project.org
>>>>>> Sent: Mon, January 17, 2011 10:13:36 AM
>>>>>> Subject: Re: [Rd] R vs. C
>>>>>>
>>>>>>
>>>>>> On 17 January 2011 at 09:13, Patrick Leyshock wrote:
>>>>>> | A question, please, about development of R packages:
>>>>>> |
>>>>>> | Are there any guidelines or best practices for deciding when and
>>>>>> | why to implement an operation in R, vs. implementing it in C? The
>>>>>> | "Writing R Extensions" recommends "working in interpreted R code
>>>>>> | . . . this is normally the best option." But we do write
>>>>>> | C-functions and access them in R - the question is, when/why is
>>>>>> | this justified, and when/why is it NOT justified?
>>>>>> |
>>>>>> | While I have identified helpful documents on R coding standards, I
>>>>>> | have not seen notes/discussions on when/why to implement in R, vs.
>>>>>> | when to implement in C.
>>>>>>
>>>>>> The (still fairly recent) book 'Software for Data Analysis:
>>>>>> Programming with R' by John Chambers (Springer, 2008) has a lot to say
>>>>>> about this. John also gave a talk in November which stressed
>>>>>> 'multilanguage' approaches; see e.g.
>>>>>>
>>>>>> http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> In short, it all depends, and it is unlikely that you will get a
>>>>>> coherent answer that is valid for all circumstances. We all love R for
>>>>>> how expressive and powerful it is, yet there are times when something
>>>>>> else is called for. Exactly when that time is depends on a great many
>>>>>> things and you have not mentioned a single metric in your question. So
>>>>>> I'd start with John's book.
>>>>>>
>>>>>> Hope this helps, Dirk
>>>>>>
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>


-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it


