[Rd] CRAN policies

Spencer Graves spencer.graves at prodsyse.com
Sat Mar 31 19:56:04 CEST 2012


Hi, Ted:


       Thank you for the most eloquent and complete description of the 
problem and opportunity I've seen in a while.


       Might you have time to review the Wikipedia articles on "Package 
development process" and "Software repository" 
(http://en.wikipedia.org/wiki/Package_development_process; 
http://en.wikipedia.org/wiki/Software_repository) and share with me your 
reactions?


       I wrote the "Package development process" article and part of the 
"Software repository" article, because the R package development process 
is superior to similar processes I've seen for other languages.  
However, I'm not a leading researcher on these issues, and your comments 
suggest that you know far more than I about this.  Humanity might 
benefit from your review of these articles.  (If you have any changes 
you might like to see, please make them or ask me to make them.  
Contributing to Wikipedia can be a very high leverage activity, as 
witnessed by the fact that the Wikipedia article on SOPA received a 
million views between the US holidays of Thanksgiving and Christmas last 
year.)


       Thanks again,
       Spencer


On 3/31/2012 8:29 AM, Ted Byers wrote:
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org]
>> On Behalf Of Paul Gilbert
>> Sent: March-31-12 9:57 AM
>> To: Mark.Bravington at csiro.au
>> Cc: r-devel at stat.math.ethz.ch
>> Subject: Re: [Rd] CRAN policies
>>
> Greetings all
>
>> Mark
>>
>> I would like to clarify two specific points.
>>
>> On 12-03-31 04:41 AM, Mark.Bravington at csiro.au wrote:
>>> ...
>>> Someone has subsequently decided that code should look a certain way,
>>> and has added a check that isn't in the language itself -- but they
>>> haven't thought of everything, and of course they never could.
>>
>> There is a large overlap between people writing the checks and people
>> writing the interpreter.  Even though your code may have been working,
>> if your understanding of the language definition is not consistent with
>> that of the people writing the interpreter, there is no guarantee that
>> it will continue to work, and in some cases the way in which it fails
>> could be that it produces spurious results.  I am inclined to think of
>> code checks as an additional way to be sure my understanding of the R
>> language is close to that of the people writing the interpreter.
>>
>>> It depends on how Notes are being interpreted, which from this thread
>>> is no longer clear.  The R-core line used to be "Notes are just notes"
>>> but now we seem to have "significant Notes" and ...
>>
>> My understanding, and I think that of a few other people, was
>> incorrect, in that I thought some notes were intended always to remain
>> as notes, and others were more serious in that they would eventually
>> become warnings or errors.  I think Uwe addressed this misunderstanding
>> by saying that all notes are intended to become warnings or errors.  In
>> several cases the reason they are not yet warnings or errors is that
>> the checks are not yet good enough: they produce too many false
>> positives.  So it is very important for us to look at the notes and to
>> point out the reasons for the false positives, otherwise they may
>> become warnings or errors without being recognised as such.
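>>
>> For instance -- purely as an illustration of the kind of false positive
>> I mean, with an invented function name -- code that uses non-standard
>> evaluation can be flagged even though it is perfectly correct:
>>
>>     fastCars <- function(d) {
>>         ## 'speed' is meant to be a column of the data frame 'd';
>>         ## static analysis of the function body cannot know that,
>>         ## and treats it as an undefined global variable
>>         subset(d, speed > 20)
>>     }
>>
>> On code like this the check may report something along the lines of
>> "no visible binding for global variable", even though nothing is wrong,
>> and that is exactly the sort of note whose reason we need to point out.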
>>
> I left the above intact as it nicely captures what much of this
> discussion reminds me of.  Let me illustrate with the question of software
> development in one of my favourite languages: C++.
>
> The first issue to consider is, "What is the language definition and who
> decides?"  Believe it or not, there are two answers from two very different
> perspectives.  The first is favoured by language lawyers, who point to the
> ANSI standard, and who will argue incessantly about the finest of details.
> But to understand this, you have to understand what ANSI is: it is an
> industry organization, and to construct the standard it gathers industry
> representatives and divides them into subcommittees, each charged with
> defining part of the language.  And of course everyone knows that,
> being human, they can get it wrong, and thus ANSI standards evolve ever so
> slowly through time.  To my mind, that is not much different from what
> R Core or CRAN are involved in.  But the other answer comes from the
> perspective of a professional software developer, and it is this: the
> final arbiter of the language is your compiler.  If you want to get
> product out the door, it doesn't matter if the standard says 'X' if the
> compiler doesn't support it, or worse, implements it incorrectly.  Most
> compilers have warnings and errors, and I like the idea of extending that to
> have notes, but that is a matter of taste vs pragmatism.  I know many
> software developers who choose to ignore warnings and fix only the errors.
> Their rationale is that it takes time they don't have to fix the warnings
> too.  And I know others who treat all warnings as errors unless they have
> discovered that there is a compiler bug that generates spurious warnings of
> a particular kind (in which case that specific warning can usually be turned
> off).  Guess which group has lower bug rates on average.  I tend to fall in
> the latter group, having observed that with many of these things, you either
> fix them now or you will fix them, at greater cost, later.
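>
> To put that in R terms rather than C++ (a sketch of the mechanism I have
> in mind, not a policy recommendation): the interpreter itself can be
> told to promote every warning to an error, which is the moral equivalent
> of a C++ shop compiling with warnings treated as errors:
>
>     ## promote every warning to an error for this session
>     options(warn = 2)
>
>     ## a coercion that would normally only warn now stops execution
>     x <- as.numeric("not a number")
>
> Whether you run your test suite that way is, again, taste versus
> pragmatism, but it does make the "fix it now or fix it later at greater
> cost" trade-off explicit.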
>
> The second issue to consider is, "What constitutes good code, and what is
> necessary to produce it?"  That I won't answer beyond saying, 'whatever
> works.'  That is because it is ultimately defined by the end users'
> requirements.  That is why we have software engineers who specialize in
> requirements engineering.  These are bright people who translate the wish
> lists of non-technical users into functional and environmental requirements,
> that the rest of us can code to.  But before we begin coding, we have QA
> specialists that design a variety of tests from finely focussed unit tests
> through integration tests to broadly focussed usability tests, ending with a
> suite of tests that basically confirm that the requirements defined for the
> product are satisfied.  Standard practice in good software houses is that
> nothing gets added to the codebase unless the entire codebase, with the new
> or revised code, compiles and passes the entire test suite.  When new code
> stresses the codebase in such a way as to trigger a failure in the existing
> code, then when it is diagnosed and fixed, new tests are designed and added
> to the test suite codebase (which has the same requirement of everything
> building and passing all tests).  Of course, some do this better than others
> as there are reasons NASA may spend $5 per line of code while many industry
> players spend $0.05 per line of code.
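>
> R's package machinery already provides a hook for exactly that rule,
> even without a dedicated testing framework: any R script placed in a
> package's tests/ directory is run by R CMD check, and an uncaught error
> fails the check.  A minimal sketch -- the file name, package name and
> function here are invented for illustration:
>
>     ## tests/test-summaries.R -- run automatically by R CMD check
>     library(mypackage)                        # hypothetical package
>
>     x <- c(1, 2, 3, 4, 5)
>     ## raise an error (and so fail the check) if the result drifts
>     stopifnot(isTRUE(all.equal(robustMean(x), 3)))
>     stopifnot(is.na(robustMean(numeric(0))))  # edge case: empty input
>
> It is crude next to a full QA regime, but the principle is the same: the
> new or revised code does not go in unless the whole package still builds
> and passes.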
>
> It is sheer folly for anyone to suggest that reliance on warnings and
> errors, even extending this to notes, ensures good code.  At best, these are
> necessary to support development of good code, but they do not come close to
> being sufficient.  It is trivial to find examples of C code for computing a
> mean, variance and standard deviation, that is correct both WRT the ANSI
> standard and the compiler, and yet it is really bad code (look for single
> pass algorithms, and you'll find one of the most commonly recommended
> algorithms is also one of the worst in terms of accuracy for some inputs,
> and yet an infrequently recommended algorithm is one of the best in terms
> of ease of implementation, speed and accuracy).  And you will still
> find good mathematicians defending the bad code by saying it is
> mathematically correct, but this is because they do not understand the
> consequences of finite precision arithmetic and rounding error.
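>
> The point is easy to demonstrate -- here in R rather than C, purely as a
> sketch of the pathology I mean (the function names are mine): the
> "textbook" single-pass formula subtracts two huge, nearly equal
> quantities, throwing away essentially all of its significant digits,
> while the obvious two-pass formula is stable:
>
>     naiveVar <- function(x) {
>         ## single pass: sum of squares minus n * mean^2 --
>         ## mathematically correct, numerically disastrous when
>         ## mean(x) is large relative to sd(x)
>         n <- length(x)
>         (sum(x^2) - n * mean(x)^2) / (n - 1)
>     }
>
>     twoPassVar <- function(x) {
>         ## two passes: centre the data first, then sum the squared
>         ## deviations
>         m <- mean(x)
>         sum((x - m)^2) / (length(x) - 1)
>     }
>
>     x <- c(1e9 + 1, 1e9 + 2, 1e9 + 3)  # true variance is exactly 1
>     naiveVar(x)    # 0 here: cancellation wipes out every digit
>     twoPassVar(x)  # 1, in agreement with var(x)
>
> Both functions are "correct" with respect to the language definition and
> the compiler or interpreter; only one of them is good code, and no note,
> warning or error will tell you which.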
>
> I would observe, as an outsider, that what CRAN is apparently doing is
> primarily focussed on the first issue above, but going beyond what the R
> interpreter does to get a better handle on a system of warnings with an
> extension to notes.  The notes question I can understand as a pragmatic
> matter.  If I were assigned to do the same sort of thing, I would probably
> do it in a similar manner, leaving some things as notes until both I and the
> community I serve develop a better understanding of the issues involved in
> the subject of the notes to the point of being better able to either have
> them evolve into more precisely defined warnings or die.  I understand
> that working through many of these things can get tedious and
> time-consuming, but in fact there is no other way for a community to
> analyse the issues involved and develop a good understanding of how best
> to handle them.
>
> But since CRAN does not appear to require requirements engineering to be
> completed along with a comprehensive suite of QA tests, there is no possible
> way they can offer any guarantees or even recommendations that any package
> on CRAN is of good quality.  From the current reaction to mere notes, I can
> imagine the reaction that would arise should they ever decide to do so.  It
> is very much up to the 'consumer' to search CRAN, and evaluate each
> interesting package to ensure it works as advertised, and I have no doubt
> that some are gems while others are best avoided.
>
> Just my $0.02 ...
>
> Cheers
>
> Ted
>


-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com


