[Rd] CRAN policies

Ted Byers r.ted.byers at gmail.com
Sat Mar 31 21:03:27 CEST 2012


> -----Original Message-----
> From: Spencer Graves [mailto:spencer.graves at prodsyse.com]
> Sent: March-31-12 1:56 PM
> To: Ted Byers
> Cc: 'Paul Gilbert'; Mark.Bravington at csiro.au; r-devel at stat.math.ethz.ch
> Subject: Re: [Rd] CRAN policies
> 
> Hi, Ted:
> 
> 
>        Thank you for the most eloquent and complete description of the
> problem and opportunity I've seen in a while.
> 
To paraphrase and flagrantly plagiarize a better scholar than I, 'If I have
seen farther, it is because I stand on the shoulders of giants.'

No really, I have been doing this since the stone age, when we used rocks,
or marks cut into sticks, or knots tied in string made from hemp, as our
computing devices.  And the extent to which most of us could count was
'1,2,3, many'  ;-)

Might I suggest an additional essay about the place of documentation in
quality software production?  We all know the benefits of design
documentation, but documentation intended for users is, in my view,
critical.  That said, I consider an interface successful if users find it
so intuitive that they have no need for the wonderful documentation I
write.  I'll say no more except to give an example of the best
documentation of a software product I have seen in more than 30 years (no,
I wrote neither it nor the software it describes):
http://eigen.tuxfamily.org/dox/index.html.
It is so nice to be able to commend someone who has done well!

Eigen is a C++ library supporting very efficient and fast matrix algebra,
and then some.

GSL is another very good example:
http://www.gnu.org/software/gsl/manual/html_node/
but not quite as good, in my view, as Eigen.

There is an SCM product, primarily for Unix, though it does build under
Cygwin, called Aegis.  Last I looked, it had a nice explanation of the
protocol of testing, and of ensuring that everything builds and passes all
tests before new or revised code is added to the codebase.  There may be
support for this in more recent products like Git or Subversion, but to be
honest I haven't had the time to look.

As for gathering requirements, and using them to guide QA processes and
the design of one of the several suites of tests a project usually needs,
the best information is found in the many references dealing with UML.

You have made a good start on those pages, but they need to be fleshed
out.  I would not make either of them more than 50% longer than its
current length.  Rather, I suggest fleshing them out hypertext fashion, by
adding (links to) pages dealing with different issues in more detail than
is possible in an executive summary.

But, overall, well done.

Cheers

Ted

> 
>        Might you have time to review the Wikipedia articles on "Package
> development process" and "Software repository"
> (http://en.wikipedia.org/wiki/Package_development_process;
> http://en.wikipedia.org/wiki/Software_repository) and share with me your
> reactions?
> 
> 
>        I wrote the "Package development process" article and part of the
> "Software repository" article, because the R package development process
> is superior to similar processes I've seen for other languages.
> However, I'm not a leading researcher on these issues, and your comments
> suggest that you know far more than I about this.  Humanity might
> benefit from your review of these articles.  (If you have any changes
> you might like to see, please make them or ask me to make them.
> Contributing to Wikipedia can be a very high leverage activity, as
> witnessed by the fact that the Wikipedia article on SOPA received a
> million views between the US holidays of Thanksgiving and Christmas last
> year.)
> 
> 
>        Thanks again,
>        Spencer
> 
> 
> On 3/31/2012 8:29 AM, Ted Byers wrote:
> >> -----Original Message-----
> >> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-
> project.org]
> >> On Behalf Of Paul Gilbert
> >> Sent: March-31-12 9:57 AM
> >> To: Mark.Bravington at csiro.au
> >> Cc: r-devel at stat.math.ethz.ch
> >> Subject: Re: [Rd] CRAN policies
> >>
> > Greetings all
> >
> >> Mark
> >>
> >> I would like to clarify two specific points.
> >>
> >> On 12-03-31 04:41 AM, Mark.Bravington at csiro.au wrote:
> >>> ...
> >>> Someone has subsequently decided that code should look a certain way,
> >>> and has added a check that isn't in the language itself -- but they
> >>> haven't thought of everything, and of course they never could.
> >>
> >> There is a large overlap between people writing the checks and people
> >> writing the interpreter.  Even though your code may have been working,
> >> if your understanding of the language definition is not consistent with
> >> that of the people writing the interpreter, there is no guarantee that
> >> it will continue to work, and in some cases the way in which it fails
> >> could be that it produces spurious results.  I am inclined to think of
> >> code checks as an additional way to be sure my understanding of the R
> >> language is close to that of the people writing the interpreter.
> >>
> >>> It depends on how Notes are being interpreted, which from this thread
> >>> is no longer clear.
> >>> The R-core line used to be "Notes are just notes" but now we seem to
> >>> have "significant Notes" and ...
> >>
> >> My understanding, and I think that of a few other people, was
> >> incorrect, in that I thought some notes were intended always to remain
> >> as notes, and others were more serious in that they would eventually
> >> become warnings or errors.  I think Uwe addressed this misunderstanding
> >> by saying that all notes are intended to become warnings or errors.  In
> >> several cases the reason they are not yet warnings or errors is that
> >> the checks are not yet good enough; they produce too many false
> >> positives.
> >> So, this means that it is very important for us to look at the notes
> >> and to point out the reasons for the false positives, otherwise they
> >> may become warnings or errors without being recognised as such.
> >>
> > I left the above intact as it nicely illustrates what much of this
> > discussion reminds me of.  Let me illustrate with the question of
> > software development in one of my favourite languages: C++.
> >
> > The first issue to consider is, "What is the language definition and
> > who decides?"  Believe it or not, there are two answers from two very
> > different perspectives.  The first is favoured by language lawyers, who
> > point to the ANSI standard and who will argue incessantly about the
> > finest of details.  But to understand this, you have to understand what
> > ANSI is: it is an industry organization, and to construct the standard
> > it gathers industry representatives, divided into subcommittees, each
> > of which is charged with defining part of the language.  And of course
> > everyone knows that, being human, they can get it wrong, and thus ANSI
> > standards evolve ever so slowly through time.  To my mind, that is not
> > much different from what R-core or CRAN are involved in.  But the other
> > answer comes from the perspective of a professional software developer,
> > and it is that the final arbiter of what the language is is your
> > compiler.  If you want to get product out the door, it doesn't matter
> > if the standard says 'X' if the compiler doesn't support it, or worse,
> > implements it incorrectly.  Most compilers have warnings and errors,
> > and I like the idea of extending that to have notes, but that is a
> > matter of taste vs pragmatism.  I know many software developers who
> > choose to ignore warnings and fix only the errors.  Their rationale is
> > that it takes time they don't have to fix the warnings too.  And I know
> > others who treat all warnings as errors unless they have discovered
> > that there is a compiler bug that generates spurious warnings of a
> > particular kind (in which case that specific warning can usually be
> > turned off).  Guess which group has lower bug rates on average.  I tend
> > to fall in the latter group, having observed that with many of these
> > things, you either fix them now or you will fix them, at greater cost,
> > later.
> >
> > The second issue to consider is, "What constitutes good code, and what
> > is necessary to produce it?"  That I won't answer beyond saying,
> > 'whatever works.'  That is because it is ultimately defined by the end
> > users' requirements.  That is why we have software engineers who
> > specialize in requirements engineering.  These are bright people who
> > translate the wish lists of non-technical users into functional and
> > environmental requirements that the rest of us can code to.  But before
> > we begin coding, we have QA specialists who design a variety of tests,
> > from finely focussed unit tests through integration tests to broadly
> > focussed usability tests, ending with a suite of tests that basically
> > confirm that the requirements defined for the product are satisfied.
> > Standard practice in good software houses is that nothing gets added to
> > the codebase unless the entire code base, with the new or revised code,
> > compiles and passes the entire test suite.  When new code stresses the
> > codebase in such a way as to trigger a failure in the existing code,
> > then once the failure is diagnosed and fixed, new tests are designed
> > and added to the test suite codebase (which has the same requirement of
> > everything building and passing all tests).  Of course, some do this
> > better than others, as there are reasons NASA may spend $5 per line of
> > code while many industry players spend $0.05 per line of code.
> >
> > It is sheer folly for anyone to suggest that reliance on warnings and
> > errors, even extending this to notes, ensures good code.  At best,
> > these are necessary to support development of good code, but they do
> > not come close to being sufficient.  It is trivial to find examples of
> > C code, for computing a mean, variance and standard deviation, that
> > are correct both WRT the ANSI standard and the compiler, and yet are
> > really bad code (look at single-pass algorithms, and you'll find that
> > one of the most commonly recommended algorithms is also one of the
> > worst in terms of accuracy for some inputs, and yet an infrequently
> > recommended algorithm is one of the best in terms of ease of
> > implementation, speed and accuracy).  And you will still find good
> > mathematicians defending the bad code by saying it is mathematically
> > correct, but this is because they do not understand the consequences
> > of finite precision arithmetic and rounding error.
> >
> > I would observe, as an outsider, that what CRAN is apparently doing is
> > primarily focussed on the first issue above, but going beyond what the
> > R interpreter does, to get a better handle on a system of warnings with
> > an extension to notes.  The notes question I can understand as a
> > pragmatic matter.  If I were assigned to do the same sort of thing, I
> > would probably do it in a similar manner, leaving some things as notes
> > until both I and the community I serve develop a better understanding
> > of the issues involved, to the point of being better able to either
> > have them evolve into more precisely defined warnings or die.  I
> > understand that addressing many of these things can get tedious and
> > time consuming, but in fact there is no other way for a community to
> > analyse the issues involved and develop a good understanding of how
> > best to handle them.
> >
> > But since CRAN does not appear to require requirements engineering to
> > be completed along with a comprehensive suite of QA tests, there is no
> > possible way it can offer any guarantee, or even a recommendation, that
> > any package on CRAN is of good quality.  From the current reaction to
> > mere notes, I can imagine the reaction that would arise should it ever
> > decide to do so.  It is very much up to the 'consumer' to search CRAN
> > and evaluate each interesting package to ensure it works as advertised,
> > and I have no doubt that some are gems while others are best avoided.
> >
> > Just my $0.02 ...
> >
> > Cheers
> >
> > Ted
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> 
> 
> --
> Spencer Graves, PE, PhD
> President and Chief Technology Officer
> Structure Inspection and Monitoring, Inc.
> 751 Emerson Ct.
> San José, CA 95126
> ph:  408-655-4567
> web:  www.structuremonitoring.com


