[Rd] CRAN policies

Fri Mar 30 06:19:29 CEST 2012

On 3/29/2012 8:39 PM, Paul Gilbert wrote:
>
> On 12-03-29 09:29 PM, Mark.Bravington at csiro.au wrote:
> > I'm concerned this thread is heading the wrong way, towards
> > techno-fixes for imaginary problems. R package-building is already
> > encumbered with a huge set of complicated rules, and more
> > instructions/rules eg for metadata would make things worse not better.
> >
> > RCMD CHECK on the 'mvbutils' package generates over 300 Notes about
> > "no visible binding...", which inevitably I just ignore. They arise
> > because RCMD CHECK is too "stupid" to understand one of my preferred
> > coding idioms (I'm not going to explain what-- that's beside the
> > point).
>
> Actually, I think that is the point. If your code is generating that 
> many notes then I think you should explain your idiom, so the checks 
> can be made to accommodate it if it really is good. Otherwise, I'd be 
> worried about the quality of your code.


       The "R CMD check" process is by far the best system I know for
producing "trustworthy software" (John Chambers' term).  I'm not a leading
researcher in software development, merely a guy who has written a fair
amount of code over the past half century in a number of different
languages.  I'm now bald, having torn all my hair out trying to debug
mountains of spaghetti code ;-)  With "R CMD check" (including all those
pesky notes), I get working code in a third the time (subjective
guesstimate) AND documentation I can hand to others as a byproduct for free.
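
       To make those notes concrete, here is a minimal sketch (not part of
the original exchange) of the kind of slip a "no visible binding" note can
flag.  It assumes the codetools package is installed; the names rescale,
rescale_fixed and scale_factor are hypothetical, made up for illustration.
It calls checkUsage() directly, as in the transcript quoted further down,
rather than running a full "R CMD check".

```r
library(codetools)

# Hypothetical function: 'scale_factor' is neither an argument nor a local
# variable, so it can only be resolved as a global at run time.
rescale <- function(x) {
  x * scale_factor
}

# Collect the messages instead of printing them, via the 'report' argument.
notes <- character(0)
checkUsage(rescale, name = "rescale",
           report = function(msg) notes <<- c(notes, msg))
print(notes)

# Making the dependency an explicit argument should leave nothing to report.
rescale_fixed <- function(x, scale_factor = 1) {
  x * scale_factor
}

fixed_notes <- character(0)
checkUsage(rescale_fixed, name = "rescale_fixed",
           report = function(msg) fixed_notes <<- c(fixed_notes, msg))
print(length(fixed_notes))
```

The first call should report a missing binding for scale_factor, provided no
object of that name exists on the search path; the second version passes the
value in explicitly, which is usually the simplest way to silence such a note.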


       I've been so impressed with it that I wrote the Wikipedia article
on "Package development process" and added a table (with help from
Sundar Dorai-Raj) to the "Software repository" article identifying
software repositories for different languages, to suggest that users of
other languages could benefit from developing a similar code-quality
checking system and accompanying repositories.


       I mention this, in case any of you know a researcher in 
information technology / computer science / software engineering who 
might be invited to do a comparison of "R CMD check" with what's 
available for other languages.  I think it could help people writing 
code in other languages -- and it might even generate ideas to help the 
R community.


       Best Wishes,
       Spencer


> > And RCMD CHECK always will be too "stupid" to understand everything
> > that a rich language like R might quite reasonably cause experienced
> > coders to do.
>
> Possibly the interpreter is too stupid to understand it too?
>
> > It should not be CRAN's business how I write my code, or even whether
> > my code does what it is supposed to. It might be CRAN's business to
> > try to work out whether my code breaks CRAN's policies, eg by causing
> > R to crash horribly-- that's presumably what Warnings are for (but
> > see below). And maybe there could be circumstances where an automatic
> > check might be "worried" enough to alert the CRANia and require manual
> > explanation and emails etc from a developer, but even that seems
> > doomed given the growing deluge of packages.
> >
> > RCMD CHECK currently functions both as a "sanitizer" for CRAN, and as
> > a developer-tool. But the fact that the one program does both things
> > seems accidental to me, and I think this dual-use is muddying the
> > discussion. There's a big distinction between (i) code-checks that
> > developers themselves might or might not find useful-- which should
> > be left to the developer, and will vary from person to person--
>
> I think this is a case of two heads are better than one. I did lots of
> checks before the CRAN checks existed, but the CRAN checks still found
> bugs in code that I considered very mature, including bugs in code that has
> been running without noticeable problems for over 15 years. Despite
> all the noise today, most of us are only talking about a small 
> inconvenience around the intended meaning of "note", not about whether 
> quality control is a bad thing. I've found the errors and warnings are 
> always valid, even though I do not always like having to fix the bugs, 
> and the notes are most often valid too. But there are a few false 
> positives, so the checks that give notes are not yet reliable enough 
> to give warnings or errors. But they should be sometime, so one should 
> usually consider fixing the package code.
>
> >   and (ii) code-checks that CRAN enforces for its own peace-of-mind.
>
> I think of this as being for the peace-of-mind of your package users.
>
> > Maybe it's convenient to have both functions in the same place, and
> > it'd be fine to use Notes for one and Warnings for the other, but the
> > different purposes should surely be kept clear.
> >
> > Personally, in building over 10 packages (only 2 on CRAN), I haven't
> > found RCMD CHECK to be of any use, except for the code-documentation
> > and example-running bits. I know other people have different
> > opinions, but that's the point: one-size-does-not-fit-all when it
> > comes to coding tools.
> >
> > And with respect to the Warnings themselves: I feel compelled to point
> > out that it's logically impossible to fully check whether R code will
> > do bad things. One has to wonder at what point adding new checks becomes
> > futile or counterproductive. There must be over 2000 people who have
> > written CRAN packages by now; every extra check and non-back-compatible
> > additional requirement runs the risk of generating false positives
> > and incurring many extra person-hours to "fix" non-problems.
> > Plus someone needs to document and explain the check (adding to the
> > rule mountain), plus there is the time spent in discussions like
> > this..!
>
> Bugs in your packages will require users to waste a lot of time too, 
> and possibly reach faulty results with much more serious consequences. 
> Just because perfection may never be attained, this does not mean that 
> progress should not be attempted, in small steps. Compared to Statlib, 
> which basically followed your recommended approach, CRAN is a vast 
> improvement.
>
> Paul
> >
> > Mark
> >
> > Mark Bravington
> > CSIRO CMIS
> > Marine Lab
> > Hobart
> > Australia
> > ________________________________________
> > From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On Behalf Of Hadley Wickham [hadley at rice.edu]
> > Sent: 30 March 2012 07:42
> > To: William Dunlap
> > Cc:r-devel at stat.math.ethz.ch; Spencer Graves
> > Subject: Re: [Rd] CRAN policies
> >
> >> Most of that stuff is already in codetools, at least when it is checking functions
> >> with checkUsage().  E.g., arguments of ~ are not checked.  The expr argument
> >> to with() will not be checked if you add skipWith=TRUE to the call to checkUsage.
> >>
> >> >  library(codetools)
> >>
> >> >  checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}))
> >> <anonymous>: no visible binding for global variable 'Num' (:1)
> >> <anonymous>: no visible binding for global variable 'Den' (:1)
> >>
> >> >  checkUsage(function(dataFrame) with(dataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)
> >>
> >> >  checkUsage(function(dataFrame) with(DataFrame, {Num/Den ; Resp ~ Pred}), skipWith=TRUE)
> >> <anonymous>: no visible binding for global variable 'DataFrame'
> >>
> >> The only part that I don't see is the mechanism to add code-walker functions to
> >> the environment in codetools that has the standard list of them for functions with
> >> nonstandard evaluation:
> >> >  objects(codetools:::collectUsageHandlers, all=TRUE)
> >>    [1] "$"             "$<-"           ".Internal"
> >>    [4] "::"            ":::"           "@"
> >>    [7] "@<-"           "{"             "~"
> >>   [10] "<-"            "<<-"           "="
> >>   [13] "assign"        "binomial"      "bquote"
> >>   [16] "data"          "detach"        "expression"
> >>   [19] "for"           "function"      "Gamma"
> >>   [22] "gaussian"      "if"            "library"
> >>   [25] "local"         "poisson"       "quasi"
> >>   [28] "quasibinomial" "quasipoisson"  "quote"
> >>   [31] "Quote"         "require"       "substitute"
> >>   [34] "with"
> > It seems like we really need a standard way to add metadata to functions:
> >
> > attr(with, "special_args")<- "expr"
> > attr(lm, "special_args")<- c("formula", "weights", "subset")
> >
> > This would be useful because it could automatically contribute to the
> > documentation.
> >
> > Similarly,
> >
> > attr(my.new.method, "s3method")<- c("my.new", "method")
> >
> > could be useful.
> >
> > Hadley
> >
> >
> > --
> > Assistant Professor / Dobelman Family Junior Chair
> > Department of Statistics / Rice University
> > http://had.co.nz/
> >
> > ______________________________________________
> > R-devel at r-project.org  mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel