[Rd] Re: [R] A long digression on packages
murdoch at stats.uwo.ca
Sun Jun 5 15:46:15 CEST 2005

Hi. I think this discussion is more relevant to R-devel, so that's
where I've sent my reply.

Jim Lemon wrote:
> Hello again,
>
> First, thanks for the help that got the latest plotrix package finished.
> I had been planning to write something about packages since Scott
> Waichler offered the gantt.chart function. Then Ben Bolker (who helped
> me to write the axis.break function) asked if I would be willing to
> include some of his plotting functions and almost immediately after that
> Sander Oom kindly donated the soil texture plotting function in the same
> way. I could procrastinate no longer.
>
> There are now about 500 packages on CRAN. Some are focused, covering a
> particular area well, so that a prospective user can easily see their
> potential usefulness, while others are less so. I consider the plotrix
> package one of the former, and so as not to upset too many people, I
> will use the other package I contributed to CRAN as an example of the
> latter.
>
> When I initially wrote concord, it was intended as a package of
> functions dealing with concordance and reliability. Okay, but I found
> Kendall's W so useful that I couldn't help including it, and somehow
> Page's test of ordered alternatives crept in and invited the Jonckheere
> test to the party and at that point I realized that I had maybe forty or
> fifty more or less useful functions floating around my R directory. Now
> many of these are probably floating around other people's R directories
> as well. Consider Cohen's kappa. The tabular method is included in
> e1071, my version has Cohen's plus two additional methods, and the
> recently contributed psy package has yet another version. Maybe there
> are still more hidden in packages that I haven't even looked at.
>
> The point of all this is that it would make many users' lives easier if
> there were less pandemonium in packages. The mistakes I have made in
> concord I have tried not to repeat in plotrix. Unless a way to search
> the documentation of all packages materializes, it will stay mighty hard
> to work out whether the function you don't want to write has already
> been written. We also spend a lot of time responding to or deriding
> correspondents who ask about such things.
>
> Would it be an idea to have informal R periphery teams, or even
> individual package lords, who would bear with, or maybe welcome, other
> people's functions? That is, I think plotrix has been greatly enhanced
> by recent contributions. Conversely, I wonder if it would be possible to
> shrink or maybe even evaporate concord by discovering duplicate methods
> in other packages or by contributing concord functions or parts thereof
> myself. It's not that I don't like maintaining concord or think the
> functions are worthless, just that I am mildly embarrassed to be adding
> to the duplication of effort and unnecessary volume of packages.
>
> Feel free to comment upon this, although if you really want to rave, try
> it out on me first before clagging the list. Thanks for your attention.

A difficulty with multi-author packages is that it's harder to maintain
consistency within the package, and it's harder to handle maintenance.

Another approach is to try to keep your packages small and focussed.
The problem with this is what you mentioned above: there are already
500 packages, and it's hard to know what's there. The "task views"
should help with this; there are 5 online so far. (See
<http://cran.us.r-project.org/src/contrib/Views>.) There is also a need
for Misc packages for things too small to be a package on their own, but
I think we need better ways to expose what is in them.

Of course, with disk sizes as they are now, it's not unreasonable to
install all of the contributed CRAN packages on a PC. Then
help.search() *will* do searches through them all.
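The two steps above (install everything, then search) can be sketched in R roughly as follows. This is a minimal sketch, assuming network access to a CRAN mirror and plenty of disk space; the helper name install_all_cran() and the mirror URL are illustrative assumptions, not part of R or of the original post. available.packages(), installed.packages(), install.packages() and help.search() are standard R functions.

```r
# A minimal sketch, assuming network access and enough disk space.
# install_all_cran() is a hypothetical helper, not part of R itself.
install_all_cran <- function(repos = "https://cloud.r-project.org") {
  # Matrix of everything the repository offers; column "Package" holds names
  pkgs <- available.packages(repos = repos)[, "Package"]
  # Only fetch packages that are not installed yet
  todo <- setdiff(pkgs, rownames(installed.packages()))
  if (length(todo) > 0) install.packages(todo, repos = repos)
  invisible(todo)
}

# Installing all of CRAN downloads and builds a great deal of software,
# so the call is left commented out here:
# install_all_cran()

# help.search() looks through the help pages of every installed package.
# Note that base R's own kappa() (a condition-number estimate) matches too.
hits <- help.search("kappa")
print(hits)
```

Because the helper skips packages that are already installed, it could be rerun after a partial failure without downloading everything again.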