[R] popular R packages

Sun Mar 8 17:11:47 CET 2009

On 08-Mar-09 15:14:03, Duncan Murdoch wrote:
> On 08/03/2009 10:49 AM, hadley wickham wrote:
>>> More seriously : I don't think relative numbers of package downloads
>>> can be interpreted in any reasonable way, because reasons for
>>> package download have a very wide range from curiosity ("what's
>>> this ?"), fun (think "fortunes"...), to vital need (think lme4
>>> if/when a consensus on denominator DFs can be reached :-)...).
>>> What can you infer in good faith from such a mess ?
>> 
>> So when we have messy data with measurement error, we should just
>> give up?  Doesn't sound very statistical! ;)
> 
> I think the situation is worse than messy.  If a client comes in with 
> data that doesn't address the question they're interested in, I think 
> they are better served to be told that, than to be given an answer that
> is not actually valid.  They should also be told how to design a study 
> that actually does address their question.
> 
> You (and others) have mentioned Google Analytics as a possible way to 
> address the quality of data; that's helpful.  But analyzing bad data 
> will just give bad conclusions.
> Duncan Murdoch

The population of R users (which we would need to sample in order
to obtain good data) is probably more elusive than a fish population
in the ocean -- only partially visible at best, and with an unknown
proportion invisible.

At least in Fisheries research, there are long-established capture
techniques (from trawling to netting to electro-fishing to ... )
which can be deployed, for research purposes, in such a way as to
potentially reach all members of a target population, with at least
a moderately good approximation to random sampling. What have we
for R?

Come to think of it, electro-fishing, ...

Suppose R were released with 2 types of cookie embedded in base R.
Each type is randomly configured, when R is first run, to be Active
or Inactive (probability of activation to be decided at the design
stage ... ). Type 1, if active, on a certain date generates an
event which brings it to the notice of R-Core (e.g. by clandestine
email or by inducing a bug report). Type 2 acts similarly on a later
date. If Type 2 acts, it carries with it information as to whether
there was a Type 1 action along with whether, apparently, the Type 1
action "succeeded".

We then have, in effect, an analogue of the Mark-Recapture technique
of population estimation (along with the usual questions about
equal catchability and so forth).
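
For the sake of illustration only, the arithmetic behind such an
estimate can be sketched as below. This assumes the simplest
two-sample (Lincoln-Petersen) form of Mark-Recapture, in which
n1 installations report via Type 1, n2 report via Type 2, and m
report via both, giving N ~ n1 * n2 / m. The function names, the
activation probabilities and the "true" population size are all
hypothetical, and the simulation takes equal catchability and
independent activation for granted, which is exactly the part that
would be in doubt in practice.

```python
import random


def lincoln_petersen(n1, n2, m):
    """Two-sample population estimate: N is roughly n1 * n2 / m.

    n1 = installations reporting via Type 1 (the "marked" animals),
    n2 = installations reporting via Type 2 (the second "catch"),
    m  = installations reporting via both (the "recaptures").
    """
    if m == 0:
        raise ValueError("no recaptures: the estimate is undefined")
    return n1 * n2 / m


def simulate(true_n, p1, p2, seed=0):
    """Simulate one release (hypothetical parameters throughout).

    Each of true_n installations independently activates Type 1
    with probability p1 and Type 2 with probability p2.
    Returns the counts (n1, n2, m).
    """
    rng = random.Random(seed)
    n1 = n2 = m = 0
    for _ in range(true_n):
        type1 = rng.random() < p1
        type2 = rng.random() < p2
        n1 += type1
        n2 += type2
        m += type1 and type2
    return n1, n2, m


# Worked example with round numbers: 100 marked, 80 caught later,
# 20 of those already marked.
print(lincoln_petersen(100, 80, 20))  # 400.0

# One simulated release with an assumed population of 10000.
n1, n2, m = simulate(10000, 0.1, 0.1, seed=1)
print(n1, n2, m, lincoln_petersen(n1, n2, m))
```

With activation probabilities this small, m is only of the order of
a hundred, so the estimate from any single run can easily be off by
ten percent or more even when the assumptions hold.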

However, since this sort of thing (which I am not proposing seriously,
only for the sake of argument) is undoubtedly unethical (and would
do R's reputation no good if it came to light), I tentatively conclude
that the population of R users is likely to remain as elusive as ever.

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 16:11:44
------------------------------ XFMail ------------------------------