[Rd] CRAN Server download statistics (Was: R Usage Statistics)

hadley wickham h.wickham at gmail.com
Mon Nov 23 15:12:37 CET 2009


Hi Ian,

I've spoken with Stefan Theussl (CRAN maintainer) about this, and he's
concerned about the privacy implications of making the Apache access
logs public.  A compromise that he mentioned was having a script run
on each CRAN mirror that processes the log files and outputs summary
statistics.  Then a central process could aggregate these and produce
a single overall summary.
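
As a rough illustration of that compromise (a sketch only, not anything
the CRAN admins have specified), the per-mirror script could reduce each
log to per-package download counts and discard the client IPs, and the
central process could simply add the counts together.  The Apache
common/combined log layout, the <package>_<version> file naming pattern,
and the function names summarize_log and aggregate are all assumptions
made for this example.

```python
import re
from collections import Counter

# Assumed log layout: Apache common/combined format. Only the request path
# and status code are read; the client IP is never stored in the summary.
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')

# Assumed file naming: <package>_<version>.tar.gz, .zip or .tgz
PACKAGE = re.compile(
    r'/([A-Za-z][A-Za-z0-9.]*)_[0-9][0-9.\-]*\.(?:tar\.gz|zip|tgz)$'
)


def summarize_log(lines):
    """Run on a mirror: count successful package downloads per package."""
    counts = Counter()
    for line in lines:
        request = REQUEST.search(line)
        if request is None or request.group(2) != "200":
            continue  # not a GET request, or not a successful response
        package = PACKAGE.search(request.group(1))
        if package:
            counts[package.group(1)] += 1
    return counts


def aggregate(summaries):
    """Run centrally: add the per-mirror summaries into one overall count."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total
```

Only the Counter objects would need to leave a mirror, so the raw access
logs stay private.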

A few comments on your current site:

 * Are you just including packages downloaded interactively from within R?

 * I don't think the continent from which the package was downloaded is
of much interest.  There's definitely no need to include it on the
main page.

 * I'd be far more interested in changes over time.  Sparklines of the
last month's worth of data would be a neat addition to the main page.

 * More vertical whitespace or subtle zebra striping would make it
much easier to read across rows.

 * I'm also not sure about displaying the number of unique IPs. R is
used a lot in the university setting and until IPv6 comes along, many
university downloads will appear to be coming from a single IP
address.

 * It's not very useful to sort by % Windows, because the variance
increases as the sample size decreases, so the packages with the
highest and lowest % Windows are just the packages that aren't
downloaded very often.  Maybe a shrunken estimate?

 * Have you thought at all about how to take package dependencies into account?
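
To make the shrunken-estimate suggestion above concrete, here is one
possible sketch: pull each package's % Windows toward the overall
% Windows, with the pull strongest for packages that have few downloads.
The choice of estimator (a simple pseudo-count average), the
prior_weight value of 50, and the function names are assumptions made
for illustration, not a recommendation of a specific method.

```python
def shrunken_share(windows, total, prior_share, prior_weight=50.0):
    """Windows share pulled toward prior_share.

    prior_weight acts like that many extra downloads observed at the
    overall share, so small samples move the estimate very little.
    """
    return (windows + prior_weight * prior_share) / (total + prior_weight)


def rank_by_windows_share(counts, prior_weight=50.0):
    """counts maps package -> (windows_downloads, total_downloads).

    Returns (package, shrunken share) pairs, highest share first.
    """
    all_windows = sum(w for w, _ in counts.values())
    all_total = sum(n for _, n in counts.values())
    if all_total == 0:
        return []  # no downloads at all, nothing to rank
    prior_share = all_windows / all_total
    shares = {
        package: shrunken_share(w, n, prior_share, prior_weight)
        for package, (w, n) in counts.items()
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

With this, a package downloaded twice (both times on Windows) no longer
sorts above a package with 900 Windows downloads out of 1000.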

Hadley

On Sun, Nov 22, 2009 at 6:18 PM, Fellows, Ian <ifellows at ucsd.edu> wrote:
> Hi All,
>
> It seems that the question of how many people use (or download) R and its packages is one that comes up on a fairly regular basis in a variety of forums (there was also a recent thread on the subject on Stack Overflow). A couple of students at UCLA (including myself) wanted to address the issue, so we set up a system to get and parse the cran.stat.ucla.edu Apache logs every night and display some basic statistics. Right now, we have a working sketch of a site based on one week of observations.
>
> http://neolab.stat.ucla.edu/cranstats/
>
> We would very much like to incorporate data from all CRAN mirrors, including cran.r-project.org. We would also like to set this up in a way that is minimally invasive for the site administrators. Internally, our administrator has set up a protected directory with the last couple of days of CRAN activity. We then pull that down using curl.
>
> What would be the best and easiest way for the CRAN mirrors to share their data? Is the contact information for the administrators available anywhere?
>
>
> Thank you,
> Ian Fellows
>
>
>
> ________________________________________
> From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On Behalf Of Steven McKinney [smckinney at bccrc.ca]
> Sent: Thursday, November 19, 2009 2:21 PM
> To: Kevin R. Coombes; r-devel at r-project.org
> Subject: Re: [Rd] R Usage Statistics
>
> Hi Kevin,
>
> What a surprising comment from a reviewer for BMC Bioinformatics.
>
> I just did a PubMed search for "limma" and "aroma.affymetrix",
> just two methods for which I use R software regularly.
> "limma" yields 28 hits, several of which are published
> in BMC Bioinformatics.  Bengtsson's aroma.affymetrix paper
> "Estimation and assessment of raw copy numbers at the single locus level."
> is already cited by 6 others.
>
> It almost seems too easy to work up lists of usage of R packages.
>
> Spotfire is an application built around S-Plus that has widespread use
> in the biopharmaceutical industry at a minimum.  Vivek Ranadive's
> TIBCO company just purchased Insightful, the S-Plus company.
> (They bought Spotfire previously.)
> Mr. Ranadive does not spend money on environments that are
> not appropriate for deploying applications.
>
> You could easily cull a list of corporation names from the
> various R email listservs as well.
>
> Press back with the reviewer.  Reviewers can learn new things
> and will respond to arguments with good evidence behind them.
> Good luck!
>
>
> Steven McKinney
>
>
> ________________________________________
> From: r-devel-bounces at r-project.org [r-devel-bounces at r-project.org] On Behalf Of Kevin R. Coombes [krcoombes at mdacc.tmc.edu]
> Sent: November 19, 2009 10:47 AM
> To: r-devel at r-project.org
> Subject: [Rd] R Usage Statistics
>
> Hi,
>
> I got the following comment from the reviewer of a paper (describing an
> algorithm implemented in R) that I submitted to BMC Bioinformatics:
>
> "Finally, which useful for exploratory work and some prototyping,
> neither R nor S-Plus are appropriate environments for deploying user
> applications that would receive much use."
>
> I can certainly respond by pointing out that CRAN contains more than
> 2000 packages and Bioconductor contains more than 350. However, does
> anyone have statistics on how often R (and possibly some R packages) are
> downloaded, or on how many people actually use R?
>
> Thanks,
>    Kevin
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
http://had.co.nz/


