[Rd] organisation of packages & CRAN

Gábor Csárdi csardi.gabor at gmail.com
Mon Nov 10 04:58:48 CET 2014


A few more details about the metacran search, to show how it (imo)
solves a different problem than sos, rseek, RSiteSearch, or
rdocumentation.org.

1. The most important difference is that it searches for _packages_.
The results are packages, not functions, vignettes, etc. E.g. if you
want to find all packages that interact with google apis, you can just
say (https://github.com/metacran/seer is the CLI version):

> library(seer)
> see("google")
SAW "google" -------------------------------- 25 packages in 0.013 seconds ---
 #  # Title     # Package
 1  RgoogleMaps Overlays on Google map tiles in R
 2  ggmap       A package for spatial visualization with Google Maps and Ope...
 3  RGA         A Google Analytics API client for R
 4  plotKML     Visualization of spatial and spatio-temporal objects in Goog...
 5  googleVis   Interface between R and Google Charts
 6  scholar     Analyse citation data from Google Scholar
 7  translateR  Bindings for the Google and Microsoft Translation APIs
 8  plusser     A Google+ Interface for R
 9  gooJSON     Google JSON Data Interpreter for R
 10 translate   Bindings for the Google Translate API v2
> more()
SAW "google" -------------------------------- 25 packages in 0.012 seconds ---
 #  # Title          # Package
 11 ngramr           Retrieve and plot Google n-gram data
 12 RGoogleAnalytics R Wrapper for the Google Analytics API
 13 R2G2             Converting R CRAN outputs into Google Earth.
 14 plotGoogleMaps   Plot spatial or spatio-temporal data over Google Maps
 15 googlePublicData An R library to build Google's Public Data Explorer DSP...
 16 RWeather         R wrapper around the Yahoo! Weather, Google Weather and...
 17 sysfonts         Loading system fonts into R
 18 hashFunction     A collection of non-cryptographic hash functions
 19 rgauges          R wrapper to Gaug.es API
 20 splitstackshape  Stack and Reshape Datasets After Splitting Concatenated...

2. The second difference is that metacran ranks the search results
based on (among other things) the package dependency graph, so if you
search for 'graphics', lattice and ggplot2 come first.
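
To illustrate the idea (this is only a minimal sketch, not the actual
metacran ranking, which also uses other signals), search hits can be
ordered by how many other packages depend on them. The dependency list
below is toy data, and revdep_count() and rank_hits() are hypothetical
helper names:

```r
# Toy dependency list (made-up data, not real CRAN metadata):
# each element lists the packages that the named package depends on.
deps <- list(
  ggmap       = c("ggplot2", "RgoogleMaps"),
  plotKML     = c("lattice"),
  pkgA        = c("ggplot2", "lattice"),
  pkgB        = c("lattice"),
  ggplot2     = character(0),
  lattice     = character(0),
  RgoogleMaps = character(0)
)

# Count how often each package appears as a dependency of another package.
revdep_count <- function(deps) {
  used <- unlist(deps, use.names = FALSE)
  table(factor(used, levels = names(deps)))
}

# Order search hits by reverse dependency count (highest first),
# breaking ties alphabetically.
rank_hits <- function(hits, deps) {
  counts <- revdep_count(deps)
  hits[order(-as.integer(counts[hits]), hits)]
}

ranked <- rank_hits(c("ggmap", "lattice", "RgoogleMaps", "plotKML"), deps)
print(ranked)
```

With this toy data lattice has three reverse dependencies and
RgoogleMaps has one, so they move ahead of the two packages that
nothing else depends on.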

3. Another difference is that metacran exposes a full search API of
the underlying ElasticSearch engine, so if someone wants to rank
results differently, or make more complex queries, they can.
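
As a rough sketch of what a custom query could look like, the snippet
below builds an ElasticSearch "multi_match" query body as a nested R
list. The field names and boosts are assumptions for illustration, not
the actual metacran index layout, and build_query() is a hypothetical
helper. No endpoint is given here; the list would still have to be
serialized to JSON (e.g. with jsonlite::toJSON(es_query, auto_unbox = TRUE))
and sent to the search API.

```r
# Build an ElasticSearch "multi_match" query body as a nested R list.
# The default field names and boosts are assumptions for illustration
# only; they are not taken from the metacran index.
build_query <- function(text,
                        fields = c("Package^3", "Title^2", "Description"),
                        size = 10) {
  stopifnot(is.character(text), length(text) == 1, nzchar(text))
  list(
    size = size,
    query = list(
      multi_match = list(
        query  = text,
        fields = as.list(fields)  # a list, so it serializes as a JSON array
      )
    )
  )
}

es_query <- build_query("google")
str(es_query)
```

Changing the boosts after the "^" is one way to rank results
differently, e.g. to weight matches in the title more than matches in
the description.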

4. It does not search code and docs. I think rdocumentation.org does a
good job with docs, and http://github.com/cran is great for code, e.g.
if you want packages that call SET_SLOT in C:
https://github.com/search?l=c&q=SET_SLOT+user%3Acran&ref=searchresults&type=Code&utf8=%E2%9C%93

Gabor

On Sun, Nov 9, 2014 at 7:18 PM, Spencer Graves
<spencer.graves at prodsyse.com> wrote:
>       Might it be appropriate to add "http://metacran.github.io/search" and
> the "sos" package to the official list of R search capabilities at
> "www.r-project.org/search.html"?  [Disclaimer:  I'm the lead author of
> "sos".]
>
>
>       Best Wishes,
>       Spencer Graves
>
>
> On 11/9/2014 11:06 AM, Gábor Csárdi wrote:
>>
>> Hi,
>>
>> I think much of this is simply impossible to do. CRAN packages are
>> written and maintained by thousands of people, how are you planning to
>> convince them to reorganize their packages? Or even just rename them?
>> This obviously won't happen.
>>
>> Btw. did you see 'CRAN Task Views'? That is one organization of
>> packages into topics.
>>
>> Personally, I don't think organization is the solution here. It is too
>> costly (i.e. too much work) to maintain, impossible to enforce. I
>> think, however, that a good search engine would definitely help.
>>
>> FWIW there is a simple search engine here:
>> http://metacran.github.io/search/
>> This ranks packages according to the number of reverse dependencies
>> (among other things), i.e. packages more often used by other packages
>> will be higher up in the list.
>>
>> Ranking them according to downloads is also possible, but AFAIK only
>> one CRAN mirror gives out statistics about downloads, so you don't
>> really have the complete numbers there.
>>
>> Disclaimer: I built the search engine above. There are obviously other
>> alternatives as well, e.g. http://rdocumentation.org, and
>> http://mran.revolutionanalytics.com/packages/ are the two I know.
>>
>> Gabor
>>
>> On Sun, Nov 9, 2014 at 11:24 AM, Steven Sagaert
>> <steven.sagaert at gmail.com> wrote:
>>>
>>> Hi,
>>> I’ve been using R on and off for a couple of years. I think R is pretty
>>> great but one thing I’d like to see improved is the way packages are
>>> organised. Instead of CRAN being a long list of packages having a short &
>>> usually unintelligible name I’d like to see packages organised in a
>>> hierarchical way with that path acting as a hierarchical namespace just like
>>> you have in many other languages like Java, C#, Scala, … The names of the
>>> (sub)packages should also be clear and unambiguous & packages should be
>>> organised according to their functionality and not just for example be code
>>> for a whole book thrown together and given a cryptic name.
>>>
>>> Next to that it would be nice to have extra metadata in the packages to
>>> allow for another more loose flat multi-class classification like in tagging
>>> blog systems & other metadata to allow for automatically generating
>>> something like task views.
>>>
>>> Due to the large number of packages it’s hard to see the forest for the
>>> trees so a recommendation system for CRAN based on popularity (download
>>> statistics), ratings & other data like related packages from package
>>> metadata would be most welcome.
>>>
>>> Finally the number of packages in CRAN is exponentially growing but there
>>> is also a large partial overlap in functionality between packages & so many
>>> packages make it hard to find what you are looking for. So maybe less is
>>> more here and there should be a system of removing hardly used/low quality
>>> packages on a regular basis.
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel


