[Rd] organisation of packages & CRAN

Spencer Graves spencer.graves at structuremonitoring.com
Mon Nov 10 07:20:31 CET 2014


Hi, Gábor:


On 11/9/2014 7:58 PM, Gábor Csárdi wrote:
> A few more details about the metacran search, to show how it (imo)
> solves a different problem than sos, rseek, RSiteSearch, or
> rdocumentation.org.
>
> 1. The most important difference is that it searches for _packages_.
> The results are packages, not functions, vignettes, etc. E.g. if you
> want to find all packages that interact with google apis, you can just
> say (https://github.com/metacran/seer is the CLI version):
>
> library(seer)
>> see("google")
> SAW "google" -------------------------------- 25 packages in 0.013 seconds ---
>   #  # Title     # Package
>   1  RgoogleMaps Overlays on Google map tiles in R
>   2  ggmap       A package for spatial visualization with Google Maps and Ope...
>   3  RGA         A Google Analytics API client for R
>   4  plotKML     Visualization of spatial and spatio-temporal objects in Goog...
>   5  googleVis   Interface between R and Google Charts
>   6  scholar     Analyse citation data from Google Scholar
>   7  translateR  Bindings for the Google and Microsoft Translation APIs
>   8  plusser     A Google+ Interface for R
>   9  gooJSON     Google JSON Data Interpreter for R
>   10 translate   Bindings for the Google Translate API v2
>> more()
> SAW "google" -------------------------------- 25 packages in 0.012 seconds ---
>   #  # Title          # Package
>   11 ngramr           Retrieve and plot Google n-gram data
>   12 RGoogleAnalytics R Wrapper for the Google Analytics API
>   13 R2G2             Converting R CRAN outputs into Google Earth.
>   14 plotGoogleMaps   Plot spatial or spatio-temporal data over Google Maps
>   15 googlePublicData An R library to build Google's Public Data Explorer DSP...
>   16 RWeather         R wrapper around the Yahoo! Weather, Google Weather and...
>   17 sysfonts         Loading system fonts into R
>   18 hashFunction     A collection of non-cryptographic hash functions
>   19 rgauges          R wrapper to Gaug.es API
>   20 splitstackshape  Stack and Reshape Datasets After Splitting Concatenated...
>
> 2. The second difference is that metacran ranks the search results
> based on (among other things) the package dependency graph, so if you
> search for 'graphics' lattice and ggplot2 come first.
>
> 3. Another difference is that metacran exposes a full search API of
> the underlying ElasticSearch engine, so if someone wants to rank
> results differently, or make more complex queries, they can.
>
> 4. It does not search code and docs. I think rdocumentation.org does a
> good job with docs, and http://github.com/cran is great for code, e.g.
> if you want packages that call SET_SLOT in C:
> https://github.com/search?l=c&q=SET_SLOT+user%3Acran&ref=searchresults&type=Code&utf8=%E2%9C%93


       Thanks for the explanation of metacran/seer.
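
       As a minimal sketch of the idea in point 2 above (ranking search hits
by how many other packages depend on them), assuming a toy dependency table
with hypothetical package names.  This is not metacran's actual scoring,
which combines the dependency graph with other signals:

```r
# Toy dependency table: each row says `package` depends on `depends_on`.
# All package names and edges here are hypothetical.
deps <- data.frame(
  package    = c("pkgB", "pkgC", "pkgD", "pkgD", "pkgE"),
  depends_on = c("pkgA", "pkgA", "pkgA", "pkgB", "pkgB"),
  stringsAsFactors = FALSE
)

# Packages that matched some (hypothetical) text query.
hits <- c("pkgB", "pkgA", "pkgE")

# Count reverse dependencies: how many distinct packages depend on each hit.
rev_dep_count <- function(pkgs, deps) {
  vapply(
    pkgs,
    function(p) length(unique(deps$package[deps$depends_on == p])),
    integer(1)
  )
}

counts <- rev_dep_count(hits, deps)

# Hits with more reverse dependencies come first.
ranked <- names(sort(counts, decreasing = TRUE))
print(ranked)
```

On a real repository, "tools::package_dependencies" with "reverse = TRUE"
could presumably supply the counts in place of the toy table.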


       "sos" is also designed to identify packages, but it ranks them by the
number and rank of help pages matching the search term.  I often use
"a|b" to obtain the union of two different searches, then use
"writeFindFn2xls" to write the result to an MS Excel file with 3
sheets:  (1) a package summary,  (2) the raw search results of help
pages sorted by package, and (3) info on the search terms used.
"findFn" has a "sortBy" argument that lets a user change the default sort
order, but I've never used it.  Part of the information in the package
summary is taken from installed packages and is missing for packages
that are not installed.  "sos" includes "installPackages" to install the
highest-ranking packages so that this information can be filled in, but
that's a poor solution to the problem.  I'd be happy to work with others
who could improve the selection of information to present and obtain it
all without installing the packages first.

       Spencer
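
       The union-then-summarise workflow described above can be sketched in
base R.  This is only an illustration under stated assumptions:  the hit
tables, package names, and scores are made up, and the code is not how
"sos" is implemented internally.

```r
# Hypothetical help-page hits for two searches; real hits would come from a
# search tool. Scores are invented relevance values.
hits_a <- data.frame(
  package = c("pkgA", "pkgA", "pkgB"),
  page    = c("fitModel", "plotModel", "fitModel"),
  score   = c(90, 40, 70),
  stringsAsFactors = FALSE
)
hits_b <- data.frame(
  package = c("pkgA", "pkgC"),
  page    = c("fitModel", "smooth"),
  score   = c(60, 80),
  stringsAsFactors = FALSE
)

# Union of the two searches: stack them and keep the best-scoring copy when
# the same help page appears in both.
both <- rbind(hits_a, hits_b)
both <- both[order(-both$score), ]
both <- both[!duplicated(both[c("package", "page")]), ]

# Package summary: number of matching help pages and best score per package.
count <- tapply(both$score, both$package, length)
best  <- tapply(both$score, both$package, max)
pkg_summary <- data.frame(
  package   = names(count),
  count     = as.integer(count),
  max_score = as.numeric(best),
  stringsAsFactors = FALSE
)

# Sort by number of matching pages, then by best score.
pkg_summary <- pkg_summary[order(-pkg_summary$count, -pkg_summary$max_score), ]
rownames(pkg_summary) <- NULL
print(pkg_summary)
```

Everything in this summary comes from the search hits alone, so no package
has to be installed to produce it.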

>
> Gabor
>
> On Sun, Nov 9, 2014 at 7:18 PM, Spencer Graves
> <spencer.graves at prodsyse.com> wrote:
>>        Might it be appropriate to add "http://metacran.github.io/search" and
>> the "sos" package to the official list of R search capabilities at
>> "www.r-project.org/search.html"?  [Disclaimer:  I'm the lead author of
>> "sos".]
>>
>>
>>        Best Wishes,
>>        Spencer Graves
>>
>>
>> On 11/9/2014 11:06 AM, Gábor Csárdi wrote:
>>> Hi,
>>>
>>> I think much of this is simply impossible to do. CRAN packages are
>>> written and maintained by thousands of people, how are you planning to
>>> convince them to reorganize their packages? Or even just rename them?
>>> This obviously won't happen.
>>>
>>> Btw. did you see 'CRAN Task Views'? That is one organization of
>>> packages into topics.
>>>
>>> Personally, I don't think organization is the solution here. It is too
>>> costly (i.e. too much work) to maintain and impossible to enforce. I
>>> think, however, that a good search engine would definitely help.
>>>
>>> FWIW there is a simple search engine here:
>>> http://metacran.github.io/search/
>>> This ranks packages according to the number of reverse dependencies
>>> (among other things), i.e. packages more often used by other packages
>>> will be higher up in the list.
>>>
>>> Ranking them according to downloads is also possible, but AFAIK only
>>> one CRAN mirror gives out statistics about downloads, so you don't
>>> really have the complete numbers there.
>>>
>>> Disclaimer: I built the search engine above. There are obviously other
>>> alternatives as well, e.g. http://rdocumentation.org, and
>>> http://mran.revolutionanalytics.com/packages/ are the two I know.
>>>
>>> Gabor
>>>
>>> On Sun, Nov 9, 2014 at 11:24 AM, Steven Sagaert
>>> <steven.sagaert at gmail.com> wrote:
>>>> Hi,
>>>> I’ve been using R on and off for a couple of years. I think R is pretty
>>>> great, but one thing I’d like to see improved is the way packages are
>>>> organised. Instead of CRAN being a long list of packages with short &
>>>> usually unintelligible names, I’d like to see packages organised in a
>>>> hierarchical way, with that path acting as a hierarchical namespace, just as
>>>> in many other languages such as Java, C#, Scala, … The names of the
>>>> (sub)packages should also be clear and unambiguous, & packages should be
>>>> organised according to their functionality and not, for example, just be the
>>>> code for a whole book thrown together and given a cryptic name.
>>>>
>>>> Next to that, it would be nice to have extra metadata in the packages to
>>>> allow for another, looser, flat multi-class classification, as in blog
>>>> tagging systems, & other metadata to allow for automatically generating
>>>> something like task views.
>>>>
>>>> Due to the large number of packages it’s hard to see the forest for the
>>>> trees, so a recommendation system for CRAN based on popularity (download
>>>> statistics), ratings & other data, like related packages from package
>>>> metadata, would be most welcome.
>>>>
>>>> Finally, the number of packages on CRAN is growing exponentially, but there
>>>> is also a large partial overlap in functionality between packages, & so many
>>>> packages make it hard to find what you are looking for. So maybe less
>>>> is more here, and there should be a system for removing hardly used/low-quality
>>>> packages on a regular basis.
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel


