[Rd] Citation of R packages

John Maindonald john.maindonald at anu.edu.au
Sat Feb 11 00:32:17 CET 2006

Even if a CITATION file is included, there is an issue of what to put
in it.
Authorship of a book or paper is not always the simple matter that it
might appear.  With an R package, it can be far from simple.  We are
surely trying to adapt a tool that was designed for different purposes.

1. I'd like to see the definition of a new BibTeX entry type that has
fields for additional author details and version number. There is surely
some mechanism for getting agreement on a new entry type.

2. In any case, there's a message for maintainers of packages to include
CITATION files that reflect what they want to appear in any citation,
with citation("lattice") as maybe a suitable model?
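
As a rough illustration of point 2, a CITATION file in current versions of R could look something like the sketch below. It uses bibentry() and person() from the utils package, which postdate this thread. The names, version and year are copied from the flexclust DESCRIPTION quoted further down; the title is a placeholder, and the choice of roles and the wording of the note are assumptions made purely for illustration, not anything the package author has endorsed.

```r
# Sketch of a CITATION-style entry. Assumptions: the title is a placeholder,
# and the roles ("aut", "ctb") and the note wording are illustrative only.
authors <- c(
  person("Friedrich", "Leisch", role = "aut"),
  person("Evgenia", "Dimitriadou", role = "ctb",
         comment = "parts based on her code")
)

pkg_entry <- bibentry(
  bibtype = "Manual",
  title   = "flexclust: placeholder title",  # hypothetical, not the real title
  author  = authors,
  year    = "2006",
  # The version goes into the free-text note field, since the classic
  # BibTeX styles have no dedicated version field.
  note    = "R package version 0.8-1. Parts based on code by Evgenia Dimitriadou"
)

# Render the entry as BibTeX; authors are joined in the strict "and" format.
print(toBibtex(pkg_entry))
```

The roles and comments stay attached to the person objects, so the "who did what" information is recorded even though the BibTeX author field itself can only carry names.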


John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Mathematical Sciences Institute, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 11 Feb 2006, at 5:36 AM, Friedrich.Leisch at tuwien.ac.at wrote:

>>>>>> On Fri, 10 Feb 2006 21:01:44 +1100,
>>>>>> John Maindonald (JM) wrote:
> [...]
>> Where there is a published paper or a book (such as MASS), or a
>> manual for which a url can be given, my decision was to include
>> that in the main list of references, but not to include references
>> there that were references to the package itself, which as you
>> suggest below can be a reference to the concatenated help pages.
> The CITATION file of a package may contain as many entries as the
> author wants, including both a reference to the help pages and to the
> book (or whatever).
>> It seemed anyway useful to have a separate list of packages.  For
>> consistency, these were always references to the package, with a
>> cross-reference to any relevant document in the references to papers.
>>>> (2) Maybe the author field should be more nuanced, or
>>>> maybe ...
>>> author fields of bibtex entries have a strict format (names separated
>>> by "and"), what do you mean by "more nuanced"?
>> Those named in the list of authors may be any combination of: the
>> authors of an R package, the authors of an original S version, the
>> person or persons responsible for an R port, the authors of the Fortran
>> code, compiler(s), and contributors of ideas.
>> For John Fox's car, citation() gives the following:
>>      author = {John Fox. I am grateful to Douglas Bates and David
>> Firth and Michael Friendly and Gregor Gorjanc and Georges Monette and
>> Henric Nilsson and Brian Ripley and Sanford Weisberg and and Achim
>> Zeleis for various suggestions and contributions.},
>> For Rcmdr:
>>      author = {John Fox and with contributions from Michael Ash and
>> Philippe Grosjean and Martin Maechler and Dan Putler and and Peter
>> Wolf.},
>> For car, maybe John Fox should be identified as author.  For Rcmdr,
>> maybe the other persons that are named should be added?
>> For leaps:
>>      author = {Thomas Lumley using Fortran code by Alan Miller},
>> It seems reasonable to cite Lumley and Miller as authors.  Should
>> there be a note that identifies Miller as the contributor of the
>> Fortran code?
>> Should the name(s) of porters (usually from S) be included as
>> author(s)?  Or should their contribution be acknowledged in the note
>> field?  Or ...
>> Possibilities are to cite all those individuals as author, or to cite
>> John Fox only, with any combination of no additional information in the
>> note field, or using the note field to explain who did what.  The
>> citation() function leaves it unclear who are to be acknowledged as
>> authors, and in fact [...]
> Umm, the problem there is not the citation() function, but that the
> authors of all those packages obviously have not included a CITATION
> file in their package which overrides the default (extracted from the
> DESCRIPTION file).
> E.g., package flexclust has DESCRIPTION
> Package: flexclust
> Version: 0.8-1
> Date: 2006-01-11
> Author: Friedrich Leisch, parts based on code by Evgenia Dimitriadou
> but
> ****
> R> citation("flexclust")
> To cite package flexclust in publications use:
>   Friedrich Leisch. A Toolbox for K-Centroids Cluster Analysis.
>   Computational Statistics and Data Analysis, 2006. Accepted for
>   publication.
> A BibTeX entry for LaTeX users is
>   @Article{,
>     author = {Friedrich Leisch},
>     title = {A Toolbox for K-Centroids Cluster Analysis},
>     journal = {Computational Statistics and Data Analysis},
>     year = {2006},
>     note = {Accepted for publication},
>   }
> ****
> because the CITATION file overrides the DESCRIPTION file. Writing a
> CITATION file is of course also intended for those cases where a
> proper reference cannot be auto-generated from the DESCRIPTION file.
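
For readers who have not written one, the override described above could be sketched roughly as follows. This is an assumption about what such a CITATION file might contain, not the actual file shipped with flexclust. It is written with bibentry() from the utils package in current versions of R, and the field values are copied from the output quoted above.

```r
# Sketch of a CITATION entry, using the field values shown in the thread.
# In a real inst/CITATION file the bibentry() call would usually stand alone;
# it is assigned to a variable here so that the snippet runs on its own.
flexclust_cit <- bibentry(
  bibtype = "Article",
  header  = "To cite package flexclust in publications use:",
  author  = person("Friedrich", "Leisch"),
  title   = "A Toolbox for K-Centroids Cluster Analysis",
  journal = "Computational Statistics and Data Analysis",
  year    = "2006",
  note    = "Accepted for publication"
)

# Show the BibTeX form of the entry.
print(toBibtex(flexclust_cit))
```

When no such file is present, citation() falls back to an entry built from the DESCRIPTION fields, which is where free-text Author lines like the ones quoted earlier end up in the author field.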
>>>> (3) In compiling a list of packages, name order seems
>>>> preferable, and one wants the title first (achieved by
>>>> relocating the format.title field in the manual FUNCTION
>>>> in the .bst file)
>>>> (4) manual seems not an ideal name for the class, if
>>>> there is no manual.
>>> A package always has a "reference manual", the concatenated help pages
>>> certainly qualify as such and can be downloaded in PDF format from
>>> CRAN. The ISBN rules even allow assigning an ISBN to the online
>>> help of a software package, which can also serve as the ISBN of
>>> the *software itself* (which we did for base R).
>> I'd prefer some consistency in the way that R packages are referenced.
>> Thus, if the reference for one package is to the concatenated help
>> pages, do it that way for all of them.
> But we recommend that package authors should (try to) get their work
> into reviewed journals like JSS, JCGS, or CSDA, and then package
> authors usually prefer that the article gets cited. Unfortunately, many
> academic institutions value paper publications more highly than software.
> Citing the help pages is mainly intended as a substitute if no journal
> article is available.
> Best,
> Fritz
