[R] Fwd: Documenting data sets with many variables

Arne Henningsen ahenningsen at email.uni-kiel.de
Wed Aug 17 16:48:47 CEST 2005


On Tuesday 16 August 2005 17:26, Gavin Simpson wrote:
> On Tue, 2005-08-16 at 17:11 +0200, Arne Henningsen wrote:
> > On Tuesday 16 August 2005 14:49, Roger D. Peng wrote:
> > > Have you tried using 'promptData()' on the data frame and then
> > > just using the resulting documentation file?
> >
> > Thank you, Roger, for bringing 'promptData()' to my attention. This is
> > really a useful tool. However, in my particular case my aim is to reduce
> > the length and increase the comprehensibility of the documentation rather
> > than to reduce my effort in writing it.
> >
> > Any further hints are welcome!
> >
> > Thanks,
> > Arne
>
> Would it not be expedient then to ignore the \format{} section and just
> provide the information on the variables say in the \description{},
> e.g.:

That's a great idea - and so simple!
This perfectly solves my problem.
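
For the archive, here is a minimal sketch of how I understand the suggestion
applied to "Blanciforti86". The title and the wording of the description are
only placeholders (assumptions, not the actual micEcon documentation), and
the list of commodity groups is abbreviated as in my original message:

```
% Sketch only: title and description wording are assumed placeholders.
\name{Blanciforti86}
\alias{Blanciforti86}
\docType{data}
\title{Placeholder title for the Blanciforti86 data set}
\usage{data(Blanciforti86)}
\description{
  The \code{Blanciforti86} data frame contains, among others, variables
  for eleven aggregate commodity groups (1 = food, 2 = clothing, ...).
  For each group X = 1, ..., 11, \code{xAggX} is the expenditure on the
  group (in millions of US dollars), \code{pAggX} is its price index
  (1972 = 100), and \code{wAggX} is its expenditure share.
}
% There is no \format{} section with \item entries here, so, as far as I
% understand, "R CMD check" has no variable list to compare with the data.
```

The trade-off is that the individual column names no longer appear in the
help page, so the naming pattern has to be explained clearly in the text.
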
Thanks,
Arne

> This example taken from package vegan describing 2 data.frames with 44
> and 14 columns. Admittedly, none of the variables in the species dataset
> are explicitly and individually described in this example, but it is
> sufficient in this case I think.
>
> \name{varespec}
> \alias{varechem}
> \alias{varespec}
> \docType{data}
> \title{Vegetation and environment in lichen pastures}
> \usage{
>        data(varechem)
>        data(varespec)
> }
> \description{
>   The \code{varespec} data frame has 24 rows and 44 columns.  Columns
>   are estimated cover values of 44 species.  The variable names are
>   formed from the scientific names, and are self explanatory for anybody
>   familiar with the vegetation type.
>
>   The \code{varechem} data frame has 24 rows and 14 columns, giving the
>   soil characteristics of the very same sites as in the \code{varespec}
>   data frame. The chemical measurements have obvious names.
>   \code{Baresoil} gives the estimated cover of bare soil, \code{Humpdepth}
>   the thickness of the humus layer.
> }
> ....
>
> HTH
>
> G
>
> > > -roger
> > >
> > > Arne Henningsen wrote:
> > > > Hi,
> > > >
> > > > > Since nobody answered my first message, I will try to explain my
> > > > > problem more clearly and more generally this time:
> > > > >
> > > > > I have a data set in my R package "micEcon", which has many variables
> > > > > (82). Therefore, I would like to avoid describing all variables in
> > > > > the "\format" section of the documentation (.Rd file). However, doing
> > > > > this makes "R CMD check" complain about "data codoc mismatches"
> > > > > (see details below). Is there a way to avoid describing all
> > > > > variables without getting a complaint from "R CMD check"?
> > > >
> > > > Thanks,
> > > > Arne
> > > >
> > > >
> > > > ----------  Forwarded Message  ----------
> > > >
> > > > Subject: Documenting data sets with many variables
> > > > Date: Friday 05 August 2005 14:03
> > > > From: Arne Henningsen <ahenningsen at email.uni-kiel.de>
> > > > To: R-help at stat.math.ethz.ch
> > > >
> > > > Hi,
> > > >
> > > > I extended the data set "Blanciforti86" that is included in my R
> > > > package "micEcon". For instance, I added consumer prices, annual
> > > > consumption expenditures and expenditure shares of eleven aggregate
> > > > commodity groups. The corresponding variables in the data frame are
> > > > called "pAgg1", "pAgg2", ..., "pAgg11", "xAgg1", "xAgg2", ...,
> > > > > "xAgg11", "wAgg1", "wAgg2", ..., "wAgg11". To avoid describing all
> > > > 33 items in the "\format" section of the documentation (.Rd file) I
> > > > wrote something like
> > > >
> > > > \format{
> > > >    This data frame contains the following columns:
> > > >    \describe{
> > > >       [ . . . ]
> > > >       \item{xAggX}{Expenditure on the aggregate commodity group X
> > > >          (in Millions of US-Dollars).}
> > > >       \item{pAggX}{Price index for the aggregate commodity group X
> > > >          (1972 = 100).}
> > > > >       \item{wAggX}{Expenditure share of the aggregate commodity group
> > > > >          X.}
> > > > >       [ . . . ]
> > > >    }
> > > > }
> > > >
> > > > and explained the 11 aggregate commodity groups only once in a
> > > > different section (1=food, 2=clothing, ... ). However, "R CMD check"
> > > > now complains about "data codoc mismatches", e.g.
> > > > >   Code: [...] pAgg1 pAgg2 pAgg3 [...]
> > > >   Docs: [...] pAggX [...]
> > > >
> > > > Is there a way to avoid the description of all 33 items without
> > > > getting a complaint from "R CMD check"?
> > > >
> > > > Thanks,
> > > > Arne
> > > >
> > > > -------------------------------------------------------

-- 
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
ahenningsen at agric-econ.uni-kiel.de
http://www.uni-kiel.de/agrarpol/ahenningsen/



