[R] Keywords and Concepts - CTFS package

Pamela Hall phall at alum.mit.edu
Tue Jun 15 19:37:42 CEST 2004


The package I am writing is for the Center for Tropical Forest Science, CTFS.  This "center" is a collaboration of 15+ institutions worldwide that are investigating tropical forest dynamics, species diversity and species distributions.  Every site uses the same sampling design: a large plot (usually 50 hectares) in which every tree >= 10 mm in diameter has been tagged, mapped and identified.  Re-enumerations occur every 5 years (mostly).  Other information such as topography, canopy structure, seedling traps, etc. is collected to different degrees at different sites.  The sites vary: some have more than one plot, some have two 25 ha plots, some have a 52 hectare plot, some have 1200 species, and species identification can easily take 10 years and counting.  It is a very large project with 18 pantropical sites, over 3 million trees and 5000+ species, with data ranging from first census to 7th census.

The people doing the field work and the analysis vary tremendously in their backgrounds and expertise.  There are many collaborators who have done little or none of the field work, some haven't even been to the sites.  The types of analyses that are done are wide ranging, but clearly cannot be crammed into a "standard" stats package.  The powers that be at CTFS decided to adopt R for analysis and away we went.

Four years have gone by and we have an odd collection of functions, few documented even inside the code (programmers hate to document), and an odd collection of manuals (mostly written by me) of varying complexity and integrity.  It became obvious to me this year that we needed to use more of R's capabilities and quit reinventing the packaging of functions and the manuals for running them.  So I have taken on the responsibility of being the clearing house for all CTFS functions, checking them for usefulness, function and generality (which is not always necessary) and writing up help files.  And now, thanks to you guys, I have managed to start the process of packaging it all together so we can all work from the same function resource base!

Now, how to fit our functions into \keyword{} and use \concept{}.

1.  Many functions are CTFS data structure specific: 

mort.spp.habitat() which computes the mortality for a "population" of individuals that belong to a single species and occupy a "habitat" defined in previous analyses and mapped to locations in the plot.  

2.  Some functions that do most of the "work" are more generic:
mort.calc() which calculates the annual mortality rate of a population and provides confidence limits to the rate through other functions.

3.  Some are very generic and just make it simpler to interact with other CTFS programs that could probably be made more generic, but that hasn't happened yet or may take too long to run (we do a lot of randomization and generation of distributions for assigning probabilities for statistical results):
split.data() which takes a dataset of census information on trees (a dataframe) and makes a new dataset that is a list of dataframes, 1 dataframe for each species.  It just restructures the data for ease of use in other programs and for more rapid access to, in this case, species-based information.
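
The restructuring described for split.data() can be sketched with base R's split().  This is only an illustration under assumptions, not the CTFS code: the toy table and its column names (tag, sp, dbh) and the species codes are made up.

```r
# Toy census table; the column names and species codes are assumptions.
census <- data.frame(
  tag = 1:6,
  sp  = c("spA", "spB", "spA", "spC", "spB", "spA"),
  dbh = c(12, 35, 18, 210, 44, 15),
  stringsAsFactors = FALSE
)

# One data frame per species, in a list named by species code.
by_species <- split(census, census$sp)

names(by_species)        # the species codes
by_species[["spB"]]      # all rows for a single species
```

Looking up one species then becomes a single list access rather than a filter over the whole table, which is the "more rapid access" mentioned above.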

I believe I understand the \concept{} section of Rd files...

Knowing the audience for whom I am writing this package, I have provided a number of values for each function as concepts so that help.search() will dig up related and useful functions.  How the CTFS functions relate to each other is a very audience-dependent phenomenon.  I'll try out my ideas on the CTFS users and let them tell me whether they took a long time to find what they needed or not.
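
As a rough illustration of how concept entries feed help.search(): the concept terms in the comments below are invented examples, not the ones in the CTFS help files, and the search only finds CTFS pages once the package is installed.

```r
# In the Rd file of a function, one \concept{} per term, for example
# (illustrative terms only):
#   \concept{mortality}
#   \concept{habitat}

# Restrict the search to the concept field of installed help pages.
hits <- help.search("mortality", fields = "concept")

# print(hits) displays the matching help pages, if there are any.
```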

I now understand that \keyword{} entries are not keywords in the usual sense, and I have viewed KEYWORDS.

For function #3 above, I would say this is a type of data manipulation and is used within CTFS as a file utility, so is the keyword:

\keyword{Basics:manip, utilities}
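
For reference, Rd expects one entry per \keyword{}, and headings in KEYWORDS such as "Basics" are section titles rather than part of the keyword.  A sketch of how the line above would look in that form, with the R calls for browsing the list:

```r
# Each \keyword{} holds exactly one entry from the KEYWORDS file, so the
# line above would be written in the Rd file as two macros:
#   \keyword{manip}
#   \keyword{utilities}

# Path to the KEYWORDS file shipped with R.
keywords_file <- file.path(R.home("doc"), "KEYWORDS")
# file.show(keywords_file)   # browse the full list interactively

# List installed help pages that carry a given keyword.
manip_hits <- help.search(keyword = "manip")
```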

For function #2 above, mortality rate is just a piece of arithmetic, with the CI assigned as stats, but this isn't a survival analysis, it's just a defined computation suitable for the uses of the CTFS crowd.  So what keyword is appropriate?
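
To make "a piece of arithmetic plus a CI" concrete, here is a minimal sketch.  It is not the CTFS mort.calc() code: the function name, its arguments, the rate formula (log(n0) - log(s)) / years and the exact binomial interval from binom.test() are all assumptions chosen for illustration.

```r
# Hypothetical sketch; not the CTFS mort.calc() implementation.
# n0:    number of trees alive at the first census
# s:     number of those still alive at the second census
# years: census interval in years
annual_mortality_sketch <- function(n0, s, years, conf.level = 0.95) {
  stopifnot(n0 > 0, s > 0, s <= n0, years > 0)

  # Exact binomial confidence interval on the survival proportion s / n0.
  surv_ci <- binom.test(s, n0, conf.level = conf.level)$conf.int

  # Assumed rate definition: m = (log(n0) - log(s)) / years.
  rate <- (log(n0) - log(s)) / years

  # Higher survival means lower mortality, so the interval bounds swap.
  c(rate  = rate,
    lower = -log(surv_ci[2]) / years,
    upper = -log(surv_ci[1]) / years)
}

annual_mortality_sketch(n0 = 1000, s = 900, years = 5)
```

The rate itself is plain arithmetic; only the interval draws on a statistical routine, which is the split the keyword question turns on.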

Function #1 is a form of data manipulation too, but so are all of our R programs.

Now, I'm confused.  I agree that there is no reason to create a new keyword since the CTFS stuff is so specific, but should I just call nearly everything we write "misc"?

-ph



