[Rd] A Call for a Smaller R Core Package

Zepu Zhang zpzhang at stanfordalumni.org
Thu Sep 21 06:50:49 CEST 2006


(Below are my thoughts on an issue that has troubled me for quite a while.
I hope this is not seen as troublemaking.)

A Call for a Smaller R Core Package

This document proposes downsizing R's 'core' package
by moving some specialized functionality out into
separate packages. I'll use string-related functions as examples,
because they happened to trouble me today.

1. The core is too big

R is a function-rich environment.
However, non-central functions are better organized in specialized packages.
From time to time I have felt the need to go through the core package to get a
complete picture of what is at my disposal,
yet so far I haven't done it.
In the 'R Reference Manual' the core package runs to over 400 pages
with about 400 entries, and, mysteriously, some functions such as 'sub'
don't show up in the table of contents.
In the two-volume reference set printed by Network-Theory,
the core is the entire first book.
In contrast, the 'Intrinsic Functions' chapter of the classic Fortran
reference "Fortran 95/2003 Explained" runs to perhaps 30 pages (I'm not sure
of the exact count).
I flipped through it many times and I can say with confidence,
"OK these are ALL the Fortran intrinsics and I know what they do."
For R, I find it an intimidating task to flip through the 400+ pages of the core
and still have a clear picture at the end.

Below is a random sample of string-related functions in the core package:

agrep
basename
charmatch
chartr
gregexpr
grep
gsub
regex
regexpr
strsplit
strtrim
strwrap
sub

In my opinion, anything that uses regular expressions belongs somewhere else.
Even 'utils' seems a better home than the core for such miscellaneous items.

2. Benefits of a smaller core

a) A smaller core will be more carefully studied and better appreciated.

If the R core functions were documented in 100 pages,
I would be a much better R programmer than I am today,
because I would have singled out and studied the more fundamental routines,
such as those dealing with function calls.

In my view, a function should be in the core only if it is either
1) fundamental or 2) very frequently used.

A smaller core is more stable.

b) A specialized 'string' package would make string-related functions much
easier to find.

I might still need all of these functions,
but having them grouped together would greatly help learning.
I would rarely reinvent the wheel, because
I could quickly survey the whole dedicated package.

c) It would be easier to enrich string-related functionality without
cluttering the core.

3. Costs of such re-arrangements

a) To the R development team

(I don't really know.)

Utility functions that are frequently used by basic functions
may well stay in the core.
Those that are not could probably be moved without much difficulty.
The spin-off package could always be loaded automatically by default,
but, as discussed above, a clean grouping greatly helps learning
and finding things.

b) To R users

The system (both the core and the specialized package) will be easier to learn
and use.


-- Zepu Zhang, zpzhang at uchicago.edu
