[Rd] issue with data()

Therneau, Terry M., Ph.D. therneau at mayo.edu
Tue Feb 16 15:39:36 CET 2021

I am testing out the next release of survival, which involves running R CMD check on 868 
CRAN packages that import, depend or suggest it.

The survival package has a lot of data sets, most of which are non-trivial real examples 
(something I'm proud of).  To save space I've bundled many of them, e.g., data/cancer.rda 
has 19 different data frames.

This caused failures in 4 packages, each because they have a line such as data(lung) or 
data(breast, package = "survival"), and the data() command looks for a file name rather 
than the name of an object stored inside a file.
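
The failure mode can be reproduced without survival. What follows is a minimal sketch, assuming a throwaway directory and made-up names (lung_demo, breast_demo and cancer_demo are hypothetical stand-ins, not the actual survival files). It relies on data() also searching the data/ subdirectory of the current working directory.

```r
# Hypothetical stand-ins for two bundled data sets (not the survival data).
lung_demo   <- data.frame(time = c(5, 8), status = c(1, 0))
breast_demo <- data.frame(time = c(3, 9), status = c(0, 1))

# Scratch directory with a data/ subdirectory holding ONE bundled file.
root <- tempfile("bundle")
dir.create(file.path(root, "data"), recursive = TRUE)
save(lung_demo, breast_demo,
     file = file.path(root, "data", "cancer_demo.rda"))
rm(lung_demo, breast_demo)

old <- setwd(root)  # data() also looks in ./data of the working directory

# Asking for the file name loads every object stored in that file.
by_file <- new.env()
data(cancer_demo, envir = by_file)
loaded <- sort(ls(by_file))

# Asking for an object name finds no file called lung_demo.*,
# so data() only issues a warning and loads nothing.
by_object <- new.env()
msg <- tryCatch(data(lung_demo, envir = by_object),
                warning = function(w) conditionMessage(w))

setwd(old)
print(loaded)
print(msg)
```

The explicit envir argument is only there to keep the sketch from touching the workspace.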

This is a question about which option is considered the best (perhaps more of a poll), 
between two choices

1. unbundle them again  (bundling does save 1/3 of the space, and I do get complaints from R CMD 
build about size)
2. send notes to the 4 maintainers.  The help files for the data sets have the usage 
documented as  "lung" or "breast", and not data(lung), so I am technically legal to claim 
they have a mistake.
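
For the second option, the change in a dependent package would be small. A sketch, assuming survival is installed and that lung is reachable as a lazy-loaded data set, as the usage in its help page suggests (lung_df is just an illustrative name):

```r
if (requireNamespace("survival", quietly = TRUE)) {
  # Refer to the data set by name instead of calling data(lung).
  lung_df <- survival::lung
  str(lung_df[1:3, ])
}
```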

A third option, making the data sets a separate package, is not on the table.  I use them 
heavily in my help files and test suite, and since survival is a recommended package I 
can't add library(x) statements for  !(x %in% recommended).   I am guessing that this 
would also break many dependent packages.

Terry T.

Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
therneau at mayo.edu

"TERR-ree THUR-noh"

More information about the R-devel mailing list