[Rd] Wish List

Gabor Grothendieck ggrothendieck at gmail.com
Tue Jan 1 20:59:57 CET 2008


Most of the items on this list have been mentioned before but it
may be useful to see them all together and at any rate every year
I have posted my R wishlist at the beginning of the year.

High priority items pertain to the foundations of R (promises,
environments) since those form the basis of everything
else and the foundation needs to be looked after first.

The medium items are focused on scripting since with a few additional
features R could work more smoothly with other software.

The Low priority items are the rest.  They are not necessarily
low in terms of desirability but I wanted to focus the high and
medium items on foundations and scripting.

There is also a section at the end focusing on addon packages.
These may not be, strictly speaking, part of R but are widely used.

High

1. Some way of inspecting promises.  It is possible to get
the expression associated with a promise using substitute but
not its environment.  Also need a way to copy a promise without
forcing it.  See:
https://stat.ethz.ch/pipermail/r-devel/2007-September/046966.html
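
As a minimal sketch of the part that already works, the following shows
substitute returning the expression of a promise without forcing it.  The
function names f and g are only illustrative.  There is no corresponding
base R call shown here for the environment of the promise, which is the
gap this item is about.

```r
# substitute() returns the unevaluated expression behind the promise x
f <- function(x) substitute(x)
expr <- f(a + b)      # a and b need not exist; nothing is evaluated
expr

# the promise is not forced, so the stop() below never runs
g <- function(x) {
  e <- substitute(x)
  deparse(e)
}
g(stop("never evaluated"))
```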

2. Fix bug when promises are stored in lists:

    f <- function(x) environment()
    as.list(f(0))$x == 0 # gives error.  Should be TRUE.

3. If a package uses "LazyLoad: true" then R changes the class of
certain top level objects.  This does not occur if "LazyLoad: false"
is used.  For an example see:
https://stat.ethz.ch/pipermail/r-devel/2007-October/047118.html

4. If two variables refer to the same environment they
cannot have different attributes.  This effectively thwarts subclassing
of environments (contrary to OO principles).
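
A short illustration of the behaviour, with an attribute name ("label")
chosen only for this example: setting an attribute through one variable
is visible through the other, because both refer to the same environment
object.

```r
e1 <- new.env()
e2 <- e1                         # second variable, same environment

attr(e2, "label") <- "second"    # intended to tag only e2
attr(e1, "label")                # the attribute shows up on e1 as well
identical(e1, e2)
```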

Medium

5. Sweave. A common scenario is spawning a Sweave job from
another program (such as from a program controlling a
web site).  The caller needs to pass some information to the
Sweave program such as the file name of a report to produce.
It's possible to spawn R and have R spawn Sweave but given the
existence of R CMD Sweave it would be nice to be able to just
spawn R CMD Sweave directly.  Features that would help here
would be:

- support --args or some other method of passing arguments
  from R CMD Sweave line to the Sweave script

- have a facility whereby R CMD Sweave can directly generate
  the .pdf file and an argument which allows the caller to
  define the name of the resulting pdf file, e.g. -o.  (With
  automated reports one may need to have many different outputs
  from the same Rnw file so it's important to name them differently.)

- an -x argument similar to Perl/Python/Ruby such that if one calls
  R CMD Sweave -x abc myfile.Rnw then all lines up to the first one
  matching the indicated regexp, abc here, are skipped.  This
  facilitates combining the script with a shell or batch file if the
  previous is not enough.

Thus one could spawn this from their program:
R CMD Sweave --pdf myfile.Rnw -o myfile-123.pdf --args 23
and it would generate a pdf file from myfile.Rnw of the
indicated name passing 23 as arg1 to the R code embedded in the
Sweave file.

See:
https://stat.ethz.ch/pipermail/r-devel/2007-October/047195.html
https://stat.ethz.ch/pipermail/r-help/2007-December/148091.html
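
For comparison, the current workaround (spawn R and have R run Sweave)
might look roughly like the sketch below.  The script name driver.R, the
helper names parse_report_args and run_report, and the global variable
report_args are all assumptions made for this example; none of them are
part of R.  It also assumes a working LaTeX installation for the pdf step.

```r
# Hypothetical driver script, invoked for example as:
#   Rscript driver.R myfile.Rnw myfile-123.pdf 23

# Split the trailing command line arguments into named pieces
parse_report_args <- function(args) {
  stopifnot(length(args) >= 2)
  list(rnw = args[1], out_pdf = args[2], extra = args[-(1:2)])
}

run_report <- function(rnw, out_pdf, extra = character()) {
  # Sweave evaluates code chunks in the global environment, so a global
  # variable is one way to hand values to the code in the Rnw file
  assign("report_args", extra, envir = globalenv())
  utils::Sweave(rnw)
  tex <- paste0(tools::file_path_sans_ext(basename(rnw)), ".tex")
  tools::texi2dvi(tex, pdf = TRUE)      # requires LaTeX
  pdf <- sub("\\.tex$", ".pdf", tex)
  file.rename(pdf, out_pdf)             # give the report its own name
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 2) {
  a <- parse_report_args(args)
  run_report(a$rnw, a$out_pdf, a$extra)
}
```

The point of the wish is that the caller would not need to maintain such a
driver script at all.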

6. -x flag on Rscript as in perl/python/ruby.  Useful for combining batch
   and R file into a single file on non-UNIX systems.  It would cause all
   lines until a line starting with #!Rscript to be skipped by the R
   processor.  See:
   https://www.stat.math.ethz.ch/pipermail/r-devel/2007-January/044433.html
   Also see
   http://www.datafocus.com/docs/perl/pod/perlwin32.asp#running_perl_scripts
   since the same considerations as for Perl scripts apply.
   There is also some discussion here:
   https://stat.ethz.ch/pipermail/r-help/2007-November/145279.html
   https://stat.ethz.ch/pipermail/r-help/2007-November/145301.html
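
The skipping behaviour being requested can be emulated in R itself, as in
this rough sketch.  The function name run_embedded_r and the sample batch
lines are made up for illustration; a real -x flag would do this inside
Rscript before parsing the file.

```r
# Evaluate only the part of a mixed batch/R file that follows the marker
run_embedded_r <- function(lines, marker = "^#!Rscript", envir = new.env()) {
  start <- grep(marker, lines)
  if (length(start) == 0) stop("no line matching marker found")
  # the marker line starts with '#', so R treats it as a comment
  code <- lines[start[1]:length(lines)]
  eval(parse(text = code), envir = envir)
}

mixed <- c("@echo off",          # batch part, skipped
           "Rscript %0 %*",
           "exit /b",
           "#!Rscript",          # R part starts here
           "x <- 6 * 7",
           "x")
run_embedded_r(mixed)            # 42
```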

Low

7. Define Lag <- function(x, k = 1, ...) lag(x, -k, ...)

   so the user has his choice of which orientation he prefers.
   Many packages could make use of it if it were in the core of R including
   zoo, dyn, dynlm, fame and others.  This would also address comments
   such as in ISSUE 4 on this page which is associated with a popular
   book on time series:
   http://www.stat.pitt.edu/stoffer/tsa2/Rissues.htm
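
   To illustrate the two orientations on a small series (stats::lag is
   written out in full here only to avoid any masking by other packages):

```r
Lag <- function(x, k = 1, ...) stats::lag(x, -k, ...)

x <- ts(1:5)            # times 1..5
tsp(stats::lag(x, 1))   # 0 4 1: series is shifted back in time
tsp(Lag(x, 1))          # 2 6 1: series is shifted forward in time
```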

8. On Windows, package build tools should check that Cygwin is in
correct position on PATH and issue meaningful error if not.  If
you get this wrong currently it's quite hard to diagnose unless
you know about it.

9. Implement the R shell() and shell.exec() commands on
non-Windows systems.

10. print.function should be improved to make it obvious how to find what
    the user is undoubtedly looking for in both the S3 and S4 cases.
    That would address one of the criticisms here:
    http://www.stat.columbia.edu/~cook/movabletype/archives/2007/08/they_started_me.html
    (The other criticisms at this link are worth addressing too -- ggplot2
    and several existing or upcoming books on grid, lattice and ggplot
    R graphics are presumably addressing the criticism that creating
    graphics is difficult in R.)

11. Add { to the derivative table so one can write this:
    f <- function(x) { x*x }
    deriv(body(f), "x", func = TRUE)

    Currently, one must do:
    deriv(body(f)[[2]], "x", func = TRUE)

12. as.Date.numeric should have Epoch as default origin.  It's currently
    asymmetric since as.numeric(as.Date(x)) does not require specifying
    the Epoch yet you do have to specify it in the reverse direction.
       dd <- Sys.Date()
       x <- as.numeric(dd) # ok
       as.Date(x) # error
    Even worse, sapply (and also ifelse and likely some other functions)
    unclass your dates and you should not have to know the origin to get
    them back with as.Date .
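
    A sketch of the round trip with the origin spelled out, which is what
    one has to write today.  The fixed date is used only so that the
    numbers are reproducible.

```r
dd <- as.Date("2008-01-01")
x <- as.numeric(dd)                         # 13879 days since the Epoch
back <- as.Date(x, origin = "1970-01-01")   # origin must be supplied
back == dd                                  # TRUE

# sapply drops the Date class, so the origin is needed again
u <- sapply(dd, function(d) d)
class(u)                                    # "numeric"
as.Date(u, origin = "1970-01-01")
```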

13. In traceback(), environments in the calling sequence are listed
    simply as <environment> so we don't know which environment is being
    referenced.  It would be more helpful if the hash code associated
    with the environment were listed.  Also it would be useful if it were
    possible to inspect an environment given its hash code as this might
    help the programmer determine which environment is being referenced.

14. These provide results which are not valid R:

	> dput(alist(a=1,b=))
	structure(list(a = 1, b = ), .Names = c("a", "b"))

	> dput(alist(a=1,b=),control="all")
	structure(list(a = 1, b = quote()), .Names = c("a", "b"))

15. There should be a "LazyLoad: auto" or similar that corresponds to
    omitting LazyLoad in the DESCRIPTION file.  Currently there is no
    explicit argument setting that corresponds to the default LazyLoad
    setting.

16. A facility to directly get the hex code for an environment.  Currently
   one must do:
   capture.output(new.env())
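
   Wrapped up as a helper, the workaround might look like this.  The name
   env_hex is made up for the example, and the parsing relies on the way
   environments are currently printed, which is exactly why a direct
   facility would be preferable.

```r
env_hex <- function(e) {
  # first printed line has the form "<environment: ...>"
  first <- capture.output(print(e))[1]
  sub("^<environment: (.*)>$", "\\1", first)
}

e <- new.env()
env_hex(e)                           # address-like string, varies per session
identical(env_hex(e), env_hex(e))    # same environment gives the same code
```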

17. stats:::nlsModel.  Would like option to NOT calculate derivatives so
    it can be used with derivative-free algorithms.

18. Make as.POSIXlt, difftime, filter, rowMeans and rowSums into S3 generics.
   Various time series and datetime packages such as fame, zoo and chron
   could use these.
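
   What packages have to do at present is mask the function with their
   own generic, roughly as sketched here for rowMeans.  The class name
   "myseries" and its list layout are hypothetical.

```r
# Package-level generic that masks base::rowMeans
rowMeans <- function(x, ...) UseMethod("rowMeans")
rowMeans.default <- function(x, ...) base::rowMeans(x, ...)

# Method for a hypothetical class that keeps its matrix in $data
rowMeans.myseries <- function(x, ...) base::rowMeans(x$data, ...)

m <- matrix(1:4, nrow = 2)
s <- structure(list(data = m), class = "myseries")
rowMeans(m)   # 2 3, via the default method
rowMeans(s)   # 2 3, via the myseries method
```

   If these functions were generic in R itself, each package would only
   need to supply the method and no masking would occur.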

Packages
========

19. DBI.
    - It should be possible for a program to discover which DBI drivers
      are loaded.
    - A DBI driver for ODBC would be nice.

20. RSQLite
    - automatically set eol correctly in sqliteImportFile according to
      input file (currently it defaults to "\n" which only works correctly
      for files created on UNIX)
    - either give sqliteImportFile its own .Rd page or make it possible
      to read ?dbWriteTable without reference to sqliteImportFile
    - ability to use arbitrary R functions from within sqlite select statements
      (or second best would be to at least support common functions used in
      statistics that are not in sqlite such as sd, var, etc.)
    - sqliteImportFile should support quoted fields so that .csv files,
      the most common input file format, can be supported
    See: https://stat.ethz.ch/pipermail/r-sig-db/2007-August/000382.html
         https://stat.ethz.ch/pipermail/r-sig-db/2007-August/000384.html

21. MySQL
    - dbWriteTable on Windows inserts extra "\r" characters
    See: https://stat.ethz.ch/pipermail/r-sig-db/2007-August/000385.html

22. grid

- grid.ls() enhancement to show more info (if grid.ls is analogous to ls
  on UNIX then this would be analogous to ls -l) and a grep-like facility
  which only lists grid objects with specified attributes, e.g. just list
  all grid objects that are dark green
- ability to reset grid names so generating the same plot twice gives
  the same sequence of grid names.

My previous wish lists are here:
https://www.stat.math.ethz.ch/pipermail/r-devel/2007-January/044122.html
https://www.stat.math.ethz.ch/pipermail/r-devel/2006-January/035949.html
https://www.stat.math.ethz.ch/pipermail/r-help/2005-January/061984.html
https://www.stat.math.ethz.ch/pipermail/r-devel/2004-January/028465.html


