[Rd] Storage of character vectors in R

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jun 29 09:48:36 CEST 2006


There have been comments from time to time (over many years) on the 
inefficiency of the storage of character vectors in R, and R-core has been 
looking into the issues.  We have some ideas, but they would take a
considerable amount of work to implement, and it is unclear whether they
would actually help with current real-world problems.

One example was the storage of integer row names for data frames, but such 
row names are stored much more efficiently in R-devel (2.4.0-to-be).
We do have some other examples but these are highly artificial.

What we would like are some real-world examples in which users have
found the storage of character vectors to be an appreciable problem.
Ideally we want concrete reproducible examples that show the problem in 
R-devel, but abstractions of such examples (for example using synthetic 
rather than real data) would also be very helpful.

If you can help, please reply to this thread; making examples available
via URLs would probably be the most efficient route.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


