[Rd] read.table and strip.white

Prof Brian D Ripley ripley@stats.ox.ac.uk
Fri, 2 Jun 2000 06:48:55 +0100 (BST)


On 1 Jun 2000, Peter Dalgaard BSA wrote:

> Prof Brian D Ripley <ripley@stats.ox.ac.uk> writes:
> 
> > On Wed, 31 May 2000, Uwe Ligges wrote:
> > 
> > > Prof Brian Ripley wrote:
> > > 
> > > > > [...]
> > > 
> > > > I was rather surprised here, and this is not what the prototype does:
> > > > 
> > > >   .col1. .col2. .col3.
> > > > 1      1      1      1
> > > > 2      2      2      2
> > > > 
> > > > Should not strip.white be true for the header line?
> > > 
> > > If there are no compatibility problems, setting it to TRUE would be
> > > useful ...
> > > But compatibility is an important point, especially in read.table(.), I
> > > think. Many R users have got their own functions using read.table(.), I
> > > suppose. Maybe changing the defaults could break anything?
> > 
> > That's tantamount to saying we should not fix bugs because users might be
> > relying on the undocumented and unintended behaviour!  Yes, changing this
> > could change things: for a start the V&R MASS datasets would load correctly
> > on R without R-specific editing.  Does anyone seriously intend to have
> > leading spaces in their column names in a data frame? Especially as those
> > are not S variable names, and as you see S-PLUS (but not R) does enforce
> > that.  (read.table circumvents that by not using the class
> > constructor, but S-PLUS has an explicit call to make.names lacking in R.)
> > 
> > I am inclined to make the change _and_ to check the column names.
> 
> I tend to agree, recently having had dealings with an SPSS-created
> .csv file with "Height SDS" and so on as variable names. However, it
> would seem that the prototype *does* read the leading whitespace and
> replaces it with a dot while fixing the names, no? I don't disagree
> that we're probably better off without that effect, but if we're going
> for compatibility.

No, those are the quotes being converted to `.'  The original example
was

"col1", "col2", "col3"

Now in R the quoting mechanism removes the quotes, but with quote=""
one (now) gets X.col1. etc.  S-PLUS does not have the quoting mechanism,
and leading spaces just disappear (as shown in my original posting).

I've added this in, so read.table now has the arguments check.names=TRUE
and strip.white=FALSE.  White space is always stripped from the header
line, but the names are only converted to valid variable names if
check.names is true.  If you want leading spaces in the column names,
you will need to use quotes and check.names=FALSE, but you can still
do it.
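
As a rough illustration of the behaviour described above, here is a minimal
sketch run against a current R. It assumes the text= argument of read.table
(a later addition, not part of the change discussed here), and the sample
data and the names txt, txt2 and d1 to d4 are made up for the example. The
header is written without spaces after the commas so that only the quote
and check.names handling is exercised.

```r
# Header with quoted names, as in the original example.
txt <- '"col1","col2","col3"\n1,1,1\n2,2,2\n'

# Default quoting: the quotes are removed, leaving syntactic names.
d1 <- read.table(text = txt, header = TRUE, sep = ",")
print(names(d1))

# quote = "": the quote characters stay in the header fields, and
# check.names = TRUE (the default) converts them to valid names.
d2 <- read.table(text = txt, header = TRUE, sep = ",", quote = "")
print(names(d2))

# A non-syntactic name such as "Height SDS" is converted by default ...
txt2 <- '"Height SDS",weight\n1.2,30\n0.4,28\n'
d3 <- read.table(text = txt2, header = TRUE, sep = ",")
print(names(d3))

# ... and kept as-is with check.names = FALSE.
d4 <- read.table(text = txt2, header = TRUE, sep = ",", check.names = FALSE)
print(names(d4))
```

Quoting the name and passing check.names=FALSE is the only combination here
that preserves the embedded space, so it stays opt-in.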

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
