[Rd] RFC: type conversion in read.table

Prof Brian D Ripley ripley@stats.ox.ac.uk
Fri, 31 Aug 2001 13:13:03 +0100 (BST)


On Fri, 31 Aug 2001, Kurt Hornik wrote:

> > Prof Brian Ripley wrote:
> >>
> >> Currently read.table is rather limited in its type conversion.
> >> The algorithm is
> >>
> >> 0) Read as character
> >> 1) Try to convert to numeric. If that works, quit
> >> 2) Convert to factor unless as.is.
> >>
> >> I am thinking about adding more flexibility and more classes by the
> >> following two changes.
> >>
> >> A) Anticipating the arrival of classes for all R objects, add an
> >> argument say `colClasses' that allows the user to specify the desired
> >> class for every column.  This could default to "auto", or NA if people
> >> think "auto" might be a relevant class name one day.
> >>
> >> The effect would be equivalent to running
> >>
> >> data[[i]] <- as(data[[i]], colClasses[i])
> >>
> >> instead of
> >>
> >> data[[i]] <- type.convert(data[[i]], as.is = as.is[i], dec = dec)
> >>
> >> except that standard classes such as "numeric", "factor", "logical",
> >> "character" would be dispatched directly, and argument "dec" would be
> >> consulted where appropriate.
> >>
> >> colClasses = "character" would suppress all conversions, which cannot
> >> currently be done.
>
> Just a small remark.  I would prefer `NA' to "auto" (or "unknown").  It
> may be too late to change this now :-)

Anything can be changed up to the 1.4.0 release.  In particular, the
present code will have to be changed unless as() is in base by then.
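
For concreteness, the conversion rules quoted above can be sketched
roughly as follows.  This is only an illustration under stated
assumptions, not the code in read.table: convert_column() is a
hypothetical helper, and the type.convert() calls use the arguments as
they exist in current R.

```r
# Illustrative sketch only; convert_column() is a hypothetical helper,
# not part of R and not the actual read.table implementation.

# Step 0: everything starts out as character.
raw_num  <- c("1", "2.5", "3")
raw_text <- c("a", "b", "a")

# Steps 1-2 via type.convert(): numeric if possible, otherwise factor
# unless as.is is TRUE.
num_col    <- type.convert(raw_num,  as.is = FALSE)
factor_col <- type.convert(raw_text, as.is = FALSE)
char_col   <- type.convert(raw_text, as.is = TRUE)

# 'dec' is consulted when deciding whether a field is numeric.
comma_col <- type.convert(c("1,5", "2,5"), as.is = TRUE, dec = ",")

# Per-column dispatch on a requested class, in the spirit of colClasses.
# NA means "choose automatically"; standard classes are handled directly,
# anything else falls through to as().
convert_column <- function(x, cls, as.is = FALSE, dec = ".") {
  if (is.na(cls)) {
    return(type.convert(x, as.is = as.is, dec = dec))
  }
  switch(cls,
         character = x,                       # suppress all conversion
         numeric   = as.numeric(sub(dec, ".", x, fixed = TRUE)),
         logical   = as.logical(x),
         factor    = factor(x),
         methods::as(x, cls))                 # any other class name
}

kept_as_text <- convert_column(raw_num, "character")
auto_chosen  <- convert_column(raw_num, NA)
```

Using NA rather than "auto" as the "choose automatically" marker means
no possible class name is reserved.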

> I would also be happier if we did not refer to the variables explicitly
> as `columns'.  (This sounds a bit stupid from the person who wrote
> write.table and introduced arguments `row.names' and `col.names'.
> Although, at least one of these was modelled after an existing
> function).  E.g. something like
>
> 	read.table(......, caseNames, varNames, varClasses, .....)
>
> would be nice ...

The problem is that what is being referred to *is* columns and not
variables.  If the file has row names, the numbering is different:
column i of the file is no longer variable i of the data frame.
So it matters to use sufficiently precise terminology.
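
A small example of the difference, written against read.table as it
behaves in current R (the `text' argument and the sample data are
assumptions made for illustration; they are not part of the proposal
as discussed here):

```r
# Three columns in the file, but only two variables in the result,
# because the first column supplies the row names.
txt <- "id x y
r1 1 a
r2 2 b"

dat <- read.table(text = txt, header = TRUE, row.names = 1,
                  colClasses = c("character", "numeric", "character"))

# colClasses has one entry per *file column*, so its second entry
# describes the first variable, x.
n_file_columns <- 3
n_variables    <- ncol(dat)
```

Here colClasses[2] applies to dat[[1]], which is why the argument is
named after columns.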

Brian

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
