[R] why must a named colClasses in read.table be in correct order

Henrik Bengtsson henrik.bengtsson at ucsf.edu
Thu Jul 9 04:54:01 CEST 2015


Thanks for insisting; I was wrong and I'm happy to see that there is
indeed code intended for named 'colClasses', which even goes back to
2004.  But as you report, the names only work when
length(colClasses) < cols (which also explains why I thought it was not
supported).  I'm not sure if that _strictly less than_ test is
intentional or a mistake, but I would propose the following patch:

[HB-X201]{hb}: svn diff src\library\utils\R\readtable.R
Index: src/library/utils/R/readtable.R
===================================================================
--- src/library/utils/R/readtable.R     (revision 68642)
+++ src/library/utils/R/readtable.R     (working copy)
@@ -139,7 +139,7 @@
     if (rlabp) col.names <- c("row.names", col.names)

     nmColClasses <- names(colClasses)
-    if(length(colClasses) < cols)
+    if(length(colClasses) <= cols)
         if(is.null(nmColClasses)) {
             colClasses <- rep_len(colClasses, cols)
         } else {
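
To make concrete what matching by name amounts to, here is a minimal
sketch of the idea.  It is not the code in readtable.R:
expandColClasses() is a hypothetical helper written only for
illustration.  It expands a named 'colClasses' to one entry per column
by matching its names against the column names, and leaves columns that
are not named as NA (meaning "guess the class").

```r
## Hypothetical helper (not part of 'utils'): expand a named 'colClasses'
## to one entry per column, matching by name rather than by position.
expandColClasses <- function(colClasses, col.names) {
  ## Start with NA for every column; NA means "let read.table() guess"
  out <- rep_len(NA_character_, length(col.names))
  names(out) <- col.names

  ## Position of each named class among the column names (0 = no match)
  idx <- match(names(colClasses), col.names, nomatch = 0L)
  if (any(idx == 0L))
    warning("not all columns named in 'colClasses' exist")

  ## Fill in only the classes whose names were found
  out[idx[idx > 0L]] <- colClasses[idx > 0L]
  out
}

## Names given in "wrong" order are placed in column order
expanded <- expandColClasses(c(b = "character", a = "numeric"), c("a", "b"))

## Naming only some of the columns leaves the rest as NA
partial <- expandColClasses(c(b = "character"), c("a", "b"))
```

With this kind of matching, the order of the elements in 'colClasses'
no longer matters; only the names do.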


Your example works with this patch.  I've made it source():able so you
can try it out (if you cannot source() https://, then download the
file and source it locally):

source("https://gist.githubusercontent.com/HenrikBengtsson/ed1eeb41a1b4d6c43b47/raw/ebe58f76e518dd014423bea466a5c93d2efd3c99/readtable-fix.R")

kkk <- c("a\tb",
         "3.14\tx")

colClasses <- c(a="numeric", b="character")
data <- read.table(textConnection(kkk),
                   sep="\t",
                   header = TRUE,
                   colClasses = colClasses)
str(data)
### 'data.frame':   1 obs. of  2 variables:
### $ a: num 3.14
### $ b: chr "x"

## Does not work with utils::read.table(), but does with the patch
data <- read.table(textConnection(kkk),
                   sep="\t",
                   header = TRUE,
                   colClasses = rev(colClasses))
str(data)
### 'data.frame':   1 obs. of  2 variables:
### $ a: num 3.14
### $ b: chr "x"
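
For an unpatched R, one possible workaround is to put 'colClasses' in
column order yourself before calling read.table().  The sketch below
assumes that the data has a header line and that every column is named
in 'colClasses'; the variable names 'hdr' and 'ordered' are only for
illustration, and read.table(text = ) is used instead of
textConnection() to keep the example short.

```r
kkk <- c("a\tb",
         "3.14\tx")

## Named classes, deliberately not in column order
colClasses <- c(b = "character", a = "numeric")

## Column names taken from the header line
## (for a file, readLines(file, n = 1L) would give that line)
hdr <- strsplit(kkk[1], "\t", fixed = TRUE)[[1]]

## Reorder by name and drop the names, so that the classes are
## applied purely by position
ordered <- unname(colClasses[hdr])

data <- read.table(text = kkk,
                   sep = "\t",
                   header = TRUE,
                   colClasses = ordered)
str(data)
```

This does not depend on how read.table() treats names, since the vector
passed to it is unnamed and already in column order.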

Let's hope that the above is a (10-year-old) typo, and that changing a <
to a <= adds support for named 'colClasses', which is really useful
functionality.

/Henrik

On Wed, Jul 8, 2015 at 6:42 PM, Andreas Leha
<andreas.leha at med.uni-goettingen.de> wrote:
> Hi Henrik,
>
> Thanks for your reply.
>
> I am not (yet) convinced, though.  The help page for read.table
> mentions named colClasses and if I specify colClasses for not all
> columns, the names are taken into account:
>
> --8<---------------cut here---------------start------------->8---
> kkk <- c("a\tb",
>          "3.14\tx")
> str(read.table(textConnection(kkk),
>                sep="\t",
>                header = TRUE))
>
> str(read.table(textConnection(kkk),
>                sep="\t",
>                header = TRUE,
>                colClasses=c(b="character")))
> --8<---------------cut here---------------end--------------->8---
>
> What am I missing?
>
> Best,
> Andreas
>
>
>
> On 09/07/2015 02:21, Henrik Bengtsson wrote:
>> read.table() does not make use of names(colClasses) - only its values.
>> Because of this, ordering is critical, as you noted. It shouldn't be
>> too hard to add support for a named `colClasses` argument of
>> utils::read.table(), but someone needs to convince the R core team
>> that this is a good idea.
>>
>> As an alternative, see R.filesets::readDataFrame() for a
>> read.table()-like function that matches names(colClasses) to column
>> names, if they exist.
>>
>> /Henrik
>> (author of R.filesets)
>>
>> On Wed, Jul 8, 2015 at 5:41 PM, Andreas Leha
>> <andreas.leha at med.uni-goettingen.de> wrote:
>>> Hi all,
>>>
>>> Apparently, the colClasses argument to read.table needs to be in the
>>> order of the columns *even when it is named*.  Why is that?  And where
>>> would I find it in the documentation?
>>>
>>> Here is a MWE:
>>>
>>> --8<---------------cut here---------------start------------->8---
>>> kkk <- c("a\tb",
>>>          "3.14\tx")
>>> read.table(textConnection(kkk),
>>>            sep="\t",
>>>            header = TRUE)
>>>
>>> cclasses=c(b="character",
>>>            a="numeric")
>>>
>>> read.table(textConnection(kkk),
>>>            sep="\t",
>>>            header = TRUE,
>>>            colClasses = cclasses)              ## <--- error
>>>
>>> read.table(textConnection(kkk),
>>>            sep="\t",
>>>            header = TRUE,
>>>            colClasses = cclasses[order(names(cclasses))])
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>>
>>> Thanks,
>>> Andreas
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.


