[Rd] read.table() with quoted integers

Henrik Bengtsson hb at biostat.ucsf.edu
Mon Sep 30 17:50:53 CEST 2013


On Mon, Sep 30, 2013 at 5:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> Hi!
>
>
> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not accept
> quoted integers in columns for which colClasses="integer". But when
> colClasses is omitted, these columns are read as integer anyway.
>
> For example, let's consider a file named file.dat, containing:
> "1"
> "2"
>
>> read.table("file.dat", colClasses="integer")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'an integer' and got '"1"'
>
> But:
>> str(read.table("file.dat"))
> 'data.frame':   2 obs. of  1 variable:
>  $ V1: int  1 2
>
> The latter result is indeed documented in ?read.table:
>      Unless ‘colClasses’ is specified, all columns are read as
>      character columns and then converted using ‘type.convert’ to
>      logical, integer, numeric, complex or (depending on ‘as.is’)
>      factor as appropriate.  Quotes are (by default) interpreted in all
>      fields, so a column of values like ‘"42"’ will result in an
>      integer column.
>
>
> Should the former behavior be considered a bug?
>
> This creates problems when combined with read.table.ffdf() from package
> ff, since this function guesses the column classes by reading the
> first rows of the file, and then passes colClasses to read.table() to read
> the remaining rows in chunks. A column of quoted integers is correctly
> detected as integer in the first read, but read.table() fails on
> subsequent reads.
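
The two-pass pattern described above can be sketched in base R as follows. This is a hypothetical illustration, not the code of ff::read.table.ffdf(): the variable names are made up, and instead of trimming quotes it sidesteps the error by reading the later chunks with colClasses = "character" (scan() strips the quotes from character fields) and then coercing each column to the class guessed in the first pass with methods::as(). It assumes the guessed classes are atomic types such as integer or numeric, hence as.is = TRUE in the first pass so that no factors are guessed.

```r
# Hypothetical sketch of a "guess classes, then read the rest" reader;
# not the actual implementation of ff::read.table.ffdf().
lines <- c('"1"', '"2"', '"3"', '"4"')

# Pass 1: guess column classes from the first rows. Quoted fields are
# converted by type.convert(); as.is = TRUE keeps strings as character.
head_df <- read.table(text = lines[1:2], as.is = TRUE)
guessed <- vapply(head_df, function(col) class(col)[1], character(1))

# Pass 2: read the remaining rows as character, so quoted values are
# accepted, then coerce each column to the class guessed in pass 1.
rest <- read.table(text = lines[3:4], colClasses = "character")
for (j in seq_along(rest)) {
  rest[[j]] <- methods::as(rest[[j]], guessed[[j]])
}
str(rest)
```

The cost of this design is an extra character-to-type conversion per chunk, but it never rewrites the raw text, so quoted fields that contain separators are still parsed correctly.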

The readDataFrame() method of the R.filesets package provides the argument
'trimQuotes' for exactly this reason, i.e. to trim quotes from columns
for which 'colClasses' specifies a numeric type before the data are
passed on to read.table().  Feel free to borrow from its source code for
a patch to ff::read.table.ffdf().  The workaround is in
readDataFrame() for TabularTextFile
[https://r-forge.r-project.org/scm/viewvc.php/pkg/R.filesets/R/TabularTextFile.R?view=markup&root=r-dots];
look for the part that starts with:

  # SPECIAL CASE/WORKAROUND: read.table()/scan() will give an error
  # if a numeric value is quoted and 'colClasses' specifies it as
  # a numeric value.  In order to read such values, we need to remove
  # the quotes first. /HB 2011-07-13
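
A rough sketch of the quote-trimming idea follows. It is a hypothetical helper, not the R.filesets implementation: unlike 'trimQuotes', which (per the description above) only targets columns whose 'colClasses' is numeric, this simplified version deletes every double quote in the raw lines before calling read.table(). It is therefore only safe when no quoted field contains the separator, an embedded quote, or other text that depends on its quoting.

```r
# Hypothetical helper (not part of base R, ff or R.filesets): remove all
# double quotes from the raw lines, then let read.table() apply colClasses.
read_table_trimmed <- function(lines, colClasses, ...) {
  unquoted <- gsub('"', "", lines, fixed = TRUE)
  read.table(text = unquoted, colClasses = colClasses, ...)
}

# Same content as file.dat in the original report.
df <- read_table_trimmed(c('"1"', '"2"'), colClasses = "integer")
str(df)
```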

/Henrik
(author of R.filesets)

>
>
> Regards
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
