[Rd] read.table() with quoted integers

Milan Bouchet-Valat nalimilan at club.fr
Mon Sep 30 16:45:19 CEST 2013


On Monday 30 September 2013 at 08:38 -0500, Joshua Ulrich wrote:
> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> > Hi!
> >
> >
> > It seems that read.table() in R 3.0.1 (Linux 64-bit) does not accept
> > quoted integers in columns declared with colClasses="integer". But
> > when colClasses is omitted, these columns are read as integer anyway.
> >
> > For example, let's consider a file named file.dat, containing:
> > "1"
> > "2"
> >
> >> read.table("file.dat", colClasses="integer")
> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> >   scan() expected 'an integer' and got '"1"'
> >
> > But:
> >> str(read.table("file.dat"))
> > 'data.frame':   2 obs. of  1 variable:
> >  $ V1: int  1 2
> >
> > The latter result is indeed documented in ?read.table:
> >      Unless ‘colClasses’ is specified, all columns are read as
> >      character columns and then converted using ‘type.convert’ to
> >      logical, integer, numeric, complex or (depending on ‘as.is’)
> >      factor as appropriate.  Quotes are (by default) interpreted in all
> >      fields, so a column of values like ‘"42"’ will result in an
> >      integer column.
> >
> >
> > Should the former behavior be considered a bug?
> >
> No. If you tell read.table the column is integer and it's actually
> character on disk, it should be an error.
All values in a CSV file are stored as characters on disk, regardless of
whether they are surrounded by quotes. 1 is saved as 00110001 (ASCII
character #49), not as 00000001, nor as 00000000 00000000 00000000
00000001 (as a 32-bit integer representation would imply).
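
To make the byte-level point concrete, here is a minimal sketch in base R.
The variable names are only illustrative; the calls used (charToRaw,
rawToBits, writeBin) are standard base R functions.

```r
# Bytes of the text "1" as it would appear in a file: one byte, 0x31 (49).
unquoted_bytes <- charToRaw("1")

# Bytes of the quoted text "1" (with the quotes): 0x22 0x31 0x22.
quoted_bytes <- charToRaw("\"1\"")

# Bit pattern of the single byte 0x31, most significant bit first.
# rawToBits() returns the least significant bit first, hence rev().
ascii_bits <- paste(rev(as.integer(rawToBits(unquoted_bytes))), collapse = "")

# For comparison: the 4-byte big-endian binary representation of the
# integer 1, which is NOT what a text file contains.
binary_int <- writeBin(1L, raw(), endian = "big")

print(unquoted_bytes)
print(quoted_bytes)
print(ascii_bits)
print(binary_int)
```

In both the quoted and the unquoted case the file holds text, and the only
difference is the pair of quote bytes around the digit.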

So, with all due respect, please refrain from formulating such blatantly
erroneous statements.


Regards


> > This creates problems when combined with read.table.ffdf from package
> > ff, since this function tries to guess the column classes by reading the
> > first rows of the file, and then passes colClasses to read.table to read
> > the remaining rows by chunks. A column of quoted integers is correctly
> > detected as integer in the first read, but read.table() fails in
> > subsequent reads.
> >
> This sounds like an issue with read.table.ffdf.  The column of quoted
> integers is *incorrectly* detected as integer because they're actually
> character on disk.  read.table.ffdf should rely on how the data are
> actually stored on disk (via as.is=TRUE), not how read.table might
> convert them once they're read into R.
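
For illustration, one possible workaround at the read.table() level is to
read the quoted column as character and convert it afterwards. This is only
a sketch under assumptions: the temporary file stands in for file.dat, the
names tmp and chunk are hypothetical, and it does not show how
read.table.ffdf itself would need to be changed.

```r
# Recreate the example file: two quoted integers, one per line.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tmp)

# Reading as character succeeds, because quotes are stripped from
# character fields.
chunk <- read.table(tmp, colClasses = "character")

# Convert to the class detected earlier (integer in this example).
chunk$V1 <- as.integer(chunk$V1)

str(chunk)
unlink(tmp)
```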
> 
> >
> > Regards
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> --
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
