[R] The behaviour of read.csv().

Fri Dec 3 01:08:55 CET 2010

Rolf -
    I'd suggest using

     junk <- read.csv("junk.csv",header=TRUE,fill=FALSE)

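if you don't want the behaviour you're seeing.  read.csv() sets
fill=TRUE by default, whereas read.table() defaults to fill=FALSE,
which is why the two functions treat your file differently.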
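
A minimal sketch of the difference, assuming a toy file made up for illustration (the attached junk.csv is not reproduced here, so the contents and the names tmp, filled and strict are hypothetical). The over-long line is placed after the first five lines of input, so it does not affect the column count:

```r
# Hypothetical toy data standing in for junk.csv.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19",   # over-long line, beyond the first five lines
             "20,21,22"),
           tmp)

# Default read.csv() uses fill = TRUE: the surplus entry is placed
# on an additional row, padded with NA, and no error is raised.
filled <- read.csv(tmp, header = TRUE)
print(filled)

# With fill = FALSE the over-long line stops the read with an error;
# tryCatch() is used here only to capture the message.
strict <- tryCatch(read.csv(tmp, header = TRUE, fill = FALSE),
                   error = function(e) conditionMessage(e))
print(strict)
```

Wrapping the strict call in tryCatch() is only for demonstration; in a real script you would probably let the error propagate.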

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

On Fri, 3 Dec 2010, Rolf Turner wrote:

>
> I have recently been bitten by an aspect of the behaviour of
> the read.csv() function.
>
> Some lines in a (fairly large) *.csv file that I read in had
> too many entries.  I would have hoped that this would cause
> read.csv() to throw an error, or at least issue a warning,
> but it read the file without complaint, putting the extra
> entries into an additional line.
>
> This behaviour is illustrated by the toy example in the
> attached file ``junk.csv''.  Just do
>
> 	junk <- read.csv("junk.csv",header=TRUE)
> 	junk
>
> to see the problem.
>
> If the offending over-long line were in the fourth line of data
> or earlier, an error would be thrown, but if it is in the fifth line
> of data or later no error is given.
>
> This is in a way compatible with what the help on read.csv()
> says:
>
> 	The number of data columns is determined by looking at
> 	the first five lines of input (or the whole file if it
> 	has less than five lines), or from the length of col.names
> 	if it is specified and is longer.
>
> However, the help for read.table() says the same thing.  And yet if
> one does
>
> 	gorp <- read.table("junk.csv",sep=",",header=TRUE)
>
> one gets an error, whereas read.csv() gives none.
>
> Am I correct in saying that this is inappropriate behaviour on
> the part of read.csv(), or am I missing something?
>
> 		cheers,
>
> 			Rolf Turner
>
>
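
One way to check a file for over-long lines before reading it is count.fields(), which reports the number of fields on each line. The sketch below uses a made-up toy file (the names tmp, n_fields and bad are hypothetical) and also inspects the default value of fill in the two readers:

```r
# Hypothetical toy data; line 7 of the file has one entry too many.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19",
             "20,21,22"),
           tmp)

# Defaults of the two readers for the `fill` argument.
print(formals(read.csv)$fill)     # a logical constant
print(formals(read.table)$fill)   # an unevaluated expression

# Count fields per line, using the same quote and comment settings
# that read.csv() uses, and flag lines that disagree with the header.
n_fields <- count.fields(tmp, sep = ",", quote = "\"", comment.char = "")
bad <- which(n_fields != n_fields[1])
print(bad)
```

The line numbers in bad refer to lines of the file, header included, so they can be used directly to locate the offending records in an editor.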