[R] Errors in data frames from read.table

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jul 16 19:23:55 CEST 2007


On Mon, 16 Jul 2007, Pat Carroll wrote:

> Hello, all.
>
> I am working on a project with a large (~350Mb, about 5800 rows) 
> insurance claims dataset. It was supplied in a tilde(~)-delimited 
> format. I imported it into a data frame in R by setting memory.limit to 
> maximum (4Gb) for my computer and using read.table.
>
> The resulting data frame had 10 bad rows. The errors appear due to 
> read.table missing delimiter characters, with multiple data being 
> imported into the same cell, then the remainder of the row and the next 
> run together and garbled due to the reading frame shift (example: a 
> single cell might contain: <datum>~ ~ <datum> ~<datum>, after which all 
> the cells of the row and the next are wrong).
>
> To replicate, I tried the same import procedure on a smaller 
> demographics data set from the same supplier, only about 1Mb, and got 
> the same kinds of errors (5 bad rows in about 3500). I also imported as 
> much of the file as Excel would hold and cross-checked; Excel did not 
> produce the same errors but can't handle the entire file. I have used 
> read.table on a number of other formats (mainly csv and tab-delimited) 
> without such problems; so far it appears there's something different 
> about these files that produces the errors but I can't see what it would 
> be.

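The usual cause is that the user forgot about quotes and comment 
characters: by default read.table treats both ' and " as quote characters 
and # as the start of a comment, so an apostrophe or # inside a field 
throws off the separators that follow.  Try quote="", comment.char=""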
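A minimal sketch of that suggestion, using a few made-up tilde-delimited lines rather than the actual claims data. The object names txt, con, n_fields and good are hypothetical, and the sketch assumes a version of R whose read.table accepts a text= argument. count.fields is a quick way to locate the rows whose field count disagrees with the header. Note also that read.csv and read.delim default to quote = "\"" and comment.char = "", which may be why csv and tab-delimited files read without trouble.

```r
# Hypothetical sample lines (not the original data): the apostrophes and
# the '#' are ordinary data here, not quote or comment characters.
txt <- c("id~name~address",
         "1~O'Brien~Apt #4",
         "2~Smith~12 Main St",
         "3~D'Arcy~Flat 2")

# Diagnose first: count the fields on each line with quoting and comment
# handling switched off. Every line should match the header's count.
con <- textConnection(txt)
n_fields <- count.fields(con, sep = "~", quote = "", comment.char = "")
close(con)
print(n_fields)  # 3 3 3 3 for these lines

# With read.table's defaults (quote = "\"'", comment.char = "#") the
# apostrophe in O'Brien can open a quoted string, merging fields from
# several lines into one cell. Turning both features off keeps every
# '~' acting as a separator.
good <- read.table(text = txt, sep = "~", header = TRUE,
                   quote = "", comment.char = "",
                   stringsAsFactors = FALSE)
print(good)
```

On a real file, which(n_fields != n_fields[1]) would point at the offending lines before any data frame is built.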

If that does not work, please follow the request in the footer of every 
message on this list.

> Does anyone have any thoughts about what is going wrong? And is there a 
> way, short of manual correction, for fixing it?
>
> Thanks for all help,
> ~Pat.
>
>
> Pat Carroll.
> what matters most is how well you walk through the fire.
> bukowski.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


