[R] read.table() Issue

William Dunlap wdunlap at tibco.com
Wed Aug 1 20:23:13 CEST 2012


An unmatched quote can make read.table() run very slowly
when the file has many lines. For example:
> z <- rep("A B C", 10^6)
> z[2] <- "A \"B C" # unmatched quote on line 2
> tf <- tempfile()
> cat(file=tf, sep="\n", z)
> system.time(z2 <- read.table(tf, skip=2)) # skip bad line
   user  system elapsed
  0.860   0.028   0.887
> str(z2)
'data.frame':   999998 obs. of  3 variables:
 $ V1: Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
 $ V2: Factor w/ 1 level "B": 1 1 1 1 1 1 1 1 1 1 ...
 $ V3: Factor w/ 1 level "C": 1 1 1 1 1 1 1 1 1 1 ...
> system.time(z1 <- read.table(tf, skip=1))
[ no return for several minutes on a 64-bit Linux machine ]

On smaller files it quickly gives the error "line 1 did not have 4 elements",
along with a warning "incomplete final line found by readTableHeader ...".
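A quick way to find the offending line is count.fields(), which applies read.table()-style field splitting line by line. Below is a minimal sketch that rebuilds a small version of the test file above (10 lines instead of 10^6, to keep it fast). It assumes that count.fields() reports NA for lines that end inside a still-open quoted field, and that quotes are never meant to delimit fields in this file. That second assumption is what makes quote = "" a safe workaround here. The names nf, first_bad and z3 are just illustrative.

```r
# Small version of the file from the transcript: ten "A B C" lines,
# with an unmatched double quote on line 2.
z <- rep("A B C", 10)
z[2] <- "A \"B C"
tf <- tempfile()
cat(file = tf, sep = "\n", z)

# With quoting on (the read.table() default), a line that ends while a
# quoted field is still open is counted as NA, so the first NA points
# at the line where the runaway quote starts.
nf <- count.fields(tf, quote = "\"'")
first_bad <- which(is.na(nf))[1]
print(first_bad)

# With quoting off, every line gets a plain field count; lines with an
# unexpected number of fields stand out in the table.
print(table(count.fields(tf, quote = "")))

# If quotes never delimit fields in this file, turning quoting off lets
# read.table() treat the stray " as ordinary text.
z3 <- read.table(tf, quote = "", stringsAsFactors = FALSE)
str(z3)

unlink(tf)
```

Turning quoting off is only appropriate when no field legitimately contains embedded separators. Otherwise, the better fix is to repair the unbalanced quote on the line that count.fields() points to.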

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rich Shepard
> Sent: Wednesday, August 01, 2012 10:52 AM
> To: r-help at r-project.org
> Subject: [R] read.table() Issue
> 
>    Yesterday I changed the headers for a couple of columns in some data text
> files and also removed hyphens from within character strings. When I tried
> to re-read these files with read.table() I ran into an issue I have not
> seen before. Both files were read almost instantly until yesterday's
> wording changes.
> 
>    Now both files seem to cause R to hang. Instead of the prompt returning
> immediately, nothing happens. In Emacs the 'working' indicator appears,
> but the read.table() call does not complete.
> 
>    What might cause this?
> 
> Rich
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.