[R] Error with read.delim & read.csv

Peter Waltman waltman at cs.nyu.edu
Thu Nov 15 18:11:56 CET 2007


Hi -

I'm reading in a tab-delimited file that is causing issues with 
read.delim.  For a certain set of lines, the last entry of the line is 
misread and treated as the first entry of a new row (which is then 
padded with NAs).  For example:

    tmp <- read.delim( "trouble.txt", header=FALSE )

produces a data.frame, tmp, in which calling tmp[,1] gives output like:

     [76] F45H7.4#2     C47C12.5#2    F40H7.4#2     ZK353.2       0.59
     [81] Y116A8C.34    0.23          Y116F11A.MM   0.04          F26D12.A

I initially assumed it was a formatting issue with the file.  However, 
I've looked at the lines in question in an octal viewer, and they seem 
fine.  Additionally, scan followed by strsplit splits the lines 
correctly (code below the sig).
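
One difference between the two approaches that I can think of is quote 
handling: with sep="\n", scan defaults to quote="", i.e. no quote 
processing at all, whereas read.delim treats an embedded double quote as 
opening a quoted field.  In case a stray quote character in the data is 
the culprit (I haven't confirmed this against the file), disabling 
quoting would be a quick test:

    ## untested guess: re-read with quote processing switched off
    tmp2 <- read.delim( "trouble.txt", header=FALSE, quote="" )
    dim( tmp2 )   # one row per line, 509 columns, if quoting was the issue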

Since I can't attach the file to a group posting, I can't give a sample 
of the lines causing the issue; however, I can send a small sample to 
anyone who's interested.

Note that I've tried this on several architectures and versions of R and 
get the same behavior: specifically, v.2.5.1 on an x86_64 machine, as 
well as v.2.6.0 on an i686 machine.  I also get similar behavior when I 
convert the file to comma-separated values and use read.csv.

As a quick workaround I can use scan & strsplit (see the p.s. below the 
sig), but I thought someone might want to take a look at this problem.

Thanks,

Peter Waltman


p.s. the combination of scan & strsplit I mention above is as follows:

    ## read the file one whole line at a time; with sep="\n", scan does
    ## no quote processing, so embedded quotes can't merge lines
    my.lines <- scan( "trouble.txt", sep="\n", what="character" )
    ## split each line on tabs and count the resulting fields
    split.lines <- strsplit( my.lines, "\t" )
    num.entries <- sapply( split.lines, length )

after which num.entries contains one entry per line of my.lines, each 
equal to 509 (the number of elements per line).
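
For anyone who wants to check the parsing without building the whole 
data.frame, count.fields can report the fields-per-line count under both 
regimes; a sketch of the comparison I have in mind (again assuming 
quoting is the difference):

    ## fields per line as read.delim would parse them (quote="\"")
    n.quoted <- count.fields( "trouble.txt", sep="\t", quote="\"", comment.char="" )
    ## fields per line with quoting disabled, matching scan/strsplit
    n.plain  <- count.fields( "trouble.txt", sep="\t", quote="", comment.char="" )
    which( n.quoted != n.plain )   # lines where the two parses disagree

And to turn the scan/strsplit workaround into a data.frame comparable to 
read.delim's output (this assumes every line really does have 509 
fields, as the num.entries check above confirms):

    tmp <- as.data.frame( do.call( rbind, split.lines ), stringsAsFactors=FALSE )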


