[R] How to read malformed csv files with read.table?

jim holtman jholtman at gmail.com
Fri Aug 22 17:02:22 CEST 2008


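Try this.  It counts the fields in the first two lines and, if the
header line is shorter than the first data line, appends dummy column
names to the header before handing the connection to read.table: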

x <- "       time/ms C550.KMS        Cyt_b559.KMS    Cyt_b563.KMS    Cyt_f_.KMS      P515FR.KMS      Scatt.KMS       Zea2.KMS        PC      P700
0       Point1      -599.500           0.000           0.000           0.000           0.000          0.000           0.000           0.000           0.000           0.000
0       Point2      -598.000          -0.012          -0.013           0.040           0.013          0.027           0.010           0.022           0.000           0.000
0       Point3      -596.500          -0.015          -0.015           0.044           0.020          0.025           0.010           0.033           0.000           0.000"
# find out how many dummy headers you have to add
x.c <- count.fields(textConnection(x))
x.diff <- x.c[2] - x.c[1]  # assume first line is short
x.connection <- textConnection(x)  # setup connection
if (x.diff > 0){
    # read first line
    x.first <- readLines(x.connection, n=1)
    # add dummy headers
    x.first <- paste(x.first, paste(LETTERS[1:x.diff], collapse=" "))
    # push the line back so it is ready for read.table
    pushBack(x.first, x.connection)
}

input <- read.table(x.connection, header=TRUE)
closeAllConnections()
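
Since you have many files, the same idea can be wrapped in a function.
What follows is only a rough sketch, not tested on your real files:
read_short_header, its pad_front argument and the X1, X2, ... dummy
names are made up for illustration.  It reads the data rows without the
header, reads the header line separately, and pads the names to match.
In your sample the third data column (-599.500, -598.000, ...) looks
like time/ms, so the missing names may belong to the first two columns;
pad_front = TRUE is meant for that case, but that is a guess from the
sample.

```r
# Hypothetical helper (not part of the code above): read a delimited
# file whose header line has fewer fields than its data lines.
read_short_header <- function(file, sep = "", pad_front = FALSE) {
    # read only the data rows, skipping the short header line
    dat <- read.table(file, sep = sep, header = FALSE, skip = 1,
                      stringsAsFactors = FALSE)
    # read the header line on its own, split with the same separator
    hdr <- scan(file, what = "", sep = sep, nlines = 1,
                strip.white = TRUE, quiet = TRUE)
    n.missing <- ncol(dat) - length(hdr)
    if (n.missing < 0) stop("header has more fields than the data")
    # dummy names X1, X2, ... for the columns without a header
    dummy <- if (n.missing > 0) paste0("X", seq_len(n.missing)) else character(0)
    hdr <- if (pad_front) c(dummy, hdr) else c(hdr, dummy)
    # make.names turns e.g. "time/ms" into "time.ms" and resolves duplicates
    names(dat) <- make.names(hdr, unique = TRUE)
    dat
}

# Assumed usage: every *.CSV file in the working directory, tab-separated
files <- list.files(pattern = "\\.CSV$")
all.data <- lapply(files, read_short_header, sep = "\t", pad_front = TRUE)
```

Unlike the pushBack version, this never modifies the header text, so you
can choose whether the dummy names go in front of or after the real
ones.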



On Fri, Aug 22, 2008 at 10:19 AM, Martin Ballaschk
<tmp082008 at ballaschk.com> wrote:
> Hi,
>
> how do I read files that have two header fields less than they have columns?
> The easiest solution would be to insert one or two additional header fields,
> but I have a lot of files and that would be quite a lot of awful work.
>
> Any ideas on how to solve that problem?
>
> #######
> R stuff:
>
>> read.table("myfile.CSV", sep = "\t", header = T)
>  Error in read.table("myfile.CSV", sep = "\t",  :
>  more columns than column names
>
>> count.fields("myfile.CSV", sep = "\t")
>   [1] 10 12 12 12 12 12 12 12 12 12 12 [...]
>
> #######
> ugly sample ("Exported by SDL DataTable component"):
>
>        time/ms C550.KMS        Cyt_b559.KMS    Cyt_b563.KMS    Cyt_f_.KMS      P515FR.KMS      Scatt.KMS       Zea2.KMS        PC      P700
> 0       Point1      -599.500           0.000           0.000           0.000           0.000          0.000           0.000           0.000           0.000           0.000
> 0       Point2      -598.000          -0.012          -0.013           0.040           0.013          0.027           0.010           0.022           0.000           0.000
> 0       Point3      -596.500          -0.015          -0.015           0.044           0.020          0.025           0.010           0.033           0.000           0.000
> [...]
>
>
> Cheers,
> Martin
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


