[R] scan() problem

Paul Bayer Paul.Bayer at gleichsam.de
Wed Sep 10 19:27:08 CEST 2003


Dear R-helpers,

I have to read some large CSV files into R (30 - 100 MB).
Since reading them with read.csv fails with "memory exhausted", I tried
scan() instead, skipping the columns I do not need by giving NULL elements
in "what".

When a skipped element is a quoted string containing commas,
R treats each comma inside the quotes as a field separator,
which leads to wrong records in the rest of the line.

A little test will show what I mean. I have the following "test.csv":

"col.A","col.B","col.C","col.D"
1,"quoted string","again, again again",123
2,"nice quotes, isnt it","you got it",456
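
For anyone who wants to reproduce this, here is a minimal sketch that recreates the file. The temporary path and the variable names are only illustrative and not part of my actual setup (my calls below read "test.csv" from the working directory). The count.fields() call is just a sanity check: if quoting is honoured, every line should report 4 fields.

```r
# Recreate the three-line test file from this post in a temporary directory.
# (csv_lines and test_path are illustrative names, not from the original setup.)
csv_lines <- c(
  '"col.A","col.B","col.C","col.D"',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
)
test_path <- file.path(tempdir(), "test.csv")
writeLines(csv_lines, test_path)

# Count the fields on each line, treating double quotes as quote characters,
# so that commas inside quoted strings are not counted as separators.
n_fields <- count.fields(test_path, sep = ",", quote = "\"")
print(n_fields)
```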

First I read all elements:

 > tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
Read 2 records
 > tst
$a
[1] 1 2

$b
[1] "quoted string"        "nice quotes, isnt it"

$c
[1] "again, again again" "you got it"

$d
[1] 123 456

Everything is fine. Then I try to skip the 2nd column by giving b=NULL:

 > tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",", skip=1)
Read 2 records
Warning message:
number of items read is not a multiple of the number of columns
 > tst
$a
[1] 1 2

$b
NULL

$c
[1] "again, again again"            " isnt it,you got it,456\n\n\n"

$d
[1] 123  NA

 >

I got garbage.
Isn't this a bug?
Or did I do something wrong?
Is there a workaround?

Thank you all,

Paul Bayer,
Feldafing, Germany



