[R] reading csv files

Jim Lemon jim at bitwrit.com.au
Fri Feb 5 23:29:50 CET 2010


On 02/06/2010 12:57 AM, Barry Rowlingson wrote:
> On Fri, Feb 5, 2010 at 10:23 AM, analyst41 at hotmail.com
> <analyst41 at hotmail.com>  wrote:
>> the csv files are downloaded from a database and it looks like some
>> character fields contain the CR-LF sequence within them.
>>
>> This causes R to see a new record/row and the number of rows it sees
>> is different (usually higher) from the number of rows actually
>> extracted.
>
>   Hard to tell without an example, but I just tried this in a file:
>
> 1,2,"this
> is a test",99
> 2,3,"oneliner",45
>
> and:
>
>> read.table("test.csv",sep=",")
>    V1 V2              V3 V4
> 1  1  2 this\nis a test 99
> 2  2  3        oneliner 45
>
> seemed to work. But if your strings aren't "quoted" (hard to tell
> without an example) then you might have to find another way. Hard to
> tell without an example.
>
Maybe the database output looks like this:

1,2,this
is a test,99
2,3,oneliner,45

in which case the same read.table() call fails with:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 4 elements

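However, read.csv(), whose default fill = TRUE pads short lines, reads it without an error, but the broken record is split across two rows: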

read.csv("test.csv",header=FALSE)
          V1 V2       V3 V4
1         1  2     this NA
2 is a test 99          NA
3         2  3 oneliner 45
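Before reading, it can help to find which physical lines were split. Base R's count.fields() reports the number of fields on each line. The sketch below rebuilds the unquoted example file in a temporary location (the path is illustrative only). It assumes every complete record has 4 fields.

```r
# Recreate the unquoted example file from the post in a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("1,2,this", "is a test,99", "2,3,oneliner,45"), path)

# Number of comma-separated fields on each physical line (quotes ignored)
n <- count.fields(path, sep = ",", quote = "")
print(n)

# Lines that do not have the expected 4 fields were broken by an embedded EOL
which(n != 4)
```

Here lines 1 and 2 come back short (3 and 2 fields). This also explains why the row count differs from the number of records extracted.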

If the embedded EOLs differ from those at the end of each record (for example, CR-LF inside fields but a bare LF at the end of a record), you could do a global replace on the input file, turning the embedded EOLs into some character that isn't otherwise used in it (e.g. ~ or |). I'll leave the syntax to the regexperts.
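One possible form of that replace is sketched below. It assumes that records end in a bare LF and that only the embedded line breaks are CR-LF. If both use CR-LF, this approach cannot tell them apart. fix_embedded_eol() is a hypothetical helper, not a base R function, and "~" is assumed not to occur anywhere in the data.

```r
# Hypothetical helper: replace embedded CR-LF sequences with a placeholder,
# assuming record terminators are a bare LF
fix_embedded_eol <- function(path, placeholder = "~") {
  # Read the whole file as one string; given a file name, readChar() opens
  # it in binary mode, so the CR bytes are kept rather than translated
  raw_text <- readChar(path, file.info(path)$size, useBytes = TRUE)
  # Literal (non-regex) replacement of every CR-LF pair
  gsub("\r\n", placeholder, raw_text, fixed = TRUE)
}

# Small test file matching the assumption: CR-LF inside a field, LF at record ends
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("1,2,this\r\nis a test,99\n2,3,oneliner,45\n"), path)

fixed <- fix_embedded_eol(path)
df <- read.csv(text = fixed, header = FALSE, stringsAsFactors = FALSE)
print(df)
```

The data frame then has the expected 2 rows. If the original line breaks are wanted back, something like gsub("~", "\n", df$V3, fixed = TRUE) could be applied to the affected columns.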

Jim