[R] Strange csv parsing problem

Peter Ehlers ehlers at ucalgary.ca
Thu Apr 8 16:30:10 CEST 2010


Hadley,

The cause of the count.fields result is the comma in 'nftc,%20'
at about column 300 (for me).

Since commas between quotes should normally not matter, this
must be due to the comma appearing inside escaped quotes, i.e.
we have: "abc\"def,ghi\"jkl".

Remove the comma and count.fields gives 11 for all rows.
From your other post(s) on escaped quotes, I assume that
this won't solve your problem with the existing files. (:

Try this:
create a text file with the lines

"a,a"
"\"bc\""
"d\"e,f\"g"

Then run:

count.fields(file, sep = ",")
[1] 1 1 2
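
For anyone who wants to reproduce this without creating the file by hand, here is a minimal sketch. The temporary file and the variable names (tmp, counts_backslash, counts_doubled) are only illustrative and not part of the original example. The second half is an assumption on my part: it rewrites the same three lines with the inner quotes doubled (csv style) instead of backslash-escaped, in which case I would expect every line to count as a single field.

```r
# Hypothetical temporary file standing in for 'file' above.
tmp <- tempfile(fileext = ".csv")

# Single-quoted R strings, so only the backslashes need doubling.
# The file then contains exactly the three lines shown above.
writeLines(c('"a,a"', '"\\"bc\\""', '"d\\"e,f\\"g"'), tmp)

# Backslash-escaped quotes: with sep = "," the backslash does not
# protect the quote, so the comma in line 3 is seen as a separator.
counts_backslash <- count.fields(tmp, sep = ",")
print(counts_backslash)

# Same content with the inner quotes doubled instead (assumed variant).
writeLines(c('"a,a"', '"""bc"""', '"d""e,f""g"'), tmp)
counts_doubled <- count.fields(tmp, sep = ",")
print(counts_doubled)

unlink(tmp)
```

If the files can be regenerated, writing the embedded quotes doubled rather than backslash-escaped may be the simplest way around the problem.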

  -Peter Ehlers

On 2010-04-07 19:26, Hadley Wickham wrote:
>> url<- "http://dl.dropbox.com/u/41902/22240.csv"
>>
>> read.csv(url)[, 1]
> [1] "oppose"  NA        "oppose"  "support"
>> read.csv(url, header = F)[, 1]
> [1] "url"
> [2] "http://maplight.org/us-congress/bill/109-hr-5825/387248"
> [3] "http://maplight.org/us-congress/bill/110-hr-3546/378743"
> [4] "http://maplight.org/us-congress/bill/111-s-908/365504"
> [5] "http://maplight.org/us-congress/bill/111-hr-3245/373358"
>>
>> count.fields(url, sep = ",")
> [1] 11 11 11 12 11
>
> This seems like it should be an error - I suspect it might be caused
> by the escaped quote (\") in line 4 column 432 causing the first
> column to be treated as column names:
>
>> read.csv(url, row.names = NULL)[, 1]
> [1] "http://maplight.org/us-congress/bill/109-hr-5825/387248"
> [2] "http://maplight.org/us-congress/bill/110-hr-3546/378743"
> [3] "http://maplight.org/us-congress/bill/111-s-908/365504"
> [4] "http://maplight.org/us-congress/bill/111-hr-3245/373358"
>
> Hadley
>

-- 
Peter Ehlers
University of Calgary


