[R] read.csv quotes within fields

John Kane jrkrideau at inbox.com
Sat Jan 26 15:23:49 CET 2013


Following David's suggestion, you might want to have a look at CSVEdit: https://confluence.clazzes.org/display/CSVEDIT/CSVEdit+Home

I have not used it, but it seems to get good reviews from people I know.

John Kane
Kingston ON Canada


> -----Original Message-----
> From: dwinsemius at comcast.net
> Sent: Fri, 25 Jan 2013 13:42:25 -0800
> To: tghoward at gw.dec.state.ny.us
> Subject: Re: [R] read.csv quotes within fields
> 
> 
> On Jan 25, 2013, at 1:37 PM, Tim Howard wrote:
> 
>> David,
>> Thank you again for the reply. I'll try to make readLines() and
>> strsplit() work.  What bugs me is that I think it would import fine if
>> the folks who created the csv had used double quotes "" rather than an
>> escaped quote \" for those pesky internal quotes. Since that's the case,
>> I'd think there would be a solution within read.csv() ... or perhaps
>> scan()?, I just can't figure it out.
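
A minimal sketch of the conversion Tim describes, assuming the only backslashes in the file are the ones escaping internal quotes: read the raw lines, turn each \" into "", and hand the result to read.csv(). The sample lines are the escaped test.csv from the original post; the temporary file only stands in for the real data file.

```r
# The escaped test.csv from the original post (\" marks an internal quote).
csv_lines <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third row","last \\" col"'
)
tmp <- tempfile(fileext = ".csv")   # stand-in for the real file
writeLines(csv_lines, tmp)

raw <- readLines(tmp)

# Replace every literal backslash-quote with a doubled quote, which is
# the form read.csv() understands inside a quoted field.
fixed <- gsub('\\"', '""', raw, fixed = TRUE)

dat <- read.csv(text = fixed, stringsAsFactors = FALSE)
print(dat)
```

A field that legitimately ends in a backslash would be mangled by this substitution, so it is worth checking the data for stray backslashes first.
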
> 
> Can you pre-process with an editor? Replace all the ", " hits with
> something like '|'.
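
The editor step can also be done in R. This is only a sketch under two assumptions that are not in the thread: the sequence between fields is exactly "," (quote, comma, quote), and the character | never occurs in the data. The sample lines are the escaped test.csv from the original post, with a comma added to one field for illustration.

```r
# Escaped sample data; the comma in "third, row" is added for illustration.
csv_lines <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third, row","last \\" col"'
)

# Drop the quote that opens and the quote that closes each line.
inner <- sub('^"', '', sub('"$', '', csv_lines))

# Replace the field separator "," with a pipe.
piped <- gsub('","', '|', inner, fixed = TRUE)

# With quote = "" no character is treated as a quote, so only | separates.
dat <- read.delim(text = piped, sep = "|", quote = "",
                  header = TRUE, stringsAsFactors = FALSE)

# Remove the backslash that still precedes each internal quote.
dat[] <- lapply(dat, function(col) {
  if (is.character(col)) gsub('\\"', '"', col, fixed = TRUE) else col
})
print(dat)
```
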
> 
> --
> David.
>> best,
>> Tim
>> 
>>>>> David Winsemius <dwinsemius at comcast.net> 1/25/2013 4:16 PM >>>
>> 
>> On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:
>> 
>>> Great point, your fix (quote="") works for the example I gave.
>>> Unfortunately, these text strings have commas in them as well(!).
>>> Throw a few commas in any of the text strings and it breaks again.
>>> Sorry about not including those in the example.
>>> 
>>> So, I need to incorporate commas *and* quotes with the escape character
>>> within a single string.
>> 
>> Well you need to have _some_ delimiter. At the moment it sounds as
>> though you might end up using readLines() and strsplit( . ,
>> split="\\'\\,\\s\\").
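
One way the readLines() and strsplit() route might look, as a sketch rather than a tested recipe. It assumes every field is quoted, that the three-character sequence "," appears only between fields, and that the last field on a line is never empty (strsplit() drops a trailing match). The sample lines are the escaped test.csv from the original post, with a comma added to one field; in practice they would come from readLines("test.csv").

```r
# Escaped sample data; the comma in "third, row" is added for illustration.
csv_lines <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third, row","last \\" col"'
)

# Drop the quote that opens and the quote that closes each line.
inner <- sub('^"', '', sub('"$', '', csv_lines))

# Split on the literal "," between fields; lone commas are left alone.
parts <- strsplit(inner, '","', fixed = TRUE)

# Guard: every line must yield the same number of fields.
stopifnot(length(unique(lengths(parts))) == 1)

fields <- do.call(rbind, parts)            # character matrix, one row per line
dat <- as.data.frame(fields[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(dat) <- make.names(fields[1, ], unique = TRUE)

# Turn the escaped \" back into a plain quote.
dat[] <- lapply(dat, function(col) gsub('\\"', '"', col, fixed = TRUE))
print(dat)
```

All columns come back as character; numeric columns need an explicit as.numeric() or similar afterwards.
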
>> 
>> --
>> david.
>> 
>>> 
>>> Tim
>>> 
>>> 
>>>>>> David Winsemius <dwinsemius at comcast.net> 1/25/2013 2:27 PM >>>
>>> 
>>> On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:
>>> 
>>>> All,
>>>> 
>>>> I have some csv files I am trying to import. I am finding that quotes
>>>> inside strings are escaped in a way R doesn't expect for csv files.
>>>> The problem only seems to rear its ugly head when there are an uneven
>>>> number of internal quotes. I'll try to recreate the problem:
>>>> 
>>>> # set up a matrix, using escape-quote as the internal double quote
>>>> mark.
>>>> 
>>>> x <- data.frame(matrix(data=c("1", "string one", "another string",
>>>> "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string",
>>>> "3","third row","last \" col"),ncol = 3, byrow=TRUE))
>>>> 
>>>>> write.csv(x, "test.csv")
>>>> 
>>>> # NOTE that write.csv correctly created the three internal quotes ' "
>>>> ' by using double quotes ' "" '.
>>>> # here's what got written
>>>> 
>>>> "","X1","X2","X3"
>>>> "1","1","string one","another string"
>>>> "2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
>>>> "3","3","third row","last "" col"
>>>> 
>>>> # Importing test.csv works fine.
>>>> 
>>>>> read.csv("test.csv")
>>>>  X X1                                         X2             X3
>>>> 1 1  1                                 string one another string
>>>> 2 2  2 quotes escaped 10' 20" 5' 30" "test string   final string
>>>> 3 3  3                                  third row     last " col
>>>> # this looks good.
>>>> # now, please go and open "test.csv" with a text editor and replace
>>>> all the double quotes '""' with the
>>>> # quote escaped ' \" ' as is found in my data set. Like this:
>>>> 
>>>> "","X1","X2","X3"
>>>> "1","1","string one","another string"
>>>> "2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
>>>> "3","3","third row","last \" col"
>>> 
>>> Use quote="":
>>> 
>>>> read.csv(text='"","X1","X2","X3"
>>> + "1","1","string one","another string"
>>> + "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
>>> + "3","3","third row","last "" col"', sep=",", quote="")
>>> 
>>> Not ...., quote="\""
>>> 
>>> 
>>>   X.. X.X1.                                           X.X2.            X.X3.
>>> 1 "1"   "1"                                    "string one" "another string"
>>> 2 "2"   "2" "quotes escaped 10' 20"" 5' 30"" ""test string"   "final string"
>>> 3 "3"   "3"                                     "third row"    "last "" col"
>>> 
>>> You will then be depending entirely on commas to separate.
>>> 
>>> (Needed to use escaped single quotes to illustrate from a command
>>> line.)
>>> 
>>>> 
>>>> # this breaks read.csv:
>>>> 
>>>>> read.csv("test.csv")
>>>>  X X1
>>>> X2             X3
>>>> 1 1  1
>>>> string one another string
>>>> 2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test
>>>> string,final string\n3,3,third row,last \\ col
>>>> 
>>>> # we now have only two rows, with all the data captured in col2 row2
>>>> 
>>>> Any suggestions on how to fix this behavior? I've tried fiddling with
>>>> quote="\"" to no avail, obviously. Interestingly, an even number of
>>>> escaped quotes within a field is loaded correctly, which certainly
>>>> threw me for a while!
>>>> 
>>>> Thank you in advance,
>>>> Tim
>>>> 
>>>> 
>>> 
>>> David Winsemius
>>> Alameda, CA, USA
>>> 
>> 
> 
> 
