[R] read.csv quotes within fields

Tim Howard tghoward at gw.dec.state.ny.us
Fri Jan 25 22:37:02 CET 2013


David, 
Thank you again for the reply. I'll try to make readLines() and strsplit() work. What bugs me is that I think the file would import fine if the folks who created the csv had used doubled quotes "" rather than an escaped quote \" for those pesky internal quotes. Given that, I'd think there would be a solution within read.csv(), or perhaps scan(), but I just can't figure it out.
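
One way to act on that observation is to rewrite the escapes before parsing. This is only a minimal sketch, not a tested solution for the real files: it assumes every \" in the file is an escaped internal quote (a field that legitimately ends in a backslash would be mangled) and that no field contains embedded newlines. The sample lines and the names csv_lines, raw_lines, doubled and fixed_df are made up for illustration.

```r
# Hypothetical sample: the four lines from this thread, with \" as the internal quote.
csv_lines <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third row","last \\" col"'
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# Read the file as plain text, one element per line.
raw_lines <- readLines(path)

# Replace each backslash-quote pair with a doubled quote.
# fixed = TRUE treats the pattern as a literal string, not a regex.
doubled <- gsub('\\"', '""', raw_lines, fixed = TRUE)

# Parse the repaired text with the default csv quoting rules.
fixed_df <- read.csv(text = paste(doubled, collapse = "\n"),
                     stringsAsFactors = FALSE)
print(fixed_df)
```

Because the fields stay quoted, commas inside a text string should still be handled by read.csv()'s default quote setting.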
best, 
Tim

>>> David Winsemius <dwinsemius at comcast.net> 1/25/2013 4:16 PM >>>

On Jan 25, 2013, at 11:35 AM, Tim Howard wrote:

> Great point, your fix (quote="") works for the example I gave. Unfortunately, these text strings have commas in them as well(!).  Throw a few commas in any of the text strings and it breaks again.  Sorry about not including those in the example.
>  
> So, I need to incorporate commas *and* quotes with the escape character within a single string.

Well, you need to have _some_ delimiter. At the moment it sounds as though you might end up using readLines() and strsplit( . , split="\\'\\,\\s\\").
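
A rough sketch of that readLines()/strsplit() route, under assumptions that may not hold for the real data: every field is wrapped in double quotes, fields are separated by exactly the three characters "," with no space, and that sequence never occurs inside a field. The split string used here is a literal one chosen for illustration, not the pattern quoted above, and the names csv_lines, stripped, fields, header and split_df are hypothetical.

```r
# Hypothetical sample: the four lines from this thread, with \" as the internal quote.
# In practice these would come from readLines("test.csv").
csv_lines <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third row","last \\" col"'
)

# Drop the quote that opens the first field and the one that closes the last.
stripped <- sub('^"', '', sub('"$', '', csv_lines))

# Split on the literal quote-comma-quote delimiter (fixed = TRUE: no regex).
fields <- strsplit(stripped, '","', fixed = TRUE)

# First line is the header; give the empty first name a placeholder.
header <- fields[[1]]
header[header == ""] <- "row"

# Stack the remaining lines into a data frame of character columns.
split_df <- as.data.frame(do.call(rbind, fields[-1]),
                          stringsAsFactors = FALSE)
names(split_df) <- header

# Turn each escaped \" back into a plain double quote.
split_df[] <- lapply(split_df, function(col) gsub('\\"', '"', col, fixed = TRUE))
print(split_df)
```

All columns come back as character, so numeric columns would still need converting, for example with type.convert().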

-- 
david.

>  
> Tim
>  
> 
> >>> David Winsemius <dwinsemius at comcast.net> 1/25/2013 2:27 PM >>>
> 
> On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:
> 
> > All,
> > 
> > I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there is an odd number of internal quotes. I'll try to recreate the problem:
> > 
> > # set up a data frame, using escape-quote as the internal double quote mark.
> > 
> > x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3","third row","last \" col"),ncol = 3, byrow=TRUE))
> > 
> >> write.csv(x, "test.csv")
> > 
> > # NOTE that write.csv correctly created the three internal quotes ' " ' by using double quotes ' "" '. 
> > # here's what got written
> > 
> > "","X1","X2","X3"
> > "1","1","string one","another string"
> > "2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
> > "3","3","third row","last "" col"
> > 
> > # Importing test.csv works fine.
> > 
> >> read.csv("test.csv")
> >  X X1                                         X2             X3
> > 1 1  1                                 string one another string
> > 2 2  2 quotes escaped 10' 20" 5' 30" "test string   final string
> > 3 3  3                                  third row     last " col
> > # this looks good. 
> > # now, please go and open "test.csv" with a text editor and replace all the double quotes '""' with the 
> > # quote escaped ' \" ' as is found in my data set. Like this:
> > 
> > "","X1","X2","X3"
> > "1","1","string one","another string"
> > "2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
> > "3","3","third row","last \" col"
> 
> Use quote="":
> 
> > read.csv(text='"","X1","X2","X3"
> + "1","1","string one","another string"
> + "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
> + "3","3","third row","last "" col"', sep=",", quote="")
> 
> Not ...., quote="\""
> 
> 
>   X.. X.X1.                                           X.X2.            X.X3.
> 1 "1"   "1"                                    "string one" "another string"
> 2 "2"   "2" "quotes escaped 10' 20"" 5' 30"" ""test string"   "final string"
> 3 "3"   "3"                                     "third row"    "last "" col"
> 
> You will then be depending entirely on commas to separate. 
> 
> (Needed to use escaped single quotes to illustrate from a command line.)
> 
> > 
> > # this breaks read.csv:
> > 
> >> read.csv("test.csv")
> >  X X1                                                                                    X2             X3
> > 1 1  1                                                                            string one another string
> > 2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final string\n3,3,third row,last \\ col
> > 
> > # we now have only two rows, with all the data captured in col2 row2
> > 
> > Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!
> > 
> > Thank you in advance, 
> > Tim
> > 
> > 
> 
> David Winsemius
> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA