[Rd] Inconsistency, may be bug in read.delim ?

Detlef Steuer steuer at hsu-hh.de
Mon Mar 19 14:23:32 CET 2018


Dear friends,

I stumbled upon behaviour of read.delim that I would consider a bug,
or at least an inconsistency that should be improved upon.

Recently we had to work with data that used "", two double quotes, as
the delimiter that starts and ends character fields.

Essentially the data looked like this:

data.csv
========
V1, V2, V3
""data"", 3, """" 

The final sequence """" indicates a missing value.

One obvious solution to read in this data is using some gsub(),
but that's not the point I want to make.
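
A minimal sketch of that gsub() route, for completeness. It assumes that
"""" only ever occurs as the missing-value marker and that every other ""
is an opening or closing delimiter; the object names raw, clean and dat
are made up for illustration.

```r
# Example lines as in data.csv above, held in memory for illustration.
raw <- c('V1, V2, V3',
         '""data"", 3, """" ')

# Step 1: turn the four-quote "missing" marker into an empty field.
clean <- gsub('""""', '', raw, fixed = TRUE)
# Step 2: collapse the remaining doubled quotes into ordinary ones.
clean <- gsub('""', '"', clean, fixed = TRUE)

# strip.white removes the blanks after the commas; empty fields become NA.
dat <- read.csv(text = clean, strip.white = TRUE, na.strings = "",
                stringsAsFactors = FALSE)
print(dat)
```

The order of the two substitutions matters: replacing "" first would
leave a quoted empty string "" behind instead of an empty field.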

Consider this case we found during tests:

test.csv
========
V1, V2, V3, V4
"""", """", 3, ""

If you read it with
> read.delim("test.csv", sep=",", header=TRUE, na.strings="\"")  

you get the following:

  V1 V2 V3 V4
1 NA  "  3 NA  

(and a warning)
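
To reproduce this without creating the file by hand, something like the
following should do. The temporary file and the name res are only for
illustration; the read.delim() call is the one given above.

```r
# Write the two lines of test.csv to a temporary file.
tmp <- tempfile(fileext = ".csv")   # illustrative location
writeLines(c('V1, V2, V3, V4',
             '"""", """", 3, ""'), tmp)

# Same call as above.
res <- read.delim(tmp, sep = ",", header = TRUE, na.strings = "\"")

print(res)
str(res)   # shows the class of each column
```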

I would have expected either an error message or at
least the same result for both occurrences of """" in the
input file.
(The setting na.strings="\"" happened to work for
 a colleague and his specific data, although I think it shouldn't.)

My main concern is the different interpretation of the two """"
sequences.
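
One way to look closer at what each field is parsed into is to read
everything as character and print with dput(), which makes any leading
blanks visible. This is only a diagnostic sketch: the idea that the blank
after each comma might play a role is a guess, not something established
here, and strip.white = TRUE is included just to test that guess. The
names tmp, as_is and stripped are illustrative.

```r
# Recreate the two lines of test.csv in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c('V1, V2, V3, V4',
             '"""", """", 3, ""'), tmp)

# Read every column as character so nothing is converted afterwards.
as_is <- read.delim(tmp, sep = ",", na.strings = "\"",
                    colClasses = "character")

# Same call, but with white space around fields stripped.
stripped <- read.delim(tmp, sep = ",", na.strings = "\"",
                       colClasses = "character", strip.white = TRUE)

# dput() prints the exact strings, including blanks.
dput(as_is)
dput(stripped)
```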

Real bug? Minor inconsistency? I don't know.

All the best
Detlef


-- 
'People who say "I have nothing to hide" misunderstand the purpose of
surveillance. It was never about privacy. It's about power.' E. Snowden


