[R] retaining characters in a csv file

Therneau, Terry M., Ph.D. therneau at mayo.edu
Wed Sep 23 14:57:53 CEST 2015


Thanks to all for the comments, I hadn't intended to start a war.

My summary:
   1. Most important: I wasn't missing something obvious.  This is always my first 
suspicion when I submit something to R-help, and it's true more often than not.

   2. Obviously (at least it is now), the CSV standard does not specify that quotes should 
force a character result.  R is not "wrong".  With respect to using what Excel does as a litmus test, 
I consider that to be totally uninformative about standards: neither pro (like Duncan) nor 
anti (like Rolf), but simply irrelevant.  (Like many MS choices.)

   3. I'll have to code my own solution: either pre-scan the first few lines to create 
a colClasses vector, or use read_csv from the readr package (it keeps a string with leading 
zeros as character, which may suffice for my needs), or something else.

   4. The source of the data is a "text/csv" field coming from an http POST request.  This 
is an internal service on an internal Mayo server and coded by our own IT department; this 
will not be the first case where I have found that their definition of "csv" is not quite 
standard.
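
A minimal sketch of the pre-scan idea from item 3, using base R only.  It assumes that the 
first data row is representative, that no quoted field contains an embedded comma, and that 
the whole csv text is available as a single string.  The helper name quoted_colclasses and 
the column names in the sample are made up for illustration; they are not part of httr or 
of the actual data.

```r
# Hypothetical helper: build a colClasses vector by checking which fields
# of the first data row are wrapped in double quotes.
quoted_colclasses <- function(csv_text, sep = ",") {
  lines <- strsplit(csv_text, "\n", fixed = TRUE)[[1]]
  # quote = "" makes scan() keep the quotation marks as part of each field.
  # Assumption: quoted fields contain no embedded separators.
  fields <- scan(text = lines[2], what = "", sep = sep, quote = "",
                 strip.white = TRUE, quiet = TRUE)
  # "character" for quoted fields, NA (let read.csv decide) for the rest
  ifelse(grepl('^".*"$', fields), "character", NA)
}

# Sample text modelled on the row in the original question; the header
# names are invented.
csv_text <- paste(
  "id,clinic,start,end,sex",
  "5724550,\"000202075214\",2005.02.17,2005.02.17,\"F\"",
  sep = "\n")

cc  <- quoted_colclasses(csv_text)
dat <- read.csv(text = csv_text, colClasses = cc)
str(dat)
```

The NA entries leave the unquoted columns to the usual type conversion, so the first 
column still comes back numeric while the quoted ones stay character with their leading 
zeros intact.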

Terry T.



> On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
>> I have a csv file from an automatic process (so this will happen
>> thousands of times), for which the first row is a vector of variable
>> names and the second row often starts something like this:
>>
>> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
>>
>> Notice the second variable which is
>>        a character string (note the quotation marks)
>>        a sequence of numeric digits
>>        leading zeros are significant
>>
>> The read.csv function insists on turning this into a numeric.  Is there
>> any simple set of options that
>> will turn this behavior off?  I'm looking for a way to tell it to "obey
>> the bloody quotes" -- I still want the first, third, etc columns to
>> become numeric.  There can be more than one variable like this, and not
>> always in the second position.
>>
>> This happens deep inside the httr library; there is an easy way for me
>> to add more options to the read.csv call but it is not so easy to
>> replace it with something else.


