[R] retaining characters in a csv file

Duncan Murdoch murdoch.duncan at gmail.com
Wed Sep 23 02:48:42 CEST 2015


On 22/09/2015 6:33 PM, Rolf Turner wrote:
> On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
>> I have a csv file from an automatic process (so this will happen
>> thousands of times), for which the first row is a vector of variable
>> names and the second row often starts something like this:
>>
>> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
>>
>> Notice the second variable which is
>>        a character string (note the quotation marks)
>>        a sequence of numeric digits
>>        leading zeros are significant
>>
>> The read.csv function insists on turning this into a numeric.  Is there
>> any simple set of options that
>> will turn this behavior off?  I'm looking for a way to tell it to "obey
>> the bloody quotes" -- I still want the first, third, etc columns to
>> become numeric.  There can be more than one variable like this, and not
>> always in the second position.
>>
>> This happens deep inside the httr library; there is an easy way for me
>> to add more options to the read.csv call but it is not so easy to
>> replace it with something else.
> 
> IMHO this is a bug in read.csv().

No, it's a bug in "Rolf Turner", who believes in fairies at the end of
his garden, rather than in documentation for file formats.

Duncan Murdoch

> 
> A possible workaround:
> 
> # NA lets read.csv choose the class of each of the remaining k columns
> ccc <- c("integer", "character", rep(NA, k))
> X   <- read.csv("melvin.csv", colClasses = ccc)
> 
> where "melvin.csv" is the file from which you are attempting to read and
> where k+2 = the number of columns in that file.
> 
> Kludgey, but it might work.
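
A minimal sketch of the colClasses workaround quoted above, run on the sample row from the original question. The column names (recno, id, date1, date2, flag) are invented for illustration, and the text= argument stands in for the real file. The named form of colClasses at the end is not part of the quoted suggestion; it is shown as one possible way to cope with the quoted column not always being in the second position, assuming the header names are known.

```r
# Hypothetical two-line file; the column names are made up for this sketch.
csv_text <- paste(
  'recno,id,date1,date2,flag',
  '5724550,"000202075214",2005.02.17,2005.02.17,"F"',
  sep = "\n"
)

# Default behaviour: the quoted digit string is converted to a number,
# so the leading zeros are lost.
x_default <- read.csv(text = csv_text)
class(x_default$id)

# Positional colClasses, as in the quoted workaround: k + 2 columns in total.
k   <- 3
ccc <- c("integer", "character", rep(NA, k))
x_pos <- read.csv(text = csv_text, colClasses = ccc)
x_pos$id   # "000202075214"

# Named colClasses: only the named column is forced to character;
# the unnamed columns are treated as NA and converted as usual.
x_named <- read.csv(text = csv_text, colClasses = c(id = "character"))
x_named$id
```

The positional form needs the column count and the position of the quoted column in advance; the named form only needs the header name.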
> 
> Another workaround is to specify quote="", but this has the side effect
> of making the 5th column character rather than logical.
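
A small sketch of the quote="" workaround and the side effect described above, again using invented column names and the text= argument in place of the real file. With quoting disabled, the quote marks are assumed to stay in the field values, so the sketch strips them afterwards; that cleanup step is an addition and not part of the quoted suggestion.

```r
# Hypothetical two-line file; the column names are made up for this sketch.
csv_text <- paste(
  'recno,id,date1,date2,flag',
  '5724550,"000202075214",2005.02.17,2005.02.17,"F"',
  sep = "\n"
)

# Default: quotes are honoured as delimiters and then dropped, and the
# lone "F" in the fifth column is read as a logical.
x_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(x_default$flag)

# quote = "" turns off quote processing, so the quote marks become part
# of the data and both quoted columns stay character.
x_noquote <- read.csv(text = csv_text, quote = "", stringsAsFactors = FALSE)
class(x_noquote$id)
class(x_noquote$flag)

# Remove the literal quote marks that were kept in the values.
x_noquote$id   <- gsub('"', "", x_noquote$id,   fixed = TRUE)
x_noquote$flag <- gsub('"', "", x_noquote$flag, fixed = TRUE)
x_noquote$id
```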
> 
> cheers,
> 
> Rolf
>


