[R] retaining characters in a csv file
murdoch.duncan at gmail.com
Wed Sep 23 02:48:42 CEST 2015
On 22/09/2015 6:33 PM, Rolf Turner wrote:
> On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
>> I have a csv file from an automatic process (so this will happen
>> thousands of times), for which the first row is a vector of variable
>> names and the second row often starts something like this:
>> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
>> Notice that the second variable:
>> - is a character string (note the quotation marks),
>> - is a sequence of numeric digits, and
>> - has significant leading zeros.
>> The read.csv function insists on turning this into a numeric. Is there
>> any simple set of options that will turn this behavior off? I'm looking
>> for a way to tell it to "obey the bloody quotes" -- I still want the
>> first, third, etc. columns to become numeric. There can be more than one
>> variable like this, and it is not always in the second position.
>> This happens deep inside the httr library; there is an easy way for me
>> to add more options to the read.csv call but it is not so easy to
>> replace it with something else.
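A minimal reproduction, assuming a two-line file shaped like the one described. The column names `id`, `code`, `d1`, `d2` and `sex` are hypothetical, and the data row is cut down to the five fields shown above. As the `read.table` help page puts it, quoting is only considered for columns read as character; `type.convert()` then sees only the unquoted text, so whether a field was quoted plays no part in choosing its type:

```r
# Hypothetical header plus the first five fields of the row quoted above.
csv <- 'id,code,d1,d2,sex
5724550,"000202075214",2005.02.17,2005.02.17,"F"'

x <- read.csv(text = csv)

# code is read as a number (leading zeros lost); sex ("F") becomes logical.
str(x)
```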
> IMHO this is a bug in read.csv().
No, it's a bug in "Rolf Turner", who believes in fairies at the end of
his garden, rather than in documentation for file formats.
> A possible workaround:
> ccc <- c("integer","character",rep(NA,k))
> X <- read.csv("melvin.csv",colClasses=ccc)
> where "melvin.csv" is the file you are attempting to read and k + 2 is
> the number of columns in that file.
> Kludgey, but it might work.
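If the names of the affected columns are known in advance, a named `colClasses` vector may avoid counting columns at all. In recent versions of R the names are matched against the header and unspecified columns are treated as `NA` (default conversion); this may not have been available in the R release current when this thread was written. A sketch using the same hypothetical file as above; since extra options can be added to the `read.csv()` call inside httr, this would be a single additional argument there:

```r
csv <- 'id,code,d1,d2,sex
5724550,"000202075214",2005.02.17,2005.02.17,"F"'

# Named colClasses: only "code" is forced to character; every other
# column keeps the default type.convert() treatment.
x <- read.csv(text = csv, colClasses = c(code = "character"))

str(x)
```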
> Another workaround is to specify quote="", but this has the side effect
> of making the 5th column character rather than logical.
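A sketch of the `quote = ""` workaround on the same hypothetical file. With quoting switched off, the quote characters stay inside the field values, which is why the quoted columns cannot be converted to numbers or logicals. Two caveats: a quoted field containing a comma would be split, and the literal quotes usually need stripping afterwards. The clean-up step below is an illustrative assumption, not part of the original suggestion:

```r
csv <- 'id,code,d1,d2,sex
5724550,"000202075214",2005.02.17,2005.02.17,"F"'

# quote = "": quotes are ordinary characters, so "000202075214" (with its
# quotes) is not numeric and "F" (with its quotes) is not logical.
raw <- read.csv(text = csv, quote = "", stringsAsFactors = FALSE)

# Optional clean-up: remove one pair of surrounding quotes from each
# character column; unquoted values such as d1 are left unchanged.
clean <- raw
clean[] <- lapply(clean, function(col) {
  if (is.character(col)) sub('^"(.*)"$', "\\1", col) else col
})

str(clean)
```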