[R] retaining characters in a csv file

Rolf Turner r.turner at auckland.ac.nz
Wed Sep 23 00:33:13 CEST 2015


On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
> I have a csv file from an automatic process (so this will happen
> thousands of times), for which the first row is a vector of variable
> names and the second row often starts something like this:
>
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
>
> Notice the second variable which is
>        a character string (note the quotation marks)
>        a sequence of numeric digits
>        leading zeros are significant
>
> The read.csv function insists on turning this into a numeric.  Is there
> any simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey
> the bloody quotes" -- I still want the first, third, etc columns to
> become numeric.  There can be more than one variable like this, and not
> always in the second position.
>
> This happens deep inside the httr library; there is an easy way for me
> to add more options to the read.csv call but it is not so easy to
> replace it with something else.

IMHO this is a bug in read.csv().

A possible workaround:

ccc <- c("integer","character",rep(NA,k))
X   <- read.csv("melvin.csv",colClasses=ccc)

where "melvin.csv" is the file from which you are attempting to read and
where k+2 = the number of columns in that file.  The NA entries leave the
remaining columns to be converted automatically, as usual.

Kludgey, but it might work.
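A minimal sketch of the colClasses workaround, run against an inline copy
of the row quoted above rather than a real file.  The column names (id,
code, start, end, flag) and the five-column layout are assumptions made
up for illustration; the original message elides the rest of the row.

```r
# Hypothetical data: header names are invented, the row mirrors the post.
csv_text <- 'id,code,start,end,flag
5724550,"000202075214",2005.02.17,2005.02.17,"F"'

k   <- 3                                      # columns after the first two
ccc <- c("integer", "character", rep(NA, k))  # NA = let read.csv decide

default <- read.csv(text = csv_text)                    # default conversion
fixed   <- read.csv(text = csv_text, colClasses = ccc)  # forced classes

is.numeric(default$code)  # the quoted digits were converted to a number
fixed$code                # kept as a character string, zeros intact
is.logical(fixed$flag)    # columns given NA are still converted
```

Since the vector is positional, it has to be rebuilt whenever the
character columns sit somewhere other than position 2.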

Another workaround is to specify quote="", but this leaves the quotation
marks in the values as literal characters and has the side effect of
making the 5th column character rather than logical.
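A rough illustration of what quote="" does, again on an inline copy of
the quoted row.  The column names are invented for the example, and the
gsub() clean-up at the end is only one possible way to drop the marks
afterwards, not something proposed in the thread.

```r
# Hypothetical data: header names are invented, the row mirrors the post.
csv_text <- 'id,code,start,end,flag
5724550,"000202075214",2005.02.17,2005.02.17,"F"'

# With quoting disabled, the quotation marks are ordinary characters.
raw <- read.csv(text = csv_text, quote = "", stringsAsFactors = FALSE)

raw$code                # value still wrapped in literal quotation marks
is.character(raw$flag)  # no longer read as a logical

# Illustrative clean-up: strip the literal marks after reading.
code_clean <- gsub('"', "", raw$code, fixed = TRUE)
```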

cheers,

Rolf

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


