[R] The behaviour of read.csv().

David Scott d.scott at auckland.ac.nz
Fri Dec 3 03:48:56 CET 2010


On 03/12/10 14:33, Duncan Murdoch wrote:
> On 02/12/2010 8:04 PM, Peter Ehlers wrote:
>> On 2010-12-02 16:26, Rolf Turner wrote:
>>> On 3/12/2010, at 1:08 PM, Phil Spector wrote:
>>>
>>>> Rolf -
>>>>       I'd suggest using
>>>>
>>>>        junk <- read.csv("junk.csv", header = TRUE, fill = FALSE)
>>>>
>>>> if you don't want the behaviour you're seeing.
>>>
>>> The point is not that I don't want this kind of behaviour.
>>> The point is that it seems to me to be unexpected and dangerous.
>>>
>>> I can indeed take precautions against it, now that I know about it,
>>> by specifying fill=FALSE, provided I remember to do so.
>>>
>>> Now that you've pointed it out I can see that this is the reason
>>> for the different behaviour between read.table() and read.csv();
>>> in read.table() fill=FALSE is effectively the default.
>>>
>>> Having fill=TRUE as the default in read.csv() strikes me as
>>> counter-intuitive and dangerous.
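A minimal sketch of the behaviour being discussed, using made-up data in which one row is a field short. It assumes a version of R recent enough for read.csv() to accept a `text =` argument (later than the 2010 release under discussion). With the default fill = TRUE the short row is silently padded; with fill = FALSE the same input stops with an error.

```r
# Hypothetical CSV contents: the second data row ("4,5") is one field short.
csv_lines <- c("a,b,c", "1,2,3", "4,5", "6,7,8")

# read.csv() defaults to fill = TRUE: the missing field is added as a blank,
# which becomes NA in the integer column c, and no warning is issued.
padded <- read.csv(text = csv_lines)
print(padded)

# With fill = FALSE, scan() refuses the short line and signals an error
# instead of quietly altering the data.
strict <- tryCatch(read.csv(text = csv_lines, fill = FALSE),
                   error = function(e) e)
print(inherits(strict, "error"))
print(conditionMessage(strict))
```

The point of the comparison is that both calls "succeed" from the user's perspective only in the first case, which is exactly why the padding can go unnoticed.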
>>>
>> Rolf,
>> This is not to argue with your point about it being counter-intuitive,
>> but I always run a count.fields() first if I haven't seen
>> (or can't easily see) the file in my editor. I must have
>> learned that the hard way a long time ago.
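A sketch of the count.fields() check, under the assumption that an in-memory text connection stands in for the file (with a real file one would pass its name, e.g. count.fields("junk.csv", ...)). The quote and comment.char settings are chosen to mirror read.csv()'s own defaults, since count.fields() otherwise treats single quotes and "#" specially.

```r
# Hypothetical file contents: the third line has only two fields.
csv_lines <- c("a,b,c", "1,2,3", "4,5", "6,7,8")

# Count comma-separated fields per line, using the same quote and
# comment settings that read.csv() uses by default.
n_fields <- count.fields(textConnection(csv_lines), sep = ",",
                         quote = "\"", comment.char = "")
print(n_fields)

# Flag any line whose field count differs from the header line's.
bad_lines <- which(n_fields != n_fields[1])
print(bad_lines)
```

If `bad_lines` is non-empty, the file deserves a look before its contents are trusted to read.csv().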
> I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
> The comment in the NEWS file suggests it was in response to some strange
> csv file coming out of Excel.
>
> The real problem with the CSV format is that there really isn't a
> well-defined standard for it.  The first RFC about it (RFC 4180) was published in 2005,
> and it doesn't claim to be authoritative.  Excel is kind of a standard,
> but it does some very weird things.  (For example:  enter the string 01
> into a field.  To keep the leading 0, you need to type it as '01.  Save
> the file, read it back:  goodbye 0.  At least that's what a website I
> was just on says about Excel, and what OpenOffice does.)
>
> I've been burned so many times by storing data in .csv files, that I
> just avoid them whenever I can.
I absolutely agree with this, Duncan. Playing around with .csv files is
like playing with some sort of unstable explosive. I also avoid them as
much as possible.

David Scott


> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
_________________________________________________________________
David Scott	Department of Statistics
		The University of Auckland, PB 92019
		Auckland 1142,    NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:	d.scott at auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics
