[R] Unexpected behaviour of write.csv - read.csv

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jan 13 19:30:42 CET 2011


On Thu, Jan 13, 2011 at 1:06 PM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
> On Thu, 13 Jan 2011, Duncan Murdoch wrote:
>
>> On 11-01-13 6:26 AM, Rainer M Krug wrote:
>>>
>>> Hi
>>>
>>> Assuming the following:
>>>
>>>> x<- data.frame(a=1:10, b=runif(10))
>>>> str(x)
>>>
>>> 'data.frame':   10 obs. of  2 variables:
>>>  $ a: int  1 2 3 4 5 6 7 8 9 10
>>>  $ b: num  0.692 0.325 0.634 0.16 0.873 ...
>>>>
>>>> write.csv(x, "x.csv")
>>>> x2<- read.csv("x.csv")
>>>> str(x2)
>>>
>>> 'data.frame':   10 obs. of  3 variables:
>>>  $ X: int  1 2 3 4 5 6 7 8 9 10
>>>  $ a: int  1 2 3 4 5 6 7 8 9 10
>>>  $ b: num  0.692 0.325 0.634 0.16 0.873 ...
>>>>
>>>
>>> Using write.csv followed by read.csv, I would expect the resulting
>>> data.frame x2 to be identical to x, but it has an additional
>>> column X, which contains the row names of x.
>>>
>>> I know read.table and write.table which work as expected, but I would
>>> like to use a csv for data exchange reasons.
>>>
>>> I know that I can use
>>> write.csv(x, "x.csv", row.names=FALSE)
>>>
>>> and it would work, but shouldn't that be the default behaviour?
>>
>> I don't think so.  The CSV format is an export format which holds less
>> information than a dataframe.  By exporting the dataframe to CSV and
>> importing the result, you are discarding information and you should expect
>> to get something different.
>
> You need to read it with read.csv("x.csv", row.names=1)
>
> Nothing in the csv format lets R know that the first column is the row names
> (in the format used by read.table, having a header that is one column short
> does).  Now R could guess that a .csv file with an empty string for the
> first column name is meant to be the row names, but that would be merely a
> guess based on one (barely documented for spreadsheets) convention.

read.csv and read.table already use heuristics to determine the column
types, so adding this check to those heuristics would not seem to be a
departure from the established philosophy.
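
As a rough sketch of what is being discussed, the code below first reads
the file back with row.names=1 and then tries the kind of guess described
above. It assumes the file was written by write.csv with its defaults, so
the first header field is an empty quoted string. read_csv_guess is a
hypothetical helper for illustration only; it is not part of R, and the
names f, x2, x3 and x4 are likewise just for this example.

```r
# Round trip through a temporary CSV file.
x <- data.frame(a = 1:10, b = runif(10))
f <- tempfile(fileext = ".csv")
write.csv(x, f)

# Default read: the row names come back as an extra first column, "X".
x2 <- read.csv(f)

# Telling read.csv that column 1 holds the row names restores the columns.
x3 <- read.csv(f, row.names = 1)

# Hypothetical helper (not part of R): treat the first column as row names
# when the first header field is empty, as in files written by write.csv.
read_csv_guess <- function(file, ...) {
  header <- readLines(file, n = 1)
  # Matches a header line starting with "", or with a bare comma.
  first_empty <- grepl('^("")?,', header)
  if (first_empty) {
    read.csv(file, row.names = 1, ...)
  } else {
    read.csv(file, ...)
  }
}

x4 <- read_csv_guess(f)
names(x2)  # includes the extra column
names(x4)  # only the original columns
```

The check is only a guess based on the header line: a file from another
source that happens to have an empty first column name would be read the
same way, which is the objection raised above.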

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


