[R] Excel

jiho jo.irisson at gmail.com
Fri Aug 31 08:35:00 CEST 2007


On 2007-August-31  , at 00:13 , David Scott wrote:
> On Thu, 30 Aug 2007, Duncan Murdoch wrote:
>> On 8/28/2007 3:16 AM, J Dougherty wrote:
>>> On Monday 27 August 2007 22:21, David Scott wrote:
>>>> On Tue, 28 Aug 2007, Robert A LaBudde wrote:
>>>>> If you format the column as "Text", you won't have this  
>>>>> problem. By
>>>>> leaving the cells as "General", you leave it up to Excel to  
>>>>> guess at
>>>>> the correct interpretation.
>>>>
>>>> Not true actually. I had converted the column to Text because I  
>>>> saw the
>>>> interpretation as a date in the .xls file. I saved the .csv file  
>>>> *after*
>>>> the column had been converted to Text. Looking at the .csv file  
>>>> in a text
>>>> editor, the entry is correct.
>>>>
>>>> I have just rechecked this.
>>>>
>>>> On reopening the .csv using Excel, the entry AUG2699 had been  
>>>> interpreted
>>>> as a date, and was showing as Aug-99. Most bizarre is that the  
>>>> NHI value
>>>> of AUG1838 has *not* been interpreted as a date.
>>>>
>>> Actually, in Excel 2000, he's right.  What you have to is be sure  
>>> of is that
>>> the "'" that denotes a text entry precedes EVERY entry that can  
>>> be confused
>>> with a date.  Selecting the entire column and setting the format  
>>> to "text"
>>> *before* data is entered does this.  It will also create an  
>>> appropriate *.csv
>>> file.  Excel is notable too because it will automatically convert  
>>> "date-like"
>>> entries as you type.  In a column of IDs or similar critical  
>>> data, that
>>> behaviour is really bad.  I have never tried the MS site, but I  
>>> haven't been
>>> able to find any entry about how to turn that particular  
>>> automatic behaviour
>>> off.
>>>
>>> However, while I have not experimented extensively, as far as I have
>>> experimented, OpenOffice spreadsheet does not behave this way.
>>
>> I don't use Excel, but in OpenOffice 2.2.1 the ' is lost when a  
>> file is
>> saved as .csv and reloaded.  So if I take care and enter
>>
>> 'November 15
>>
>> in a cell, then save it, OO will change it to 11/15/2007 when I  
>> reload.
>>  I can override this change by manually changing "Standard" format to
>> "Text" *every time* I load the file.  There's a help index entry  
>> "date
>> formats;avoiding conversion to", but it offers no more help than  
>> "add an
>> apostrophe at the beginning of the entry".
>>
>> This is brain-dead behaviour.
>
> This was the behaviour that really scared me in Excel: saving as .csv
> loses any formatting (it is just an ascii file, how can it have  
> formatting
> info?). Then opening in Excel (or it seems OO), the incorrect date
> interpretation occurs. If I then save the .csv I have erroneous data.
>
> I often do just this sort of thing because I get given data  
> in .xls, it
> has clunky column names or extraneous stuff so I alter it, save it as
> .csv. Then I get a data correction, some clarification of a value,  
> so I
> want to go to the .csv to correct that data value. Once I do that  
> if I am
> not *extremely* careful, before saving the .csv file, I have a  
> problem.

I'll probably advise everyone to use Gnumeric then:
- entries such as 2005/06/08 are interpreted as date and show as  
8/6/2005. but even if you change them to 8/7/05 for example they will  
be written in the csv in your original format, with the change  
included (i.e. 2005/07/08 here)
- entries with several decimals such as 1.4563 can be formatted to be  
displayed 1.46 but will still be written 1.4563 in the csv
- there is no text import/export dialog when opening or closing csv  
files which speeds up things quite a bit. but you can get the dialog  
if you are so inclined

Still some problems
- "0568" in the csv, which is a label (notice the quotes and leading  
zero) is still interpreted as a number by default
- the date is in fact written using the default preferences (namely  
yyyy/mm/dd) and some date in ISO format (yyyy-mm-dd) is converted to  
yyyy/mm/dd when written in csv

So not perfect but much better (and quicker and possibly more  
precise) than both Excel and OO Calc. Oh and cross platform also ;).

JiHO
---
http://jo.irisson.free.fr/



More information about the R-help mailing list