[R] read a xls file

Ian Jenkinson ian.jenkinson at wanadoo.fr
Tue Jan 20 17:21:34 CET 2009


On 2009-01-20 15:24, Gavin Simpson wrote:
> On Tue, 2009-01-20 at 14:45 +0100, Ian Jenkinson wrote:
>   
>>> See the R Import/Export manual.  Also
>>> RSiteSearch("import excel")
>>> gives many hits.  It seems as if this question is
>>> asked almost daily.
>>>
>>> On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
>>> <miksanta at gmail.com> wrote:
>>>   
>>>       
>>>>> Hello,
>>>>>
>>>>> i have a xls file. I will read it in r, what library-command i use for
>>>>> this??
>>>>>
>>>>> any ideas??
>>>>>           
>> I feel concerned because I have just spent a frustrating couple of days 
>> trying to read an Excel (xls) file, with the aid of the R book (Crawley, 
>> 2007), and R help files. I failed, but finally found a workaround. My 
>> experience might help others.
>>     
>
> You did read ?read.table yes?
>   
Unfortunately not. Now I see that it is rich and detailed. Thank you.
> There are three arguments there that can help in such situations: 
>
> 'colClasses' allows the finest grained control over how R imports your
> text files. You specify what each column is, noting that if you have
> lots of columns, things like 
>
> c("numeric", rep("character", 12))
>
> will deal with runs of columns of the same type, without having to type
> them all by hand.
>   
I'm a bit mystified by this.
> 'as.is' is a vector of logicals (TRUE/FALSE) that controls whether a
> column is read in as is or converted.
>   
I have just tried your suggestion:

 > 
TTT<-read.table("/D/.../090117T_P.txt",header=TRUE,as.is="Log.microplank.biomass")

I CONFIRM IT WORKS! Sure enough elements in my "Log.microplank.biomass" 
column are now "numeric":

 > (TTT[5,7])
[1] 1.612784
 > class(TTT[5,7])
[1] "numeric"

Indeed, I see that Crawley (2007) gives this example on p.100:
[>] murder<-read.table("c:\\temp\\murders.txt,header=T,as.is="region")
, but until now I hadn't understood what "region" meant, so I didn't see 
how to use "as.is".  Now I realise "region" is a header name in that 
particular data.frame.
Sorry for being such a newby!

> 'stringsAsFactors' a single logical. Should all character variables be
> converted to factors.
>   
I hadn't come across this. This would be useful if you wanted variables 
as factors, but my problem was that I got "factors" when I wanted "numeric"
> Some of these would have been useful in your case.
>
> I'm not sure what you tried, but I have found that saving an .xls file
> as a CSV via OpenOffice.org (on Linux) and subsequently reading it in
> with read.csv("foo.csv", ...) to be reasonably fool proof, especially
> when one makes use of the arguments about for fine-grained processing.
>   
I can appreciate this would probably have worked for me too, had I known 
how to do the "fine-grained processing".
> Someone in this thread posted a response that included the use of RODBC,
> which I haven't tried, but there are a plethora of ways to read data
> from Excel without having to torture yourself and the data formats to do
> so.
>   
I saw this about RODBC, but it seemed a complicated way of doing things, 
and in any case it seems that to run RODBC you need Excel 2004 or 
higher, which I would need to buy.
One beautiful thing about R and OOo is that they are excellent and they 
cost nothing, so you don't have to buy, borrow or steal them.
> HTH
>
> G
>
>   
Ian
>> My data were in an Excel xls file
>>
>> I have R (version 2.6.2) installed in Kubuntu Linux
>> I also have R (version 2.6.2) installed in Windows XP SP_3 running in 
>> VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 
>> 97 on this system.
>>
>> I wasted a lot of time exporting from Excel in various formats (txt, 
>> csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they 
>> were of correct format by peeking with a text editor.)
>> Then I would try reading using e.g. read.table("[file 
>> path]",header=TRUE) or read.csv(...) or read.csv2(...), or 
>> read.DIF(...), with or without "header=TRUE" or "header =FALSE".
>> I also copied to the "clipboard" and tried reading using 
>> read.DIF("clipboard")
>> In many of these cases I did get a data.frame that looked nice on-screen.
>>
>> My recurrent problem, however, was that many of the numeric variables in 
>> the resultant data frame were CLASS "factor". If you do arithmetic or 
>> plotting on factors, either it fails or gives wrong results.
>>
>> So I spent hours using (as.numeric(...)) with variants and permutations, 
>> etc. Most times (as.numeric(...)) seems to work, but actually the data 
>> either remained unchanged (as a "factor") or gave "numeric" but wrong 
>> numbers.
>>
>> I read the xls file using gnumeric application and saved as a dif file, 
>> then used read.DIF("[file path]"). This gave some correct "numeric" 
>> numbers but jumbled and partly duplicated.
>>
>> N.B. My problems were essentially the same whether I used R in XP or in 
>> Linux (kubuntu)
>>
>> MY SOLUTION (working in Linux):
>> Read the Excel file (xls) using Open Office.org (version 2.4.1) 
>> (downloadable for free for Linux or Windows).
>> Save as dif file.
>> In R,   TT<-read.DIF("[file path]",header=TRUE)
>> It worked, and all my numerical data elements were "numeric", correct 
>> and in the right order. Omit "header=TRUE" if you don't want the first 
>> elements of the spreadsheet columns declared as headers.
>>
>> Hope this may help someone.
>>
>> Here's a subset of my data in a data.frame (environmental data on 
>> plankton):
>>
>>  >TT
>>   Stn Day Mean.salinity Mean.temperature Secchi.disc. 
>> Log.microplank.biomass
>> 1    1  12             0               14          0.7            
>> 1.954242509
>> 2    1  70          13.5            16.55          0.3            
>> 3.083860801
>> 3    1  93         13.45            16.85          0.6            
>> 2.651278014
>> 4    1 153          6.78             14.2          0.5            
>> 2.075546961
>> 5    1 200             0              9.3          0.7            
>> 1.612783857
>> 6    1 231             0              7.1          0.8            
>> 1.491361694
>> 7    1 283             0              8.8          0.4            
>> 2.123851641
>> 8    1 330          4.95             9.45          0.3            
>> 2.276461804
>> 9    1 370          16.6             12.3          0.4            
>> 2.728353782
>> 10   3  12         16.25            11.95         0.55            
>> 2.025305865
>> 11   3  70         22.35             16.1          0.5            
>> 2.096910013
>> 12   3  93         26.05            17.15          1.5            
>> 1.707570176
>> 13   3 153          23.4             14.2            1            
>> 1.755874856
>> 14   3 200         14.05              8.6          0.4            
>> 1.812913357
>> 15   3 231           7.9              6.3          0.3            
>> 1.897627091
>> 16   3 283          11.2             7.25          0.7            
>> 1.832508913
>> 17   3 330         19.95              8.1          0.5            
>> 1.785329835
>> 18   3 370         24.35             11.5          0.4            
>> 2.361727836
>> 19   4  12          18.1            12.05          0.6            
>> 1.792391689
>> 20   4  70         24.35             15.9          0.7            
>> 1.973127854
>> 21   4  93            27            17.35          1.3            
>> 1.982271233
>> 22   4 153          25.8             14.2          0.8            
>> 1.924279286
>> 23   4 200          16.2                9          0.4            
>> 1.653212514
>> 24   4 231          11.5             6.85          0.4            
>> 1.819543936
>> 25   4 283         10.95              8.2         0.25            
>> 2.096910013
>> 26   4 330          19.7             8.45          0.4            
>> 2.025305865
>> 27   4 370          25.6             11.5          0.5            
>> 2.274157849
>>
>> Ian Jenkinson
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>     


-- 
Dr Ian R. Jenkinson
Agence de Conseil et de Recherche Océanographiques
Lavergne
19320 La Roche Canillac
France

+33 555 29 19 48
+33 555 29 19 82 (fax)
+33 608 89 13 62 (mobile)
ian.jenkinson at wanadoo.fr
http://assoc.orange.fr/acro/




More information about the R-help mailing list