[R] read a xls file

Ian Jenkinson ian.jenkinson at wanadoo.fr
Tue Jan 20 14:45:50 CET 2009


> See the R Import/Export manual.  Also
> RSiteSearch("import excel")
> gives many hits.  It seems as if this question is
> asked almost daily.
>
> On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
> <miksanta at gmail.com> wrote:
>   
>> > Hello,
>> >
>> > i have a xls file. I will read it in r, what library-command i use for
>> > this??
>> >
>> > any ideas??
I feel concerned because I have just spent a frustrating couple of days 
trying to read an Excel (xls) file, with the aid of the R book (Crawley, 
2007), and R help files. I failed, but finally found a workaround. My 
experience might help others.

My data were in an Excel xls file

I have R (version 2.6.2) installed in Kubuntu Linux
I also have R (version 2.6.2) installed in Windows XP SP_3 running in 
VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 
97 on this system.

I wasted a lot of time exporting from Excel in various formats (txt, 
csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they 
were of correct format by peeking with a text editor.)
Then I would try reading using e.g. read.table("[file 
path]",header=TRUE) or read.csv(...) or read.csv2(...), or 
read.DIF(...), with or without "header=TRUE" or "header =FALSE".
I also copied to the "clipboard" and tried reading using 
read.DIF("clipboard")
In many of these cases I did get a data.frame that looked nice on-screen.

My recurrent problem, however, was that many of the numeric variables in 
the resultant data frame were CLASS "factor". If you do arithmetic or 
plotting on factors, either it fails or gives wrong results.

So I spent hours using (as.numeric(...)) with variants and permutations, 
etc. Most times (as.numeric(...)) seems to work, but actually the data 
either remained unchanged (as a "factor") or gave "numeric" but wrong 
numbers.

I read the xls file using gnumeric application and saved as a dif file, 
then used read.DIF("[file path]"). This gave some correct "numeric" 
numbers but jumbled and partly duplicated.

N.B. My problems were essentially the same whether I used R in XP or in 
Linux (kubuntu)

MY SOLUTION (working in Linux):
Read the Excel file (xls) using Open Office.org (version 2.4.1) 
(downloadable for free for Linux or Windows).
Save as dif file.
In R,   TT<-read.DIF("[file path]",header=TRUE)
It worked, and all my numerical data elements were "numeric", correct 
and in the right order. Omit "header=TRUE" if you don't want the first 
elements of the spreadsheet columns declared as headers.

Hope this may help someone.

Here's a subset of my data in a data.frame (environmental data on 
plankton):

 >TT
  Stn Day Mean.salinity Mean.temperature Secchi.disc. 
Log.microplank.biomass
1    1  12             0               14          0.7            
1.954242509
2    1  70          13.5            16.55          0.3            
3.083860801
3    1  93         13.45            16.85          0.6            
2.651278014
4    1 153          6.78             14.2          0.5            
2.075546961
5    1 200             0              9.3          0.7            
1.612783857
6    1 231             0              7.1          0.8            
1.491361694
7    1 283             0              8.8          0.4            
2.123851641
8    1 330          4.95             9.45          0.3            
2.276461804
9    1 370          16.6             12.3          0.4            
2.728353782
10   3  12         16.25            11.95         0.55            
2.025305865
11   3  70         22.35             16.1          0.5            
2.096910013
12   3  93         26.05            17.15          1.5            
1.707570176
13   3 153          23.4             14.2            1            
1.755874856
14   3 200         14.05              8.6          0.4            
1.812913357
15   3 231           7.9              6.3          0.3            
1.897627091
16   3 283          11.2             7.25          0.7            
1.832508913
17   3 330         19.95              8.1          0.5            
1.785329835
18   3 370         24.35             11.5          0.4            
2.361727836
19   4  12          18.1            12.05          0.6            
1.792391689
20   4  70         24.35             15.9          0.7            
1.973127854
21   4  93            27            17.35          1.3            
1.982271233
22   4 153          25.8             14.2          0.8            
1.924279286
23   4 200          16.2                9          0.4            
1.653212514
24   4 231          11.5             6.85          0.4            
1.819543936
25   4 283         10.95              8.2         0.25            
2.096910013
26   4 330          19.7             8.45          0.4            
2.025305865
27   4 370          25.6             11.5          0.5            
2.274157849

Ian Jenkinson




More information about the R-help mailing list