[R] read a xls file

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Jan 20 15:24:26 CET 2009


On Tue, 2009-01-20 at 14:45 +0100, Ian Jenkinson wrote:
> > See the R Import/Export manual.  Also
> > RSiteSearch("import excel")
> > gives many hits.  It seems as if this question is
> > asked almost daily.
> >
> > On Sun, Jan 18, 2009 at 9:15 AM, Michele Santacatterina
> > <miksanta at gmail.com> wrote:
> >   
> >> > Hello,
> >> >
> >> > i have a xls file. I will read it in r, what library-command i use for
> >> > this??
> >> >
> >> > any ideas??
> I feel concerned because I have just spent a frustrating couple of days 
> trying to read an Excel (xls) file, with the aid of the R book (Crawley, 
> 2007), and R help files. I failed, but finally found a workaround. My 
> experience might help others.

You did read ?read.table yes?

There are three arguments there that can help in such situations: 

'colClasses' allows the finest grained control over how R imports your
text files. You specify what each column is, noting that if you have
lots of columns, things like 

c("numeric", rep("character", 12))

will deal with runs of columns of the same type, without having to type
them all by hand.

'as.is' is a vector of logicals (TRUE/FALSE) that controls whether a
column is read in as is or converted.

'stringsAsFactors' a single logical. Should all character variables be
converted to factors.

Some of these would have been useful in your case.

I'm not sure what you tried, but I have found that saving an .xls file
as a CSV via OpenOffice.org (on Linux) and subsequently reading it in
with read.csv("foo.csv", ...) to be reasonably fool proof, especially
when one makes use of the arguments about for fine-grained processing.

Someone in this thread posted a response that included the use of RODBC,
which I haven't tried, but there are a plethora of ways to read data
from Excel without having to torture yourself and the data formats to do
so.

HTH

G

> 
> My data were in an Excel xls file
> 
> I have R (version 2.6.2) installed in Kubuntu Linux
> I also have R (version 2.6.2) installed in Windows XP SP_3 running in 
> VirtualBox (a Virtual Computer) in Kubuntu, and I have (very old) Excel 
> 97 on this system.
> 
> I wasted a lot of time exporting from Excel in various formats (txt, 
> csv, dif, tab-delimited, ;-delimited ,-delimited, etc.). (I checked they 
> were of correct format by peeking with a text editor.)
> Then I would try reading using e.g. read.table("[file 
> path]",header=TRUE) or read.csv(...) or read.csv2(...), or 
> read.DIF(...), with or without "header=TRUE" or "header =FALSE".
> I also copied to the "clipboard" and tried reading using 
> read.DIF("clipboard")
> In many of these cases I did get a data.frame that looked nice on-screen.
> 
> My recurrent problem, however, was that many of the numeric variables in 
> the resultant data frame were CLASS "factor". If you do arithmetic or 
> plotting on factors, either it fails or gives wrong results.
> 
> So I spent hours using (as.numeric(...)) with variants and permutations, 
> etc. Most times (as.numeric(...)) seems to work, but actually the data 
> either remained unchanged (as a "factor") or gave "numeric" but wrong 
> numbers.
> 
> I read the xls file using gnumeric application and saved as a dif file, 
> then used read.DIF("[file path]"). This gave some correct "numeric" 
> numbers but jumbled and partly duplicated.
> 
> N.B. My problems were essentially the same whether I used R in XP or in 
> Linux (kubuntu)
> 
> MY SOLUTION (working in Linux):
> Read the Excel file (xls) using Open Office.org (version 2.4.1) 
> (downloadable for free for Linux or Windows).
> Save as dif file.
> In R,   TT<-read.DIF("[file path]",header=TRUE)
> It worked, and all my numerical data elements were "numeric", correct 
> and in the right order. Omit "header=TRUE" if you don't want the first 
> elements of the spreadsheet columns declared as headers.
> 
> Hope this may help someone.
> 
> Here's a subset of my data in a data.frame (environmental data on 
> plankton):
> 
>  >TT
>   Stn Day Mean.salinity Mean.temperature Secchi.disc. 
> Log.microplank.biomass
> 1    1  12             0               14          0.7            
> 1.954242509
> 2    1  70          13.5            16.55          0.3            
> 3.083860801
> 3    1  93         13.45            16.85          0.6            
> 2.651278014
> 4    1 153          6.78             14.2          0.5            
> 2.075546961
> 5    1 200             0              9.3          0.7            
> 1.612783857
> 6    1 231             0              7.1          0.8            
> 1.491361694
> 7    1 283             0              8.8          0.4            
> 2.123851641
> 8    1 330          4.95             9.45          0.3            
> 2.276461804
> 9    1 370          16.6             12.3          0.4            
> 2.728353782
> 10   3  12         16.25            11.95         0.55            
> 2.025305865
> 11   3  70         22.35             16.1          0.5            
> 2.096910013
> 12   3  93         26.05            17.15          1.5            
> 1.707570176
> 13   3 153          23.4             14.2            1            
> 1.755874856
> 14   3 200         14.05              8.6          0.4            
> 1.812913357
> 15   3 231           7.9              6.3          0.3            
> 1.897627091
> 16   3 283          11.2             7.25          0.7            
> 1.832508913
> 17   3 330         19.95              8.1          0.5            
> 1.785329835
> 18   3 370         24.35             11.5          0.4            
> 2.361727836
> 19   4  12          18.1            12.05          0.6            
> 1.792391689
> 20   4  70         24.35             15.9          0.7            
> 1.973127854
> 21   4  93            27            17.35          1.3            
> 1.982271233
> 22   4 153          25.8             14.2          0.8            
> 1.924279286
> 23   4 200          16.2                9          0.4            
> 1.653212514
> 24   4 231          11.5             6.85          0.4            
> 1.819543936
> 25   4 283         10.95              8.2         0.25            
> 2.096910013
> 26   4 330          19.7             8.45          0.4            
> 2.025305865
> 27   4 370          25.6             11.5          0.5            
> 2.274157849
> 
> Ian Jenkinson
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%




More information about the R-help mailing list