[R] importing data in excel

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Mon Aug 30 11:42:24 CEST 2004


On 30-Aug-04 ronggui wong wrote:
> Under Windows the command works well, but under Linux it does
> not, as Linux can keep a couple of files in the clipboard. So I
> want to know how to do a similar thing under Linux. Does anyone
> know?

Since you're using Linux, maybe the following suggestions will
work for you (it's how I almost always deal with this situation).
This follows on from Thomas Petzoldt's suggestions.

1. Save the spreadsheet out of Excel as a CSV (Comma Separated
   Values) file. There are appropriate options available in the
   "Save As ..." menu.

   Comments. There can be complications if the original .xls file
   was badly prepared. You need to ensure that it comes out with
   exactly the same number of columns as there are variables, and
   exactly the same number of rows as there are rows of data in
   the .xls file (plus 1 for the header row of variable names, if
   present). There should be (k-1) commas in every line if there
   are k columns of data.

   a) I have known spreadsheets where the person entering the data
      simply left the trailing cells of some rows empty ("short
      rows"), i.e. the last few variables have missing values.
   b) Likewise, some rows may have had spaces entered in cells
      beyond the last data column, so those rows come out too long.
   c) Likewise, some cells below the last data row may have had
      spaces entered in them, so Excel sees those rows as having
      content and exports extra lines.

   For the most part, these complications will be avoided if, as
   Thomas suggests, you "mark" the full rectangular area prior to
   saving the marked area. Of course, you may already know that
   these problems will not arise in your case; but I can assure
   you from experience that if you are receiving your data from
   someone else then you should watch out for them.

   d) Similarly, you may encounter cases where the user has entered
      a "." (or similar) in blank/missing cells, though not
      necessarily in all of them, and possibly (as in (b) above)
      where they should not have (see the sketch just below).
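
   Once the CSV file exists (steps 2-3 below), a quick R-side check
   for problems (a)-(c), and a way of handling (d), might look like
   the following sketch. The file name and the "." convention are
   only assumptions; adjust to your own data:

     ## Count comma-separated fields on each line; with k columns
     ## of data, every line should have exactly k fields
     nf <- count.fields("mydata.csv", sep = ",")
     table(nf)                # ideally a single value, k
     which(nf != max(nf))     # lines that are too short or too long

     ## When reading, treat empty cells and "." as missing (NA)
     mydata.df <- read.csv("mydata.csv", na.strings = c("", "."))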

2. Transfer the resulting file (say "mydata.csv") from the Windows
   machine to the Linux machine.

3. Run "dos2unix" on this file:

     dos2unix mydata.csv

   This ensures in particular that the end-of-line terminator is
   a clean Unix-compatible LF ("\n") rather than the Windows pair
   CRLF ("\r\n"), which can cause complications; it will also strip
   any DOS-style "filler bytes" (typically a Ctrl-Z end-of-file
   marker) from the end of the file. It also copes with files
   originating on classic Mac systems, where the EOL is "\r".
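
   If dos2unix is not to hand, you can at least check from within R
   whether any stray CR bytes remain; a minimal sketch, using the
   same example file name:

     ## TRUE if any carriage-return (0x0D) bytes are still present
     bytes <- readBin("mydata.csv", what = "raw",
                      n = file.info("mydata.csv")$size)
     any(bytes == as.raw(0x0d))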

4. If the resulting file "mydata.csv" is now nice and clean (which
   normally it should be, unless things like (a-d) above apply),
   then you should be able to bring it into R as a data frame by
   simply giving the command

     mydata.df <- read.csv("mydata.csv")

   or similar.
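
   A couple of quick checks on the result can save trouble later
   (the object name is the one used above):

     dim(mydata.df)       # number of rows and columns as expected?
     str(mydata.df)       # variable names and types
     summary(mydata.df)   # unexpected NAs, odd factor levels, etc.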

5. However, if it has fallen victim to the sort of complication
   described above, then you will need to clean it up before use.

   Unix/Linux comes with plenty of file utilities that help with
   such clean-up, including 'sed', 'awk', 'grep', 'sort' and the
   like; for the most part I can get such jobs done just using
   'sed' and 'awk' (a small R-only sketch is given at the end of
   this item).

   If you find yourself in this kind of situation and need further
   advice, get in touch with me off-list.
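
   For simple cases the same kind of clean-up can also be done from
   within R. A minimal sketch, assuming only trailing whitespace and
   blank lines need removing (anything fancier really is a job for
   sed/awk); the output file name is just an example:

     lines <- readLines("mydata.csv")
     lines <- sub("[ \t]+$", "", lines)   # strip trailing spaces/tabs
     lines <- lines[lines != ""]          # drop completely empty lines
     writeLines(lines, "mydata-clean.csv")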

Good luck,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 30-Aug-04                                       Time: 10:42:23
------------------------------ XFMail ------------------------------



