[R] Excel

Greg Snow Greg.Snow at intermountainmail.org
Wed Aug 29 22:49:03 CEST 2007


Erich Neuwirth said:

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Erich Neuwirth
> Sent: Wednesday, August 29, 2007 12:43 PM
> To: r-help
> Subject: Re: [R] Excel
> 
> Excel bashing can be fun but also can be dangerous because 
> you are makeing your life harder than necessary.

My experience differs, so far using excel (other than as a table layout
program) has made my life harder more times than it has made it easier.

> Statisticians meanwhile know that the numerics of statistical 
> computation can be quite bad, therefore one should not use them.

Yes, statisticians know this (or should), but what happens when one of
your clients sees you using the excel interface to R and thinks to
themselves that they know how to use excel and go back and use the
computations that we all know they shouldn't use?

Or do you trust all of your clients to know to use R(D)COM as well as
how to install and use it?

In my case, most of my clients I would trust with my life, but not  my
data (not even their data), I try to talk them into using at least
access if they have to stick with an MS product.

> But using our (we = Thomas Baier + Erich Neuwirth) RExcel 
> addin either with the R(D)COM server or with rcom (package on 
> CRAN) allows you to use all the nice features of Excel (yes, 
> there are quite a few) and use R as as the computational 
> engine within Excel.

I'll have to take your word that excel has good features, I have not
found them.

How do you tell the nice features from the erroneous features?  And how
do you tell the difference between the good features and the horrible
features that MS marketing has dressed up nice and convinced a bunch of
people that they are nice?

> The formula
> =RApply("var",A1:A1000) in an Excel cell for example will use 
> R to compute the variance of the data in column A in Excel. 
> If you change any of the values in the range A1:A1000 will 
> automatically recompute the variance.

And what happens when you enter a value in cell A1001?

And what happens if you set cell B1 to =A1 and do the magic copy so that
b2=a2, b3=a3, ... Then put 
=Rapply("var",B1:B1000) in a cell, does the answer match with
=Rapply("var",A1:A1000)?

Yes, the auto-recompute could be considered a nice feature, but does it
really save that much work compared to running a script in R after
updating the data? (a couple of clicks in Rgui, a couple of keystrokes
in ESS, one line of code in an R terminal) 

> There is one feature in Excel which is extremely convenient, 
> Pivot tables. Anybody doing any work as statistical 
> consultant really ought to know about Pivot tables, and I am 
> still surprised how many statisticians do not know about it. 
> Neither Gnumeric nor OpenOffice Calc offer comparably 
> convenient ways working with multidimensional tables.

I will admit that I have not learned the power of the pivot table, but
the main reason for that is everytime someone demonstrates the power of
the pivot table to me it is by creating a table, then showing that if
that is not what you want you can click here, here, here, then here and
it is a different table, then you click here and here, ....  At which
point I am completely lost as to which clicks did which.  There are
tools in R (the reshape package has been mentioned) than give you the
power to create the tables, but with a history rather than having to
remember which clicks are needed.  My experience with pivot tables so
far are that they are a post-hoc kludge to a poorly designed original
table.  I prefer working with a script where if the result is not what I
want, I can fix the source rather than admire how pretty the band-aid
is.  (Maybe there are ways to script a pivot table without the clicking,
but everyone who has tried to market them to me have been more impressed
with the band-aid than in preventing the original injury).

What happens when you make a nice pivot table for a client, then they
come back 2 months latter with a new spreadsheet and want a similar
table for this data?  I don't think I could remember the set of clicks
needed, but with an R script I can pull up the set of commands I used
before, change the variable and table names, and have the results in a
couple of minutes.

> I think the answer to the question
> "Excel or R" of course is "Excel and R".

I think the page by Patrick Burns gives a more complete answer
(http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html).

It is ok to use spreadsheets for jobs that spreadsheets are the best
tool, but too many people have the excel hammer and see everything as a
nail. 
 
> --
> Erich Neuwirth, University of Vienna
> Faculty of Computer Science
> Computer Supported Didactics Working Group Visit our SunSITE 
> at http://sunsite.univie.ac.at
> Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
> 



-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111



More information about the R-help mailing list