[R] Ranjan Maitra's comment "Re: Variance is different in R vs. Excel?

Ranjan Maitra maitra.mbox.ignored at inbox.com
Wed Feb 11 02:39:07 CET 2015


Thanks! Interesting! Perhaps there was a patch to Excel later on, because otherwise, this does not explain things. Anyway, regardless, good to know that the long-standing issue has been fixed.

The title of your e-mail post is amusing!

Ranjan

On Tue, 10 Feb 2015 18:09:22 -0600 Ronald Wyllys <wyllys at ischool.utexas.edu> wrote:

> FWIW, both Excel 2007 and LibreOffice 4.2 yield the correct variance for 
> the numbers in Ranjan Maitra's HW problem for incoming students in R.  
> Namely, both these programs yield a sample variance of 0.2777777778 
> (rounded to 10 decimal digits).
> 
> Ronald Wyllys
> 
> 
> On 02/10/2015 05:00 AM, r-help-request at r-project.org wrote:
> > ate: Mon, 9 Feb 2015 17:39:14 -0600
> > From: Ranjan Maitra<maitra.mbox.ignored at inbox.com>
> > To:<r-help at stat.math.ethz.ch>
> > Subject: Re: [R] Variance is different in R vs. Excel?
> > Message-ID:<20150209173914.bae4d99ebeadafed3515372c at inbox.com>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > I suspect that this is the long-documented issue with indeed an entire industry -- and publications -- devoted to finding such errors in Excel. Till the 2013 version, it used to be a favorite HW problem of mine. Basically, Excel uses the "short formula" to calculate the variance and the sd. This "short formula" has numerical issues with larger numbers (though I am surprised at the OP's data because these numbers were not that large). Anyway, the "long formula" which removes the mean from each datapoint, squares and sums is preferred with large numbers.
> >
> > Btw, my HW problem for incoming students in my R class would be this:
> >
> > Consider the following numbers:
> > 100000000000001, 100000000000002, 100000000000001, 100000000000002, 100000000000001,
> > 100000000000002, 100000000000001, 100000000000002, 100000000000001, 100000000000002.
> >
> > Calculate the variance in Excel (gives pure garbage) and in R.
> >
> > I got this (or may have adapted it) from the book: Numerical Issues in Statistical Computing for the Social Scientist by M. Altman, J. Gill and M. P. McDonald.
> >
> > After over 10 years, Excel finally appears to have fixed the issue. gnumeric never had this problem.
> >
> > Best wishes,
> > Ranjan
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Important Notice: This mailbox is ignored: e-mails are set to be deleted on receipt. Please respond to the mailing list if appropriate. For those needing to send personal or professional e-mail, please use appropriate addresses.

____________________________________________________________
FREE 3D EARTH SCREENSAVER - Watch the Earth right on your desktop!



More information about the R-help mailing list