[R] Variance is different in R vs. Excel?

Ranjan Maitra maitra.mbox.ignored at inbox.com
Tue Feb 10 00:39:14 CET 2015


I suspect that this is the long-documented issue with indeed an entire industry -- and publications -- devoted to finding such errors in Excel. Till the 2013 version, it used to be a favorite HW problem of mine. Basically, Excel uses the "short formula" to calculate the variance and the sd. This "short formula" has numerical issues with larger numbers (though I am surprised at the OP's data because these numbers were not that large). Anyway, the "long formula" which removes the mean from each datapoint, squares and sums is preferred with large numbers. 

Btw, my HW problem for incoming students in my R class would be this:

Consider the following numbers:
100000000000001, 100000000000002, 100000000000001, 100000000000002, 100000000000001,
100000000000002, 100000000000001, 100000000000002, 100000000000001, 100000000000002.

Calculate the variance in Excel (gives pure garbage) and in R.

I got this (or may have adapted it) from the book: Numerical Issues in Statistical Computing for the Social Scientist by M. Altman, J. Gill and M. P. McDonald.

After over 10 years, Excel finally appears to have fixed the issue. gnumeric never had this problem.

Best wishes,
Ranjan


On Mon, 9 Feb 2015 22:15:48 +0000 Ted Harding <Ted.Harding at wlandres.net> wrote:

> [See at end]
> 
> On 09-Feb-2015 21:45:11 David L Carlson wrote:
> > Time for a new version of Excel? I cannot duplicate your results in Excel
> > 2013.
> > 
> > R:
> >> apply(dat, 2, var)
> > [1] 21290.80 24748.75
> > 
> > Excel 2013:
> > =VAR.S(A2:A21)   =VAR.S(B2:B21)
> > 21290.8          24748.74737
> > 
> > -------------------------------------
> > David L Carlson
> > Department of Anthropology
> > Texas A&M University
> > College Station, TX 77840-4352
> > 
> > 
> > -----Original Message-----
> > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Karl Fetter
> > Sent: Monday, February 9, 2015 3:33 PM
> > To: r-help at r-project.org
> > Subject: [R] Variance is different in R vs. Excel?
> > 
> > Hello everyone, I have a simple question. when I use the var() function in
> > R to find a variance, it differs greatly from the variance found in excel
> > using the =VAR.S function. Any explanations on what those two functions are
> > actually doing?
> > 
> > Here is the data and the results:
> > 
> > dat<-matrix(c(402,908,553,522,627,1040,756,679,806,711,713,734,683,790,597,872
> > ,476,1026,423,476,419,591,376,640,550,601,588,499,646,693,351,730,632,707,779,
> > 838,814,771,533,818),
> > nrow=20, ncol=2, byrow=T)
> > 
> > var(dat[,1])
> >#21290.8
> > 
> > var(dat[,2])
> >#24748.75
> > 
> >#in Excel, the variance of dat[,1] = 44763.91; for dat[,2] = 52034.2
> > 
> > Thanks,
> > Karl
> 
> I suspect that something has happened to the reading-in of the
> data into Excel. (I don't know much about Excel, and that's because
> I don't want to ... ).
> 
> The ratio of the variances of the two datasets in R is:
> 
>   var(dat[,2])/var(dat[,1])
>   # [1] 1.162415
> 
> while the ratio of th results from Excel is:
> 
>   52034.2/44763.91
>   # [1] 1.162414
> 
> so they are almost identical. 
> 
> So it is as if Excel was evaluating the variances for data which
> are
> 
>   sqrt(44763.91/var(dat[,1]))
>   # [1] 1.45
>   sqrt(52034.2/var(dat[,2]))
>   # [1] 1.449999
> 
> times the data used by R. So maybe there's a "nasty" lurking somewhere
> in the spreadsheet? (Excel is notorious for planting things invisibly
> in its spreadsheets which lead to messed-up results for no apparent
> reasion ... ).
> 
> Hoping this helps,
> Ted.
> 
> -------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
> Date: 09-Feb-2015  Time: 22:15:44
> This message was sent by XFMail
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-- 
Important Notice: This mailbox is ignored: e-mails are set to be deleted on receipt. Please respond to the mailing list if appropriate. For those needing to send personal or professional e-mail, please use appropriate addresses.

____________________________________________________________
Can't remember your password? Do you need a strong and secure password?
Use Password manager! It stores your passwords & protects your account.



More information about the R-help mailing list