[Rd] Covariance calculation gives different answer than Excel (PR#13720)

Duncan Murdoch murdoch at stats.uwo.ca
Wed May 27 09:58:21 CEST 2009


On 26/05/2009 5:50 AM, apw at us.ibm.com wrote:
> Full_Name: Amos Waterland
> Version: 2.8.1
> OS: Ubuntu Linux
> Submission from: (NULL) (68.175.8.163)
> 
> 
> I calculated the covariance for a small data set as follows:
> 
> X <- c(1,2,3,4)
> Y <- c(3,3,4,3)
> cov(X,Y)
> [1] 0.1666667
> 
> But when doing the computation with pencil and paper I get:
> 
> ((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/4
> [1] 0.125
> 
> Microsoft Excel 2003 covar() also gives 0.125.  I suspect that you guys are
> doing something like this:
> 
> ((-1.5)*(-0.25) + (-0.5)*(-0.25) + (0.5)*(0.75) + (1.5)*(-0.25))/3
> [1] 0.1666667
> 
> That is, you are dividing by N minus 1 rather than N.  So who is correct?

Please don't report something as a bug when you are not sure it is one.  
cov() is documented (see ?cov) to use n-1 in the denominator.  Excel's 
covar() uses n instead, which leads to surprises like var(x) != covar(x, x) 
in Excel, because its variance calculation uses n-1.
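
As a minimal sketch of the difference, using the X and Y from the report: 
the variable names n, sxy, sample_cov and pop_cov are illustrative only, 
not part of R.  Rescaling cov() by (n-1)/n gives the divide-by-n figure 
from the hand calculation.

```r
X <- c(1, 2, 3, 4)
Y <- c(3, 3, 4, 3)
n <- length(X)

# Sum of cross-products of deviations from the means
sxy <- sum((X - mean(X)) * (Y - mean(Y)))

sample_cov <- sxy / (n - 1)  # divisor n-1, matches cov(X, Y)
pop_cov    <- sxy / n        # divisor n, matches the hand calculation

# cov() agrees with the n-1 version
stopifnot(isTRUE(all.equal(cov(X, Y), sample_cov)))

# Rescaling cov() recovers the divide-by-n value
stopifnot(isTRUE(all.equal(cov(X, Y) * (n - 1) / n, pop_cov)))

# In R, var() and cov() use the same divisor, so these agree
stopifnot(isTRUE(all.equal(var(X), cov(X, X))))
```

If the divide-by-n value is what you want, multiplying the result of 
cov() by (n-1)/n is one simple way to get it.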

Duncan Murdoch


