[R] A basic statistics question

(Ted Harding) Ted.Harding at wlandres.net
Wed Aug 13 00:22:13 CEST 2014


On 12-Aug-2014 21:41:52 Rolf Turner wrote:
> On 13/08/14 07:57, Ron Michael wrote:
>> Hi,
>>
>> I would need to get a clarification on a quite fundamental statistics
>> property, hope expeRts here would not mind if I post that here.
>>
>> I learnt that the variance-covariance matrix of the standardized data is
>> equal to the correlation matrix for the unstandardized data. So I used the
>> following data.
> 
> <SNIP>
> 
>> (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>
>> Point is that I am not getting exact CORR matrix. Can somebody point
>> me what I am missing here?
> 
> You are using a denominator of "n" in calculating your "covariance" 
> matrix for your normalized data.  But these data were normalized using 
> the sd() function which (correctly) uses a denominator of n-1 so that its
> square is an unbiased estimator of the population variance.
> 
> If you calculated
> 
>     (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
> 
> then you would get the same result as you get from cor(Data) (to within 
> about 1e-15).
> 
> cheers,
> Rolf Turner

One could argue about "(correctly)"!

From the "descriptive statistics" point of view, if one is given a single
number x, then this dataset has no variation, so one could say that
sd(x) = 0. And this is what one would get with a denominator of "n".

But if the single value x is viewed as sampled from a distribution
(with positive dispersion), then the value of x gives no information
about the SD of the distribution. If you use denominator (n-1) then
sd(x) = NA, i.e. is indeterminate (as it should be in this application).
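
As a quick check of the two viewpoints, here is a minimal sketch in R.
The value 5 is arbitrary, and the name sd_n is made up for illustration
(it is not an R function); it is just the "divide by n" formula written
out by hand.

```r
x <- 5                                   # a single observation (arbitrary value)

# R's sd() divides by (n - 1), which is 0 here, so the result is NA
sd(x)

# "Descriptive" version with denominator n, written out by hand
n    <- length(x)
sd_n <- sqrt(sum((x - mean(x))^2) / n)
sd_n                                     # no variation in a single number
```

With the n denominator the result is exactly 0; with (n-1) it is NA,
matching the two interpretations above.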

The important thing when using pre-programmed functions is to know
which denominator is being used. R uses (n-1), and this can be found by
looking at

  ?sd

or (with more detail) at

  ?cor

Ron had assumed that the denominator was n, apparently not being aware
that R uses (n-1).
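
For what it is worth, the effect can be reproduced with a small sketch.
Ron's actual data were snipped above, so the matrix Data below is made-up
random data, and I am assuming the normalisation was equivalent to
scale() (centre each column, then divide by sd()).

```r
set.seed(1)
Data <- matrix(rnorm(40), nrow = 10, ncol = 4)   # hypothetical data, not Ron's

# Assumed normalisation: centre each column and divide by sd(), i.e. (n - 1)
Data_Normalized <- scale(Data)
n <- nrow(Data_Normalized)

# crossprod(X) is t(X) %*% X
with_n   <- crossprod(Data_Normalized) / n         # denominator n
with_nm1 <- crossprod(Data_Normalized) / (n - 1)   # denominator n - 1

all.equal(with_nm1, cor(Data), check.attributes = FALSE)  # agrees with cor()
diag(with_n)                                       # each entry is (n - 1)/n, not 1
```

With denominator n every entry of the result is shrunk by the factor
(n-1)/n relative to cor(Data), which is why the diagonal is not 1.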

Just a few thoughts ...
Ted.

-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 12-Aug-2014  Time: 23:22:09
This message was sent by XFMail