[R] A basic statistics question

Rolf Turner r.turner at auckland.ac.nz
Fri Aug 15 23:49:31 CEST 2014


On 16/08/14 01:29, Joshua Wiley wrote:
>
> On Wed, Aug 13, 2014 at 7:41 AM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
>     On 13/08/14 07:57, Ron Michael wrote:
>
>         Hi,
>
>         I would like a clarification on a fairly fundamental
>         statistical property; I hope the expeRts here will not mind my
>         posting it here.
>
>         I learnt that the variance-covariance matrix of the standardized
>         data is equal to the correlation matrix of the unstandardized data.
>         So I used the following data.
>
>
>     <SNIP>
>
>
>         (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>
>
>
>         The point is that I am not getting exactly the CORR matrix. Can
>         somebody point out what I am missing here?
>
>
>     You are using a denominator of "n" in calculating your "covariance"
>     matrix for your normalized data.  But these data were normalized
>     using the sd() function which (correctly) uses a denominator of n-1
>     so as to obtain an unbiased estimator of the population standard
>     deviation.
>
>
> As a small point, using n - 1 does not give _quite_ an unbiased
> estimator of the population SD; see Cureton (1968), Unbiased
> Estimation of the Standard Deviation, The American Statistician,
> 22(1).
>
> To see this in action:
>
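> library(parallel)
> cl <- makeCluster(detectCores())
> # 1e7 sample SDs, each computed from n = 10 standard normal draws
> res <- unlist(parLapply(cl, 1:1e7, function(i) sd(rnorm(10, mean = 0, sd = 1))))
> stopCluster(cl)
> # Reciprocal of c4(n), where E[s] = c4(n) * sigma for normal samples
> correction <- function(n) {
>      gamma((n-1)/2) * sqrt((n-1)/2) / gamma(n/2)
> }
> mean(res)
> # 0.972583
> mean(res * correction(10))
> # 0.9999216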
>
> The calculation for sample variance is an unbiased estimate of the
> population variance, but square root is a nonlinear function and the
> square root of an unbiased estimator is not itself necessarily unbiased.


Aaaaarrrggghhh.  Yes of course.  I *know* that you don't get an unbiased 
estimate of the sd by using n-1 in the denominator; you get an unbiased 
estimate of the variance and as you say, sqrt() is a non-linear function 
.....
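A short derivation of where the correction factor comes from, assuming i.i.d. normal data as in the simulation above (s^2 is the sample variance with denominator n - 1):

```latex
% Assumes X_1, ..., X_n i.i.d. N(mu, sigma^2); s^2 uses denominator n - 1
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}
  \quad\Longrightarrow\quad E[s^2] = \sigma^2 ,

E[s] = c_4(n)\,\sigma , \qquad
c_4(n) = \sqrt{\frac{2}{n-1}}\;\frac{\Gamma(n/2)}{\Gamma\big((n-1)/2\big)} ,

E[s] = E\big[\sqrt{s^2}\,\big] < \sqrt{E[s^2]} = \sigma
  \quad \text{(Jensen's inequality; } \sqrt{\cdot} \text{ is strictly concave).}
```

The correction() function quoted above is 1/c_4(n); c_4(10) is about 0.9727, in line with the simulated mean(res).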

I just didn't think carefully enuff before I wrote.  Thanks for pulling 
me up on this error.
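Returning to the original question, here is a minimal sketch of the n versus n - 1 point. It makes some assumptions: the data are made up, because Ron's were snipped; scale() stands in for his normalization step, since it centres each column and divides by sd() and so uses n - 1; and the names cov_n and cov_nm1 are hypothetical.

```r
# Hypothetical example data (the original data were snipped)
set.seed(1)
Data <- matrix(rnorm(50 * 3), ncol = 3)
Data[, 2] <- Data[, 1] + rnorm(50)  # induce some correlation
n <- nrow(Data)

# scale() centres each column and divides by its sd(), i.e. uses n - 1
Data_Normalized <- scale(Data)

# Denominator n, as in the original post: too small by a factor (n - 1)/n
cov_n <- crossprod(Data_Normalized) / n
# Denominator n - 1: agrees with cor() up to floating-point error
cov_nm1 <- crossprod(Data_Normalized) / (n - 1)

all.equal(cov_nm1, cor(Data), check.attributes = FALSE)              # TRUE
all.equal(cov_n * n / (n - 1), cor(Data), check.attributes = FALSE)  # TRUE
```

Equivalently, cov(Data_Normalized) uses n - 1 and therefore reproduces cor(Data) directly.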

cheers,

Rolf


-- 
Rolf Turner
Technical Editor ANZJS


