[R] A basic statistics question

Joshua Wiley jwiley.psych at gmail.com
Fri Aug 15 15:29:35 CEST 2014


On Wed, Aug 13, 2014 at 7:41 AM, Rolf Turner <r.turner at auckland.ac.nz>
wrote:

> On 13/08/14 07:57, Ron Michael wrote:
>
>> Hi,
>>
>> I would need to get a clarification on a quite fundamental statistics
>> property, hope expeRts here would not mind if I post that here.
>>
>> I learnt that the variance-covariance matrix of the standardized data is
>> equal to the correlation matrix for the unstandardized data. So I used the
>> following data.
>>
>
> <SNIP>
>
>
>>  (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
>>
>>
>>
>> Point is that I am not getting exact CORR matrix. Can somebody point me
>> what I am missing here?
>>
>
> You are using a denominator of "n" in calculating your "covariance" matrix
> for your normalized data.  But these data were normalized using the sd()
> function which (correctly) uses a denominator of n-1 so as to obtain an
> unbiased estimator of the population standard deviation.
>

As a small point, sd() with its n - 1 denominator is not _quite_ an unbiased
estimator of the population SD; see Cureton (1968), Unbiased Estimation of
the Standard Deviation, The American Statistician, 22(1).

To see this in action:

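library(parallel)
cl <- makeCluster(detectCores())  # assumed setup for the cluster object 'cl'

## 1e7 sample SDs, each from n = 10 draws of a standard normal
res <- unlist(parLapply(cl, 1:1e7, function(i) sd(rnorm(10, mean = 0, sd = 1))))
stopCluster(cl)

## multiplicative correction that removes the bias of sd() for normal data
correction <- function(n) {
    gamma((n-1)/2) * sqrt((n-1)/2) / gamma(n/2)
}
mean(res)
# 0.972583
mean(res * correction(10))
# 0.9999216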

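The sample variance (with the n - 1 denominator) is an unbiased estimator of
the population variance, but the square root is a nonlinear (concave)
function, so the square root of an unbiased estimator is not itself unbiased
in general. Here sd() underestimates the population SD on average, and the
bias shrinks as n grows.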
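
A smaller, single-process version of the same check can be sketched as
follows. The seed, the replication count, and the helper name c4 are my own
choices for illustration and are not part of the run above; c4(n) is simply
the reciprocal of correction(n), i.e. the expected value of sd() for N(0, 1)
samples of size n.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10
reps <- 1e5                      # far fewer replications than the run above

## expected value of sd() for N(0, 1) samples of size n
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

## each replication returns the sample variance and the sample SD
sims <- replicate(reps, {
  x <- rnorm(n)
  c(var = var(x), sd = sd(x))
})

rowMeans(sims)                   # var averages close to 1, sd close to c4(n)
c4(n)
mean(sims["sd", ] / c4(n))       # bias-corrected SD averages close to 1
```

The variance row should average out near 1 while the SD row settles near
c4(10), which is roughly 0.9727, in line with the mean(res) value above.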




>
> If you calculated
>
>
>    (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)
>
> then you would get the same result as you get from cor(Data) (to within
> about 1e-15).
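
For anyone who wants to verify this without the snipped data, here is a
minimal sketch on made-up data. The matrix Data below is simulated and is
not the original poster's data, and I am assuming scale() as a stand-in for
the manual normalisation (it centres each column and divides by sd()).

```r
set.seed(42)                                  # arbitrary seed
Data <- matrix(rnorm(50 * 3), ncol = 3)       # hypothetical 50 x 3 data set
n <- nrow(Data)

## centre each column and divide by sd(), which uses the n - 1 denominator
Data_Normalized <- scale(Data)

by_n   <- crossprod(Data_Normalized) / n        # denominator n
by_nm1 <- crossprod(Data_Normalized) / (n - 1)  # denominator n - 1

all.equal(by_nm1, cor(Data), check.attributes = FALSE)  # TRUE
diag(by_n)                                    # each entry is (n - 1)/n, not 1
```

With the n denominator every entry is shrunk by the factor (n - 1)/n, which
is why the diagonal falls short of 1 and the result does not match cor().
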
>
> cheers,
>
> Rolf Turner
>
> --
> Rolf Turner
> Technical Editor ANZJS
>
>



-- 
Joshua F. Wiley
Ph.D. Student, UCLA Department of Psychology
http://joshuawiley.com/
Senior Analyst, Elkhart Group Ltd.
http://elkhartgroup.com
Office: 260.673.5518
