[R] Correlation question

David L Carlson dcarlson at tamu.edu
Sun Feb 22 16:49:32 CET 2015


As Kehl pointed out, any linear function of the independent variable (speed) will have the same squared correlation with the dependent variable (dist), but only one linear function minimizes the squared deviations between the fitted values and the original values. The equation you are using is only applicable to that function, not to any of the others. In fact, some linear functions will produce negative values:

> fitted.new <- 6*cars$speed
> cor(cbind(fitted.new, fitted.right, fitted.wrong, cars$dist))
             fitted.new fitted.right fitted.wrong          
fitted.new    1.0000000    1.0000000    1.0000000 0.8068949
fitted.right  1.0000000    1.0000000    1.0000000 0.8068949
fitted.wrong  1.0000000    1.0000000    1.0000000 0.8068949
              0.8068949    0.8068949    0.8068949 1.0000000
> 1-sum((cars$dist-fitted.new)^2)/sum((cars$dist-mean(cars$dist))^2)
[1] -3.281849

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan Thayn
Sent: Sunday, February 22, 2015 12:01 AM
To: Kehl Dániel
Cc: r-help at r-project.org
Subject: Re: [R] Correlation question

Of course! Thank you, I knew I was missing something painfully obvious. Its seems, then, that this line

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found this in a lecture introducing correlation, but , now, I'm not sure what it is. It does do a better job of showing that the fitted.wrong variable is not a good prediction of the distance. 



On Feb 21, 2015, at 4:36 PM, Kehl Dániel wrote:

> Hi,
> 
> try
> 
> cor(fitted.right,fitted.wrong)
> 
> should give 1 as both are a linear function of speed! Hence cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be the same.
> 
> HTH
> d
> ________________________________________
> Feladó: R-help [r-help-bounces at r-project.org] ; meghatalmazó: Jonathan Thayn [jthayn at ilstu.edu]
> Küldve: 2015. február 21. 22:42
> To: r-help at r-project.org
> Tárgy: [R] Correlation question
> 
> I recently compared two different approaches to calculating the correlation of two variables, and I cannot explain the different results:
> 
> data(cars)
> model <- lm(dist~speed,data=cars)
> coef(model)
> fitted.right <- model$fitted
> fitted.wrong <- -17+5*cars$speed
> 
> 
> When using the OLS fitted values, the lines below all return the same R2 value:
> 
> 1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(cars$dist,fitted.right)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/(49*sd(cars$dist)*sd(fitted.right)))^2
> 
> 
> However, when I use my estimated parameters to find the fitted values, "fitted.wrong", the first equation returns a much lower R2 value, which I would expect since the fit is worse, but the other lines return the same R2 that I get when using the OLS fitted values.
> 
> 1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
> cor(x=cars$dist,y=fitted.wrong)^2
> (sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/(49*sd(cars$dist)*sd(fitted.wrong)))^2
> 
> 
> I'm sure I'm missing something simple, but can someone explain the difference between these two methods of finding R2? Thanks.
> 
> Jon
>        [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list