[R] correlation coefficient

Bert Gunter gunter.berton at gene.com
Tue Apr 28 19:02:18 CEST 2009


Dear Colleagues:

Martin's reply provides an appropriate response, so I have nothing to add. But
my questions dig deeper: why do so many (presumably nonstatisticians, but who
knows?) belong to this R^2 religion? Is it because:

1) This is what they are taught in their Stat 101 courses by statisticians?
2) ... by "pseudo"statisticians in their own professions (no disrespect
intended here -- just want to make a clear distinction)?
3) It's the prevailing culture of their discipline (journal requirements,
part of their standard texts, etc.)?
4) It's what all "standard" statistical textbooks say?
5) ... ?

Also, if one believes this religious practice is counterproductive, how
would one go about changing it?

FEEL FREE TO REPLY OFF-LIST, AS IT IS PROBABLY INAPPROPRIATE TO WASTE R-HELP
BANDWIDTH ON THIS. ALSO FEEL FREE TO REFER ME INSTEAD TO ANOTHER DISCUSSION
SITE (E.G. ON STATISTICAL TEACHING) WHERE THIS HORSE MAY HAVE ALREADY BEEN
FLAYED.

Thanks.

Bert Gunter
Genentech Nonclinical Biostatistics

 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Martin Maechler
> Sent: Tuesday, April 28, 2009 8:22 AM
> To: Benedikt Niesterok
> Cc: r-help at r-project.org
> Subject: Re: [R] correlation coefficient
> 
> >>>>> "BN" == Benedikt Niesterok <KleinerHaifisch at gmx.net>
> >>>>>     on Tue, 28 Apr 2009 15:33:02 +0200 writes:
> 
>     BN> Hello,
>     BN> I would like to get a correlation coefficient (R-squared)
>     BN> for my model.
> 
> {{ arrrgh... how many people think they "need" an R^2 when they
>    fit a model ?? }}
> 
>     BN> I don't know how to calculate it in R.
>     BN> What I've done so far:
> 
>     BN> x <- 8.5:32.5   # vector x
>     BN> y <- c(NA, 5.88, 6.95, 7.2, 7.66, 8.02, 8.44, 9.06, 9.65,
>     BN>        10.22, 10.63, 11.06, 11.37, 11.91, 12.28, 12.69, 13.07,
>     BN>        13.5, 13.3, 14.14, NA, NA, NA, NA, NA)   # vector y
>     BN> plot(y ~ x, col = "green", pch = 16, ylim = c(0, 20), xlim = c(0, 50))
> 
>     BN> (mod1 <- nls(y ~ a + b*log(x, base = exp(1)),
>     BN>              start = list(a = 1, b = 1), trace = TRUE))
> 
> This is a very *LINEAR* model.
> Why don't you use  lm()?
> 
> Then you'd even get your beloved R-squared ...
> 
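
A minimal sketch of that lm() route, reusing the x and y vectors from the
quoted code above (the name mod2 is only illustrative); lm() fits the same
log-linear relationship directly, and its summary() reports R-squared:

    ## refit y = a + b*log(x) as a linear model; rows with NA in y are
    ## dropped by the default na.action
    mod2 <- lm(y ~ log(x))
    summary(mod2)   # includes "Multiple R-squared"
    coef(mod2)      # intercept a and slope b for log(x)
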
>     BN> xx <- seq(min(x), max(x), length = 100)
>     BN> yy <- 6.2456*log(xx) - 7.7822
>     BN> lines(xx, yy, col = "blue1")
>     BN> summary(mod1)
> 
>     BN> This way I don't get R-squared like I do using the command
>     BN> "lm" for linear models.
> 
> In general,  R^2 is *NOT* easily defined for non-linear models.
> R^2 is only defined if you have a nested sub-model, aka "null-model". 
> For linear models (*WITH* an intercept (!)), the sub-model is
> naturally  y ~ 1.
> For general nonlinear models, the only simple sub-model is
> 'y ~ 0', which is often ridiculous to take as a null model and
> hence is not used by default.
> 
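
To make that concrete, one common (but informal) convention is a
pseudo-R-squared for the nls() fit above that takes the intercept-only model
y ~ 1 as the null model; the choice of that null model is exactly the
judgement call described here, so the number should be read with care:

    ## residual sum of squares of the fitted nls model (mod1 above)
    rss <- sum(residuals(mod1)^2)
    ## total sum of squares around the mean of the non-missing y values,
    ## i.e. the fit of the intercept-only null model  y ~ 1
    tss <- sum((y - mean(y, na.rm = TRUE))^2, na.rm = TRUE)
    1 - rss/tss   # pseudo R-squared relative to  y ~ 1

With 'y ~ 0' as the null model instead, tss would be sum(y^2, na.rm = TRUE),
and the resulting "R-squared" is typically inflated toward 1 for data far
from zero, which is why that default would indeed be ridiculous here.
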
> More on this, e.g. almost 7 years ago on R-help:
> 
>   https://stat.ethz.ch/pipermail/r-help/2002-July/023461.html
> 
> Martin
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



