[R] linear correlation?
dechao wang
dechwang at yahoo.co.uk
Thu Mar 7 16:06:56 CET 2002
Thanks Andrew,
Consider the following example:
> x1<-c(1, 2, 3, 100, 200, 300)
> x2<-c(1.1,2.8,3.3, 108, 209, 303)
> x3<-c(2.8,3.8,5.3, 108, 209, 303)
> cor(x1,x2)
[1] 0.999655
> cor(x1,x3)
[1] 0.9997286
You can see that when x2 is changed to x3, with only the
first three numbers changing, the coefficients for (x1,x2)
and (x1,x3) differ very little. I thought this might be
because the last three numbers are in different units.
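
A minimal sketch of one way to check this, assuming the cm/kg
split described in the quoted message below: compute the
correlation separately within each group of three. The `unit`
vector and the `within_cor` helper are names made up for
illustration.

```r
x1 <- c(1, 2, 3, 100, 200, 300)
x2 <- c(1.1, 2.8, 3.3, 108, 209, 303)
x3 <- c(2.8, 3.8, 5.3, 108, 209, 303)

# Assumed grouping: first three values in cm, last three in kg
unit <- rep(c("cm", "kg"), each = 3)

# Hypothetical helper: correlation of a and b within each level of g
within_cor <- function(a, b, g) {
  sapply(split(seq_along(a), g), function(i) cor(a[i], b[i]))
}

within_cor(x1, x2, unit)
within_cor(x1, x3, unit)
```

Only the cm entry differs between the two calls, since x2 and
x3 share their last three values.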
Consider another example:
> y1<-c(1, 2, 3, 4, 5, 6)
> y2<-c(1.1,2.8,3.3, 4.4, 5.5, 6.6)
> y3<-c(2.8,3.8,5.3, 4.5, 5.5, 6.6)
> cor(y1,y2)
[1] 0.9934715
> cor(y1,y3)
[1] 0.9254707
You can see that the coefficients (y1,y2) and (y1,y3)
are different as the first three numbers changed.
From the two examples, we can see that the correlation
resolves differences between items less well when the
items contain different units (as shown in the first
example) than when they are all on the same scale (as
shown in example 2).
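
To check whether the spread of the values, rather than the unit
labels, is what matters, here is a small sketch that stretches
the last three values of the second example and compares how
far apart the two coefficients are before and after. The
`stretch` helper and the factor 50 are arbitrary assumptions
for illustration.

```r
y1 <- c(1, 2, 3, 4, 5, 6)
y2 <- c(1.1, 2.8, 3.3, 4.4, 5.5, 6.6)
y3 <- c(2.8, 3.8, 5.3, 4.5, 5.5, 6.6)

# Hypothetical helper: multiply the last three values by k
stretch <- function(v, k = 50) {
  v[4:6] <- v[4:6] * k
  v
}

# Gap between the two coefficients on the original scale
d_orig <- cor(y1, y2) - cor(y1, y3)

# Same gap after the last three values are stretched
d_stretched <- cor(stretch(y1), stretch(y2)) -
  cor(stretch(y1), stretch(y3))

c(original = d_orig, stretched = d_stretched)
```

If the gap shrinks after stretching, the effect comes from the
much larger range of the last three values and not from the
units as such.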
The result of the first example is not what we want,
is it? So I think it would be better to pre-process
data that contain different units before regression
analysis. I do not think it is difficult to write R
code to do that. My question is: does a command already
exist to do this, before I write my own code?
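
A sketch of one possible pre-processing step, assuming the unit
groups are known in advance: standardise each variable within
its group, then correlate the pooled result. The `unit` vector
and the `zscore_by` helper are hypothetical names; `ave()` is
from base R. Calling `scale()` on a whole vector would not help
here, since the correlation is unchanged by a linear rescaling
of either variable.

```r
x1 <- c(1, 2, 3, 100, 200, 300)
x2 <- c(1.1, 2.8, 3.3, 108, 209, 303)
x3 <- c(2.8, 3.8, 5.3, 108, 209, 303)

# Assumed grouping: first three values in cm, last three in kg
unit <- rep(c("cm", "kg"), each = 3)

# Hypothetical helper: z-score v separately within each level of g
zscore_by <- function(v, g) {
  ave(v, g, FUN = function(u) (u - mean(u)) / sd(u))
}

z1 <- zscore_by(x1, unit)
z2 <- zscore_by(x2, unit)
z3 <- zscore_by(x3, unit)

cor(z1, z2)
cor(z1, z3)
```

Whether this is an appropriate transformation depends on the
analysis; it removes the between-group differences entirely, so
only the within-group variation contributes to the coefficient.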
--- Andrew Perrin <andrew_perrin at unc.edu> wrote:
> On Thu, 7 Mar 2002, dechao wang wrote:
>
> > Hi, I have checked statistics textbooks about
> > correlations, but I am still not sure about correlation
> > analysis with different units, for example,
> >
> > x1<-c(1, 2, 3, 100, 200, 300)
> > x2<-c(1.1,2.8,3.3, 108, 209, 303)
> > the unit of the first 3 numbers is cm
> > the unit of the last 3 numbers is kg
> >
> > cor(x1,x2)=0.999655
> >
> > Can I interpret the correlation coefficient as normal,
> > as if all numbers had the same unit?
>
> I don't think the correlation depends on the units;
> it's a ratio, not an absolute. Consider the case of
> converting the centimeters into meters:
>
> > x1m<-x1 / 100
> > cor(x1m,x2)
> [1] 0.999655
>
> The correlation doesn't change.
>
> >
> > Secondly, if I keep the three large numbers unchanged
> > and change only the three small numbers, the coefficient
> > changes little. This means that the variation of the
> > three small numbers is hidden by the three larger numbers.
> > Is there any solution in R to solve this issue?
> >
>
> I'm not sure what you mean by "hidden"; in your case, the
> correlations between the vectors are similar for both the
> first and second halves:
> > cor(x1[4:6],x2[4:6])
> [1] 0.9997853
> > cor(x1[1:3],x2[1:3])
> [1] 0.953821
>
> so removing either half isn't going to change the result much.
>
>
>
----------------------------------------------------------------------
> Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
> Assistant Professor of Sociology, U of North Carolina, Chapel Hill
> 269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA