[R] linear correlation?
andrew_perrin at unc.edu
Thu Mar 7 20:14:48 CET 2002
On 7 Mar 2002, Michaell Taylor wrote:
> A good deal of this thread leaves me perplexed.
> Of course you can correlate vectors of differing units. Correlations
> are covariances expressed in a standardized unit. I.e. differing units
> is the reason for correlation coefficients in the first place.
> Of course you can correlate measures of different phenomena - e.g.
> economic growth is correlated with percentage of voters voting for the
> incumbent in the next election. Correlation of two different measures
> of the same phenomenon is called a test of reliability.
> Of course you can correlate cm and kg. I would be perfectly comfortable
> stating that a person's weight in kg is correlated to their height in
> cm. Anyone disagree?
You're missing something basic, which is what I missed too when the OP
first posted. He's not correlating two variables, one of which is in cm
and one of which is in kg. He's correlating two *vectors* of six variables
each; three of these variables are in cm and three are in kg. So he's
treating a *case* (in his example, an apple tree) as a variable, and
asking for the correlation between two cases (apple trees).
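A minimal sketch of what that amounts to, using the x1 and x2 values quoted
further down as the two trees. The column names are hypothetical labels (the
post does not say what the six measurements are); the only point is that
cor() is being fed two *rows* of a cases-by-variables table rather than two
columns.

```r
# Two cases (apple trees), six measurements each. Values are x1 and x2 from
# the quoted example; the column names are made-up labels for illustration.
trees <- rbind(
  tree1 = c(1, 2, 3, 100, 200, 300),
  tree2 = c(1.1, 2.8, 3.3, 108, 209, 303)
)
colnames(trees) <- c("cm_a", "cm_b", "cm_c", "kg_a", "kg_b", "kg_c")

# What the OP computed: a correlation between two rows (cases).
r_cases <- cor(trees["tree1", ], trees["tree2", ])

# The same thing via the transpose, where each tree becomes a "variable"
# and each measurement becomes an "observation".
r_matrix <- cor(t(trees))

print(r_cases)
print(r_matrix)
```

The off-diagonal entry of r_matrix is the same number as r_cases, which is
the sense in which a case is being treated as a variable.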
> Obviously one has to be careful in extracting substantive meaning from
> correlations - just like every statistic that I can think of.
> In terms of the big number small number thing: the major source of your
> observed correlations is there being a set of small numbers
> and a set of big numbers. Think of these things as points on a graph.
> In your example,
> > > x1 <- c(1, 2, 3, 100, 200, 300)
> > > x2 <- c(1.1, 2.8, 3.3, 108, 209, 303)
> > > x3 <- c(2.8, 3.8, 5.3, 108, 209, 303)
> > > cor(x1, x2)
> > [1] 0.999655
> > > cor(x1, x3)
> > [1] 0.9997286
> The minor fluctuations in these series within observations 1, 2, 3 and
> 4, 5, 6 are totally dwarfed by the difference between 3 and 4. It is this jump
> between (3,3.3) and (100,108) which drives your correlations.
> Comparatively, the other changes are a wash.
... and the reason for these jumps is that the "small" numbers (the first
three in each vector) are centimeters, while the "large" numbers (the
latter three) are kilograms. That's the essence of the problem, and the
reason why the very exercise is inappropriate.
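One rough way to see this in R, assuming (as in the example) that positions
1-3 are the cm measurements and 4-6 the kg ones. The reversed vector x3_rev
is a hypothetical variant introduced here for illustration, not part of the
original data.

```r
# Vectors from the quoted example: positions 1-3 in cm, positions 4-6 in kg.
x1 <- c(1, 2, 3, 100, 200, 300)
x3 <- c(2.8, 3.8, 5.3, 108, 209, 303)
cm <- 1:3
kg <- 4:6

# Correlation within each block of like units.
r_cm <- cor(x1[cm], x3[cm])
r_kg <- cor(x1[kg], x3[kg])

# Hypothetical variant: reverse the cm values of x3, so that within the
# cm block the two trees now run in opposite directions.
x3_rev <- c(rev(x3[cm]), x3[kg])
r_cm_rev <- cor(x1[cm], x3_rev[cm])   # negative within the cm block
r_pooled_rev <- cor(x1, x3_rev)       # pooled value still close to 1

print(c(cm = r_cm, kg = r_kg,
        cm_reversed = r_cm_rev, pooled_reversed = r_pooled_rev))
```

Flipping the cm block reverses the relationship among the cm values, yet the
pooled correlation barely moves, because it mostly reflects the gap between
the cm-sized and the kg-sized numbers.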
Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA