[R] Measure of linearity between two variables?

Thu Apr 17 20:44:08 CEST 2003

Could you be more specific about what you want?

For example, do you want a statistical test with a significance 
probability or a measure like R^2?  Also, are your numbers bounded, 
e.g., between 0 and 1?  What kind of error structure is inherent in the 
application?  Should we think about transforming the response variable 
or using "glm" or nonlinear regression?

If I were concerned about saturation at either end, perhaps the 
simplest thing might be to add a cubic term to what you considered:

	x ~ y + I(y^2) + I(y^3)

From this we could get significance probabilities for the squared and 
cubic terms combined, for example from an F test comparing this fit with 
the straight line.  Also, by converting the sum of squares column of the 
anova table to percentages, we get something like R^2.
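Here is a minimal R sketch of the suggestion above. As in the original question, it treats x as the response and y as the predictor. The helper name curvature_check and the simulated tanh data are hypothetical and only serve as an illustration. Calling anova() on the two nested fits gives the joint F test for the squared and cubic terms. The "Sum Sq" column of anova() on the cubic fit, scaled to percent, plays the role of R^2.

```r
# Joint test of the squared and cubic terms, plus sums of squares as percent.
curvature_check <- function(x, y) {
  fit1 <- lm(x ~ y)                            # straight line only
  fit3 <- lm(x ~ y + I(y^2) + I(y^3))          # I() makes ^ arithmetic, not formula crossing
  p_value <- anova(fit1, fit3)[["Pr(>F)"]][2]  # F test: both extra terms together
  tab <- anova(fit3)                           # sequential (Type I) sums of squares
  pct <- 100 * tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
  names(pct) <- rownames(tab)                  # one entry per term, plus Residuals
  list(p_value = p_value, pct_ss = pct)
}

# Simulated (hypothetical) data that saturate at both ends.
set.seed(1)
y <- seq(-3, 3, length.out = 200)
x <- tanh(y) + rnorm(length(y), sd = 0.05)
res <- curvature_check(x, y)
print(res$p_value)
print(round(res$pct_ss, 2))
```

In this symmetric example the squared term contributes almost nothing and the cubic term carries the curvature. That is why the cubic helps when saturation occurs at both ends. Because the sums of squares are sequential, the percentage shown for y alone equals R^2 of the straight-line fit.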

Hope this helps.
Spencer Graves

Paul, David A wrote:
> Maybe I'm missing something, but why not use the Pearson Product
> Moment Correlation Coefficient (r) ?  It directly measures the strength
> of the linear relationship between two variables.  A simple approach
> would be the following:
> 
> (1) fix a percentage p of the data you are interested in
> (2) fix one of your two variables (x,y) as a reference - call
> 	it x
> (3) subset your data.frame down to those pairs (x*,y*) 
> 	corresponding to the middle p percent of x
> (4) calculate r for the pairs (x*,y*)
> 
> By doing (1) through (4) many times for increasing values of p,
> I think you'll get what you want.
> 
> Best, 
>   david paul
> 
> 
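David Paul's steps (1) to (4) could look like this in R. This sketch rests on a few assumptions: p is given as a fraction rather than a percentage, and the "middle" of x is defined by sample quantiles. The function name middle_r and the simulated data are hypothetical.

```r
# Pearson r for the pairs whose x lies in the middle fraction p of x.
middle_r <- function(x, y, p) {
  lo <- quantile(x, (1 - p) / 2, names = FALSE)   # lower cut-off
  hi <- quantile(x, (1 + p) / 2, names = FALSE)   # upper cut-off
  keep <- x >= lo & x <= hi                       # the subset (x*, y*)
  cor(x[keep], y[keep])
}

# Repeat for increasing p on simulated (hypothetical) saturating data.
set.seed(2)
y <- seq(-3, 3, length.out = 200)
x <- tanh(y) + rnorm(length(y), sd = 0.05)
ps <- seq(0.2, 1, by = 0.1)
r_by_p <- sapply(ps, function(p) middle_r(x, y, p))
print(round(cbind(p = ps, r = r_by_p), 3))
```

One caution when reading the result: for a fixed noise level, r also tends to fall as the range of x is narrowed. Changes in r across p therefore reflect the restricted range as well as any curvature.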
> -----Original Message-----
> From: Luke Whitaker [mailto:luke at inpharmatica.co.uk] 
> Sent: Thursday, April 17, 2003 12:03 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Measure of linearity between two variables?
> 
> 
> 
> Hello,
> 
> I am looking for a measure of linearity in the relationship between two
> variables.
> 
> Specifically, I have two variables for which the relationship is reasonably
> linear over a certain range of values, and then diverges from linearity at
> either end of the range, as one or other variable "saturates" at a maximum
> or minimum value. I want to identify the region of linearity, where neither
> variable has saturated.
> 
> This is a problem that will be repeated many times so I want a programmatic
> solution.  I am intending to implement some kind of search over the central
> range of values, expanding out and testing for linearity over each
> incrementally increased range. However, I need a measure of linearity.
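Luke's expanding search could be sketched as follows. Instead of the ratio measure, the linearity check here is taken from Spencer Graves' suggestion: the joint F test for the squared and cubic terms. The names curvature_p and find_linear_range, the starting fraction, the step size, the alpha cut-off and the simulated data are all assumptions made for illustration.

```r
# Hypothetical sketch: widen a central window (points ordered by y) until the
# joint F test for squared + cubic terms signals curvature.
curvature_p <- function(xs, ys) {
  fit1 <- lm(xs ~ ys)
  fit3 <- lm(xs ~ ys + I(ys^2) + I(ys^3))
  anova(fit1, fit3)[["Pr(>F)"]][2]
}

find_linear_range <- function(x, y, start = 0.2, step = 0.05, alpha = 0.01) {
  o <- order(y)
  x <- x[o]
  y <- y[o]
  n <- length(y)
  mid <- (n + 1) / 2
  best <- NULL
  for (p in seq(start, 1, by = step)) {
    half <- floor(p * n / 2)
    idx <- max(1, floor(mid - half)):min(n, ceiling(mid + half))
    # isTRUE() treats an NaN p-value (perfectly linear window) as "no curvature"
    if (isTRUE(curvature_p(x[idx], y[idx]) < alpha)) break
    best <- range(y[idx])              # widest window so far with no curvature
  }
  best                                 # NULL if even the first window failed
}

# Simulated (hypothetical) data: linear for -1 < y < 1, clipped outside.
set.seed(3)
y <- seq(-3, 3, length.out = 301)
x <- pmin(pmax(y, -1), 1) + rnorm(length(y), sd = 0.05)
print(find_linear_range(x, y, alpha = 0.001))
```

Because the same data are tested repeatedly, alpha here is a tuning knob rather than a formal error rate. Larger windows also give the test more power to detect mild curvature.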
> 
> So far, I have thought of doing a regression on x ~ y + y^2, and using the
> absolute value of the ratio of coefficients of the squared and linear terms.
> Does anyone have any better ideas, either for a linearity measure or a
> different approach to finding the region of linearity between the two
> variables?
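Luke's ratio measure might be computed as below. One R detail matters here. In a model formula, y^2 means y crossed with itself, which is just y, so lm(x ~ y + y^2) fits a straight line; the squared term has to be written I(y^2). Centring y and the function name quad_ratio are assumptions added for this sketch, and the data are simulated.

```r
# Luke's proposed measure: |b2 / b1| from a quadratic regression of x on y.
quad_ratio <- function(x, y) {
  yc <- y - mean(y)                     # centring is an added assumption (see note)
  b <- coef(lm(x ~ yc + I(yc^2)))       # I() needed: bare yc^2 in a formula means yc
  unname(abs(b[3] / b[2]))
}

# Without I(), the "quadratic" model silently collapses to a straight line.
set.seed(4)
y <- seq(-1, 3, length.out = 201)
x <- pmin(y, 1) + rnorm(length(y), sd = 0.02)   # hypothetical: saturates above y = 1
print(names(coef(lm(x ~ y + y^2))))             # only "(Intercept)" and "y"
print(quad_ratio(x[y <= 1], y[y <= 1]))         # linear part only
print(quad_ratio(x, y))                         # includes the saturated part
```

Centring makes b1 the slope at the middle of the window. Without it, merely shifting y changes b1 and therefore the ratio. The ratio also has units of 1/y, so its values are not comparable between variables on different scales.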
> 
> Thanks,
> 
> Luke Whitaker
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help