[R] Regression line limited by the range of values

Wed May 24 22:32:11 CEST 2006

On Wed, 2006-05-24 at 21:53 +0200, Andreas Svensson wrote:
> Thankyou very much Marc for that nifty little script.
> 
> When I use it on my real dataset though, the lines are fat in the middle 
> and thinner towards the ends. I guess it's because "lines" draw one 
> fitted line for each x, and if you have hundreds of x, this turns into a 
> line that is thicker that it should be (due to rounding errors perhaps).
> 
> I got the tip to use "segments", and draw one line from min(x),min(y) to 
> max(x),max(y) but with real data with a bunch of "na.rm" and 
> "na.action"s this becomes very long and bulky.
> 
> For a regression with two data swarms and two trend lines it becomes 
> this long mess:
> 
> <>plot(TCgonad[Period=="1"], 
> ABelly[Period=="1"],xlim=c(0,20),ylim=c(130,160), col=”blue”)
> points(TCgonad[Period=="2"], ABelly[Period=="2"],col=”red”)
> 
> segments(
> min(TCgonad[Period=="1"],na.rm=T), 
> min(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"],
> na.action=na.exclude)),na.rm=T),
> max(TCgonad[Period=="1"],na.rm=T), 
> max(fitted(lm(ABelly[Period=="1"]~TCgonad[Period=="1"],
> na.action=na.exclude)),na.rm=T),col=”blue”)
> 
> <>segments(
> min(TCgonad[Period=="2"],na.rm=T), 
> min(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"],
> na.action=na.exclude)),na.rm=T),
> max(TCgonad[Period=="2"],na.rm=T), 
> max(fitted(lm(ABelly[Period=="2"]~TCgonad[Period=="2"],
> na.action=na.exclude)),na.rm=T),col=”red”)
> 
> 
> I just think it's strange that abline has as a nonadjustable default to 
> extrapolate the line to outside the data - a mortal sin in my field.
> 
> Cheers
> Andreas

Andreas,

What is likely happening is that your x values are not sorted in
increasing order as they were in the dummy data set, thus there is
probably some movement of the lines back and forth from one x value to
the next in the order that they appear in the data set.

Here is a more generic approach that should work. It is based upon the
examples in ?predict.lm. We create the two models and then use range(x?)
for the new min,max x values to be used for the prediction of the fitted
y values.

Thus:

# Create model1 using random data
set.seed(1)
x1 <- rnorm(15)
y1 <- x1 + rnorm(15)
model1 <- lm(y1 ~ x1)

# Create model2 using random data
set.seed(2)
x2 <- rnorm(15)
y2 <- x2 + rnorm(15)
model2 <- lm(y2 ~ x2)

# Plot the first set of points
# set the axis ranges based upon the common ranges of the 
# two sets of data
plot(x1, y1, col = "red", xlim = range(x1, x2),
     ylim = range(y1, y2))

# Add the second set of points
points(x2, y2, col = "blue")

# Create the first fitted line
# Create two x values for the prediction based upon the 
# range of x1
lines(range(x1), predict(model1, data.frame(x1 = range(x1))), 
      col = "red")

# Create the second fitted line
# Same here for the x2 values
lines(range(x2), predict(model2, data.frame(x2 = range(x2))), 
      col = "blue")

Note that in both cases for the fitted lines, the lines will be
constrained by the range of the actual x? data values.

This should be easier to apply in general.

HTH,

Marc Schwartz