[R] piecewise regression and lm() question

Duncan Murdoch murdoch.duncan at gmail.com
Wed Feb 11 23:13:14 CET 2015


On 11/02/2015 4:30 PM, Goldschneider, Jill wrote:
> I was playing with some examples of piecewise regression using lm() and have come across a behavior I'm uncertain about.
> Below is simple 3-segment dataset.  I compare predicted output of a model created by one call to lm() to that of 3 models created by 3 calls to lm().
> In case A and B, the results are the same.  However, in case C the results differ for the middle segment.
> Is the output of lm() for case C to be expected or not and if so, why?

Take a look at the fit value, and you'll see what happened:

> lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x)

Call:
lm(formula = y ~ (x <= 10) * x + I(x > 10 & x <= 20) + (x > 20) *
    x)

Coefficients:
            (Intercept)              x <= 10TRUE
x  I(x > 10 & x <= 20)TRUE
                35.0741                 -15.0733
-0.1644                 -17.7344
             x > 20TRUE            x <= 10TRUE:x             x:x > 20TRUE
                     NA                   1.2397                  -0.9658

The * in a formula means "main effect plus interaction", not
multiplication.  W	hat you want is


lm(y ~ I((x<=10)*x) + I(x>10 & x<=20) + I((x>20)*x))

This doesn't give exactly the same results as the segmented regression,
because it uses the same intercept in all three segments; you might want
to add I(x <= 10) as well if you don't want that.

Duncan Murdoch

> Thank you,
> Jill
> 
> set.seed(133)
> y <- c(21:30, rep(15,10), 10:1) + runif(30, -2, 2)
> x <- 1:30
> 
> # Case A: 3 segments, each fit with a constant value
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~1)), predict(lm(y[11:20]~1)), predict(lm(y[21:30]~1)))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ I(x<=10) + I(x>10 & x<=20) + I(x>20)))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> 
> # Case B: 3 segments, each fit with a line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~x[11:20])), predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + (x>10 & x<=20)*x + (x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> 
> # Cast C: 3 segments - the middle fit with a constant value, the outer by a line
> plot(x, y)
> # 3 different lm fits
> p.c3 <- c(predict(lm(y[1:10]~x[1:10])), predict(lm(y[11:20]~1)), predict(lm(y[21:30]~x[21:30])))
> lines(x, p.c3, col="red")
> # piecewise via lm
> p.lm1 <- predict(lm(y ~ (x<=10)*x + I(x>10 & x<=20) + (x>20)*x))
> lines(x, p.lm1, col="blue")
> max(abs(p.c3-p.lm1))
> ## the single call to lm does not have a constant value fit to the middle section
> plot(x, p.c3-p.lm1 )
> 
> 
> 
> The information contained in this transmission may contain confidential information.  If the reader of this message is not the intended recipient, you are hereby notified that any review, dissemination, distribution or duplication of this communication is strictly prohibited.  If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list