[R] Question on broken-line regression: 'segmented' or alternative

Benedikt Drosse drosse at mpipz.mpg.de
Sat Jan 12 11:13:25 CET 2013

Dear R-Users,

I have a question concerning the determination of breakpoints and 
comparison of slopes from broken-line regression models. Although this 
is rather a standard problem in data analysis, all information I 
gathered so far, did not answer my questions.

I added a subsetted example of my data. Basically it is a timeseries of 
recorded phenotypes in three different groups of plants.
You will see in the plot that the phenotype between groups behaves 
differently over time. Plotted data of Group 3 look like a nice linear 
relationship between phenotype (y) and time (x). Group 1 seems to have 1 
breakpoint (at x=31) and Group 3 has 2 breakpoints (at x=c(14,31)) in 
the data.

1. I am very interested in finding the breakpoints (x-values), where the 
slope of a regression line changes. In the plot I estimated the 
break-points by eye, subsetted the data and performed linear regression 
on the subsets for each group. However, this is not the way I want to 
go, since I would like to have some reasonable criteria (better than my 
eye) to determine breakpoints.

2. Additionally I am interested in testing, whether the slopes of 
(partial-)regression lines between groups are different from each other.

Considering my questions, the 'segmented' package should exactly fit my 
needs. The problem I have is, that on the one hand I would also like to 
compare slopes of regression lines, which do not have breakpoints (e.g 
group 3 of the example) to lines, which have breakpoints (group 1 and 
2). This seems to be not possible with the 'segmented' package.
On the other hand the 'segmented' package seems to have a poor 
error-reporting, so it is very hard for me to figure out, what is going 
wrong in the analysis (see example).

Can you think of any other way/procedure/package I could use to analyze 
my data? Hopefully you can also give me a clue on how to proceed with 
the analysis, when the error of the segmented()-function in the creation 
of the covariance-matrix occurs.

I would be grateful for any comment on how to determine breakpoints and 
compare slopes of regression lines in a different way than subsetting 
the data by hand and regress the subsetted parts as I did it for the 
example plot. Thanks to anyone, who has an idea, already. Sorry for the 
long example code.

Best regards,

#Start of Code

#Data file:
     x <- 

   #y-data for three groups (t1, t2, t3):
     t1 <- 
     t2 <- 
     t3 <- 

     group <- as.factor(c(rep(1, times=length(t1)), rep(2, 
times=length(t2)), rep(3, times=length(t3))))

     data <- data.frame(group, x=x, y=c(t1,t2,t3))

#Groupwise plot of datapoints vs time:
   scatterplot(y~x*group, data=data, smooth=FALSE, reg.line=FALSE, 

#Regressions for group 1 with 1 breakpoints (red, dashed line)
   reg_1.1 <- lm(data[which(data$x<=30 & 
data$group==1),3]~data[which(data$x<=30 & data$group==1),2])
     abline(reg_1.1, col="black")

     reg_1.2 <- lm(data[which(data$x>=30 & 
data$group==1),3]~data[which(data$x>=30 & data$group==1),2])
       abline(reg_1.2, col="black")

#Regressions for group 2 with 2 breakpoints (black, solid line)
   reg_2.1 <- lm(data[which(data$x<=14 & 
data$group==2),3]~data[which(data$x<=14 & data$group==2),2])
     abline(reg_2.1, col="red", lty=2)

     reg_2.2 <- lm(data[which(data$x>=14 & data$x<45 & 
data$group==2),3]~data[which(data$x>=14 & data$x<45 & data$group==2),2])
       abline(reg_2.2, col="red", lty=2)

       reg_2.3 <- lm(data[which(data$x>=31 & data$x<=45 & 
data$group==2),3]~data[which(data$x>=31 & data$x<=45 & data$group==2),2])
       abline(reg_2.3, col="red", lty=2)

#Regression for group 3 without breakpoints (green, solid line)
   reg_3.1 <- lm(data[which(data$group==3),3]~data[which(data$group==3),2])
     abline(reg_3.1, col="green", lty=1)

#Segmented package to estimate breakpoints and determine slopes for 
group2 and 3:
   subset_data <- subset(data, subset=data$group!=3)
     names(subset_data) <- c("sub_group", "sub_x", "sub_y")

   X <- model.matrix(~0+sub_group)*sub_x

   time.1 <- X[,1]
   time.2 <- X[,2]
   time.3 <- X[,3]

   olm <- lm(sub_y~0 + sub_group + time.1 + time.2)
   os <- segmented(olm, seg.Z= ~ time.1 + time.2, 
psi=list(time.1=c(14,30), time.2=40))

#End of code

More information about the R-help mailing list