[R] Question on broken-line regression: 'segmented' or alternative
dwinsemius at comcast.net
Sat Jan 12 22:06:43 CET 2013
On Jan 12, 2010, at 2:13 AM, Benedikt Drosse wrote:
> Dear R-Users,
> I have a question concerning the determination of breakpoints and
> comparison of slopes from broken-line regression models. Although
> this is rather a standard problem in data analysis, all information
> I gathered so far, did not answer my questions.
> I added a subsetted example of my data. Basically it is a timeseries
> of recorded phenotypes in three different groups of plants.
> You will see in the plot that the phenotype between groups behaves
> differently over time. Plotted data of Group 3 look like a nice
> linear relationship between phenotype (y) and time (x). Group 1
> seems to have 1 breakpoint (at x=31) and Group 3 has 2 breakpoints
> (at x=c(14,31)) in the data.
> 1. I am very interested in finding the breakpoints (x-values), where
> the slope of a regression line changes. In the plot I estimated the
> break-points by eye, subsetted the data and performed linear
> regression on the subsets for each group. However, this is not the
> way I want to go, since I would like to have some reasonable
> criteria (better than my eye) to determine breakpoints.
> 2. Additionally I am interested in testing, whether the slopes of
> (partial-)regression lines between groups are different from each
> Considering my questions, the 'segmented' package should exactly fit
> my needs. The problem I have is, that on the one hand I would also
> like to compare slopes of regression lines, which do not have
> breakpoints (e.g group 3 of the example) to lines, which have
> breakpoints (group 1 and 2). This seems to be not possible with the
> 'segmented' package.
> On the other hand the 'segmented' package seems to have a poor error-
> reporting, so it is very hard for me to figure out, what is going
> wrong in the analysis (see example).
> Can you think of any other way/procedure/package I could use to
> analyze my data? Hopefully you can also give me a clue on how to
> proceed with the analysis, when the error of the segmented()-
> function in the creation of the covariance-matrix occurs.
> I would be grateful for any comment on how to determine breakpoints
> and compare slopes of regression lines in a different way than
> subsetting the data by hand and regress the subsetted parts as I did
> it for the example plot. Thanks to anyone, who has an idea, already.
> Sorry for the long example code.
Have you looked at the capabilities of the 'strucchange' package?
> Best regards,
> #Start of Code
> #Data file:
> x <-
> #y-data for three groups (t1, t2, t3):
> t1 <-
> t2 <-
> t3 <-
> group <- as.factor(c(rep(1, times=length(t1)), rep(2,
> times=length(t2)), rep(3, times=length(t3))))
> data <- data.frame(group, x=x, y=c(t1,t2,t3))
> #Groupwise plot of datapoints vs time:
> scatterplot(y~x*group, data=data, smooth=FALSE, reg.line=FALSE,
> #Regressions for group 1 with 1 breakpoints (red, dashed line)
> reg_1.1 <- lm(data[which(data$x<=30 & data$group==1),
> 3]~data[which(data$x<=30 & data$group==1),2])
> abline(reg_1.1, col="black")
> reg_1.2 <- lm(data[which(data$x>=30 & data$group==1),
> 3]~data[which(data$x>=30 & data$group==1),2])
> abline(reg_1.2, col="black")
> #Regressions for group 2 with 2 breakpoints (black, solid line)
> reg_2.1 <- lm(data[which(data$x<=14 & data$group==2),
> 3]~data[which(data$x<=14 & data$group==2),2])
> abline(reg_2.1, col="red", lty=2)
> reg_2.2 <- lm(data[which(data$x>=14 & data$x<45 & data$group==2),
> 3]~data[which(data$x>=14 & data$x<45 & data$group==2),2])
> abline(reg_2.2, col="red", lty=2)
> reg_2.3 <- lm(data[which(data$x>=31 & data$x<=45 & data
> $group==2),3]~data[which(data$x>=31 & data$x<=45 & data$group==2),2])
> abline(reg_2.3, col="red", lty=2)
> #Regression for group 3 without breakpoints (green, solid line)
> reg_3.1 <- lm(data[which(data$group==3),3]~data[which(data
> abline(reg_3.1, col="green", lty=1)
> #Segmented package to estimate breakpoints and determine slopes for
> group2 and 3:
> subset_data <- subset(data, subset=data$group!=3)
> names(subset_data) <- c("sub_group", "sub_x", "sub_y")
> X <- model.matrix(~0+sub_group)*sub_x
> time.1 <- X[,1]
> time.2 <- X[,2]
> time.3 <- X[,3]
> olm <- lm(sub_y~0 + sub_group + time.1 + time.2)
> os <- segmented(olm, seg.Z= ~ time.1 + time.2, psi=list(time.
> 1=c(14,30), time.2=40))
> #End of code
> R-help at r-project.org mailing list
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
Alameda, CA, USA
More information about the R-help