[R] Fitting a linear model with a break point

Daniel Malter daniel at umd.edu
Wed Sep 9 02:52:44 CEST 2009


First, dependent on how highly dimensional your data is, I would recommend
inspecting it visually first. Does the step-function seem a reasonable
assumption? 

If I understand you correctly when you say, "piecewise linear which has the
gradient constrained to zero," you mean dummy-variable coding for
x>threshold. If you just have 5 values of x, I would do it by hand. However,
you can also do it in a loop (see example below). Finally, there are tests
that test the quality of non-nested model, e.g. the J-test.

#random threshold
threshold=round(runif(1,min=0,max=4))

#simulate x
x=round(runif(1000,min=0,max=4))

#simulate error
e=rnorm(1000,0,1)

#create y that jumps at the threshold
y=2*(x>threshold)+e

#inspect
plot(y~x)

#Run regressions with all possible threshold levels x>threshold
r.squared=NULL
tested.threshold=NULL
for(i in 1:(length(unique(x))-1)){
  r.squared[i]=summary(lm(y~(x>(i-1))))$r.squared
  tested.threshold[i]=i-1
  }

#inspect r-squareds
print(data.frame(tested.threshold,r.squared))

#Should indicate the highest r-squared
#at the appropriate threshold level

HTH,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------

-----Ursprüngliche Nachricht-----
Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im
Auftrag von Daniel Brewer
Gesendet: Tuesday, September 08, 2009 10:16 AM
An: r-help at stat.math.ethz.ch
Betreff: [R] Fitting a linear model with a break point

Hello,

I would like to test some data to see whether it has the shape of a step
function (i.e. y1 up until x_th and then y2 where x_th is the threshold).
The threshold x_th is unknown and the x values can only take discrete values
(0,1,2,3,4).

An example would be:
data<- data.frame(x=1:20,y=c(rnorm(10),rnorm(10,10)))


I was thinking along the lines of fitting some sort of piiecewise linear
model which has the gradient constrained to zero trying out all possible
different threshold and taking the one with the least residuals.  I am not
sure how to implement this in R.  Anyone got any ideas?

Also is there a way of including the threshold in the actual model, so that
could be estimated too?

Thanks

Dan

--
**************************************************************
Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************

The Institute of Cancer Research: Royal Cancer Hospital, a charitable
Company Limited by Guarantee, Registered in England under Company No. 534147
with its Registered Office at 123 Old Brompton Road, London SW7 3RP.

This e-mail message is confidential and for use by the a...{{dropped:9}}




More information about the R-help mailing list