[R] NA's in segmented

vito muggeo vmuggeo at dssm.unipa.it
Mon Oct 6 14:05:41 CEST 2008


Dear Tyler,
Yes the problem is with NA..
There are two solutions:

1) You can use lm() + segmented (you fit a gaussian model, so why do you 
use glm()?)

2)If you want to use glm()+ segmented(), use na.omit() to pass your 
dataframe to the data argument of glm, glm(.., data=na.omit())


Also, if you want to constrain the right slope to be zero use "the minus 
variable" (see the relevant recent paper on Rnews)

hope this helps you

vito

###############################################
Example

x<-1:50/50
y<- -2*x+2*pmax(x-.6,0)+rnorm(50)*.1

x[20:22]<-NA
d<-data.frame(xx=x,yy=y)
rm(x,y)

#Use lm() - It works:
o<-lm(yy~xx, data=d, na.action=na.omit)
os<-segmented(o,seg.Z=~xx,psi=.5)

#Use glm() - It works:
o<-glm(yy~xx, data=na.omit(d))
os<-segmented(o,seg.Z=~xx,psi=.5)

#constrain the right slope to zero
d$xxx<- -d$x
o<-glm(yy~1, data=na.omit(d))
os1<-segmented(o,seg.Z=~xxx,psi=-.5)

with(d,plot(xx,yy)
plot(os, add=TRUE)
plot(os1, add=TRUE, col=2, rev.sgn=TRUE)



T.D.Rudolph ha scritto:
> I am trying to fit a very simple broken stick model using the package
> "segmented" but I have hit a roadblock. 
> 
>> str(data)
> 'data.frame':   18 obs. of  2 variables:
>  $ Bin   : num  0.25 0.75 1.25 1.75 2.25 2.75 3.25 3.75 4.25 4.75 ...
>  $ LnFREQ: num  5.06 4.23 3.50 3.47 2.83 ...
>  
> I fit the lm easily:
>> fit.lm<-lm(LnFREQ~Bin, data=id07)
> 
> But I keep getting an error message:
>> fit.seg<-segmented(fit.glm, seg.Z=~Bin, psi=6)
> Error in cbind(XREG, U, V) : 
>   number of rows of matrices must match (see arg 2)
>  
> I think the problem is due to NA's in the Bin data, but there doesn't seem
> to be an "na.action" for segmented ().  What is the best way to get around
> this problem?  I would like to keep all Bin values in the output for
> continuity.  Data below....
>  
> Tyler
> 
>> data
>     Bin    LnFREQ
> 1  0.25 5.0562458
> 2  0.75 4.2341065
> 3  1.25 3.4965076
> 4  1.75 3.4657359
> 5  2.25 2.8332133
> 6  2.75 2.9444390
> 7  3.25 2.4849066
> 8  3.75 1.3862944
> 9  4.25 1.7917595
> 10 4.75 1.3862944
> 11 5.25 0.6931472
> 12 5.75 1.0986123
> 13 6.25 0.0000000
> 14 6.75 0.0000000
> 15 7.25        NA
> 16 7.75 0.0000000
> 17 8.25 0.0000000
> 18 8.75        NA
>  
>> summary(fit.glm)
> Call:
> glm(formula = LnFREQ ~ Bin, data = id07, na.action = na.omit)
> Deviance Residuals: 
>      Min        1Q    Median        3Q       Max  
> -0.74139  -0.22999   0.01065   0.21245   0.72684  
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)    
> (Intercept)  4.50646    0.21088   21.37 4.37e-12 ***
> Bin         -0.63434    0.04467  -14.20 1.05e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
> (Dispersion parameter for gaussian family taken to be 0.1844898)
>     Null deviance: 39.7785  on 15  degrees of freedom
> Residual deviance:  2.5829  on 14  degrees of freedom
>   (2 observations deleted due to missingness)
> AIC: 22.227
> Number of Fisher Scoring iterations: 2

-- 
====================================
Vito M.R. Muggeo
Dip.to Sc Statist e Matem `Vianelli'
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 6626240
fax: 091 485726/485612
http://dssm.unipa.it/vmuggeo



More information about the R-help mailing list