[R] How to improve the robustness of "loess"? - example included.

Emmanuel Levy emmanuel.levy at gmail.com
Sun Mar 11 02:02:55 CET 2012


Ok so this seems to work :)


    tmp=rnorm(2000)
    X.background = 5+tmp
    Y.background = 5+ (10*tmp+rnorm(2000))
    X.specific = 3.5+3*runif(3000)
    Y.specific = 5+120*runif(3000)

    X = c(X.background, X.specific)
    Y = c(Y.background, Y.specific)

    MINx=range(X)[1]
    MAXx=range(X)[2]
    MINy=range(Y)[1]
    MAXy=range(Y)[2]

  ## estimates the density for each datapoint
    nBins=50
    my.lims= c(range(X,na.rm=TRUE),range(Y,na.rm=TRUE))

    z1 = kde2d(X,Y,n=nBins, lims=my.lims, h= c(
(my.lims[2]-my.lims[1])/(nBins/4) ,  (my.lims[4]-my.lims[3])/(nBins/4)
    )     )
    X.cut = cut(X, seq(z1$x[1], z1$x[nBins],len=(nBins+1) ))
    Y.cut = cut(Y, seq(z1$y[1], z1$y[nBins],len=(nBins+1) ))
    xy.cuts = data.frame(X.cut,Y.cut, ord=1:(length(X.cut)) )
    density = data.frame( X=rep(factor(levels(X.cut)),rep(nBins) ),
Y=rep(factor(levels(Y.cut)), rep(nBins,nBins) ) , Z= as.vector(z1$z))

    xy.density = merge( xy.cuts, density, by=c(1,2), sort=FALSE, all.x=TRUE)
    xy.density = xy.density[order(x=xy.density$ord),]

    ### Now uses the density as a weight
    my.loess = loess(Y ~ X, data.frame( X = X, Y = Y),
family="symmetric", degree=2, span=0.1, weights= xy.density$Z^3)
    lo.pred = predict(my.loess, data.frame(X = seq(MINx, MAXx,
length=100)), se=TRUE)
    plot( seq(MINx, MAXx, length=100), lo.pred$fit, lwd=2,col=2, "l")
#, ylim=c(0, max(tmp$fit, na.rm=TRUE) ) , col="dark grey")
    points(X,Y, pch=".", col= grey(abs(my.loess$res)/max(abs(my.loess$res))) )


On 10 March 2012 18:30, Emmanuel Levy <emmanuel.levy at gmail.com> wrote:
> Hi,
>
> I posted a message earlier entitled "How to fit a line through the
> "Mountain crest" ..."
>
> I figured loess is probably the best way, but it seems that the
> problem is the robustness of the fit. Below I paste an example to
> illustrate the problem:
>
>    tmp=rnorm(2000)
>    X.background = 5+tmp; Y.background = 5+ (10*tmp+rnorm(2000))
>    X.specific = 3.5+3*runif(1000); Y.specific = 5+120*runif(1000)
>    X = c(X.background, X.specific);Y = c(Y.background, Y.specific)
>    MINx=range(X)[1];MAXx=range(X)[2]
>
>    my.loess = loess(Y ~ X, data.frame( X = X, Y = Y),
> family="symmetric", degree=2, span=0.1)
>    lo.pred = predict(my.loess, data.frame(X = seq(MINx, MAXx,
> length=100)), se=TRUE)
>    plot( seq(MINx, MAXx, length=100), lo.pred$fit, lwd=2,col=2, "l")
>    points(X,Y, col= grey(abs(my.loess$res)/max(abs(my.loess$res))) )
>
> As you will see, the red line does not follow the "background" signal.
> However, when decreasing the "specific" signal to 500 points it
> becomes perfect.
>
> I'm sure there is a way to "tune" the fitting so that it works but I
> can't figure out how. Importantly, *I cannot increase the span*
> because in reality the relationship I'm looking at is more complex so
> I need a small  span value to allow for a close fit.
>
> I foresee that changing the "weigthing" is the way to go but I do not
> really understand how the "weight" option is used (I tried to change
> it and nothing happened), and also the embedded "tricubic weighting"
> does not seem changeable.
>
> So any idea would be very welcome.
>
> Emmanuel



More information about the R-help mailing list