[Rd] Update: Is there an implementation of loess with more than 3 parametric predictors or a trick to a similar effect?

Thu Jun 16 17:45:48 CEST 2011

Dear R developers!

Considering I got no response or comments in the general r-help forum
so far, perhaps my question is actually better suited for this list? I
have added some more hopefully relevant technical details to my
original post (edited below).

Any comments gratefully received!

Best regards,
David Kreil.

----------

Dear R experts,

I have a problem that is a related to the question raised in this earlier post
    https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html

My situation is different in that I have only 2 predictors
(coordinates x,y) for local regression but a number of global
("parametric") offsets that I need to consider.

Essentially, we have a spatial distortion s(x,y) overlaid over a
number of measurements z:
    z_i = s(x_i,y_i) + v_{g_i}
These measurements z can be grouped by the same underlying undistorted
measurement value v for each group g. The group membership g_i is
known for each measurement, but the underlying undistorted measurement
values v_g for the groups are not known and should be determined by
(global, not local) regression.

We need to estimate the two-dimensional spatial trend s(x,y), which we
then want to remove. In our application, say there are 20 groups of at
least 35 measurements each, in the most simple scenario. The
measurements are randomly placed. Taking the first group as reference,
there are thus 19 unknown offsets.

The below code for toy data (spatial trend in one dimension x) works
for two or three offset groups.

Unfortunately, the loess call fails for four or more offset groups
with the error message
"Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize,  :
  only 1-4 predictors are allowed"
See comment #1

I tried overriding the restriction (see lower section of code example) and got
"k>d2MAX in ehg136.  Need to recompile with increased dimensions."

How easy would that be to do? I cannot find a definition of d2MAX
anywhere, and it seems this might be hardcoded -- the error is
apparently triggered by line #1359 in loessf.f
      if(k .gt. 15)   call ehg182(105)

Alternatively, does anyone know of an implementation of local
regression with global (parametric) offset groups that could be
applied here?

Or is there a better way of dealing with this? I tried lme with
correlation structures but that seems to be much, much slower.

Any comments would be greatly appreciated!

Best regards,
David Kreil.

###
#
# loess with parametric offsets - toy data demo
#

x<-seq(0,9,.1);
x.N<-length(x);

# Comment #1:
# These are the offsets ("unknown", to be revovered by the regression):
o<-c(0.4,-0.8,1.2);  # three parametric variables --> total of four, works
o<-c(0.4,-0.8,1.2,-0.2);  # four par.variables --> total of 5 > 4 --> error:
# "... only 1-4 predictors are allowed"

o.N<-length(o);
f<-sapply(seq(o.N),
          function(n){
            ifelse((seq(x.N)<= n   *x.N/(o.N+1) &
                    seq(x.N)> (n-1)*x.N/(o.N+1)),
                    1,0);
          });
f<-f[sample(NROW(f)),];

y<-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs<-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s<-paste(c('y~x',s.fs),collapse='+');
d<-data.frame(x,y,f)
names(d)<-c('x','y',s.fs);

l<-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
         span=0.4);
yp<-predict(l,newdata=d);
plot(x,y,pch='+',ylim=c(-3,3),col='red');  # input data
points(x,yp,pch='o',col='blue');           # fit of that

d0<-d; d0$f1<-d0$f2<-d0$f3<-0;
yp0<-predict(l,newdata=d0);
points(x,y-f%*%o);     # spatial distortion
lines(x,yp0,pch='+');  # estimate of that

op<-sapply(seq(NCOL(f)),function(i){(yp-yp0)[!!f[,i]][1]});

cat("Demo offsets:",o,"\n");
cat("Estimated offsets:",format(op,digits=1),"\n");

#
# to test if the underlying implementation has a limit of 4 (it does):
#

# copied from loess.R, and modified as indicated by ###***###
simpleLoess <-
  function(y, x, weights, span = 0.75, degree = 2, parametric = FALSE,
	   drop.square = FALSE, normalize = TRUE,
	   statistics = "approximate", surface = "interpolate",
	   cell = 0.2, iterations = 1, trace.hat = "exact")
{
    ## loess_ translated to R.
    D <- NCOL(x)
###***###    if(D > 4) stop("only 1-4 predictors are allowed")
    N <- NROW(x)
    if(!N || !D)	stop("invalid 'x'")
    if(!length(y))	stop("invalid 'y'")
    x <- as.matrix(x)
    max.kd <-  max(N, 200)
    robust <- rep(1, N)
    divisor<- rep(1, D)
    if(normalize && D > 1L) {
	trim <- ceiling(0.1 * N)
	divisor <-
	    sqrt(apply(apply(x, 2L, sort)[seq(trim+1, N-trim), , drop = FALSE],
		       2L, var))
	x <- x/rep(divisor, rep(N, D))
    }
    sum.drop.sqr <- sum(drop.square)
    sum.parametric <- sum(parametric)
    nonparametric <- sum(!parametric)
    order.parametric <- order(parametric)
    x <- x[, order.parametric]
    order.drop.sqr <- (2 - drop.square)[order.parametric]
    if(degree==1 && sum.drop.sqr)
	stop("specified the square of a factor predictor to be dropped when
degree = 1")
    if(D == 1 && sum.drop.sqr)
	stop("specified the square of a predictor to be dropped with only one
numeric predictor")
    if(sum.parametric == D) stop("specified parametric for all predictors")
    if(iterations)
    for(j in 1L:iterations) {
	robust <- weights * robust
	if(j > 1) statistics <- "none"
	else if(surface == "interpolate" && statistics == "approximate")
	    statistics <- if(trace.hat == "exact") "1.approx"
            else "2.approx" # trace.hat == "approximate"
	surf.stat <- paste(surface, statistics, sep="/")
	z <- .C(R_loess_raw, # ../src/loessc.c
		as.double(y),
		as.double(x),
		as.double(weights),
		as.double(robust),
		as.integer(D),
		as.integer(N),
		as.double(span),
		as.integer(degree),
		as.integer(nonparametric),
		as.integer(order.drop.sqr),
		as.integer(sum.drop.sqr),
		as.double(span*cell),
		as.character(surf.stat),
		fitted.values = double(N),
		parameter = integer(7),
		a = integer(max.kd),
		xi = double(max.kd),
		vert = double(2*D),
		vval = double((D+1)*max.kd),
		diagonal = double(N),
		trL = double(1),
		delta1 = double(1),
		delta2 = double(1),
		as.integer(surf.stat == "interpolate/exact"))
	if(j==1) {
	    trace.hat.out <- z$trL
	    one.delta <- z$delta1
	    two.delta <- z$delta2
	}
	fitted.residuals <- y - z$fitted.values
	if(j < iterations)
	    robust <- .Fortran(R_lowesw,
			       as.double(fitted.residuals),
			       as.integer(N),
			       robust = double(N),
			       integer(N))$robust
    }
    if(surface == "interpolate")
    {
	pars <- z$parameter
	names(pars) <- c("d", "n", "vc", "nc", "nv", "liv", "lv")
	enough <- (D + 1) * pars["nv"]
	fit.kd <- list(parameter=pars, a=z$a[1L:pars[4L]], xi=z$xi[1L:pars[4L]],
		       vert=z$vert, vval=z$vval[1L:enough])
    }
    if(iterations > 1) {
	pseudovalues <- .Fortran(R_lowesp,
				 as.integer(N),
				 as.double(y),
				 as.double(z$fitted.values),
				 as.double(weights),
				 as.double(robust),
				 integer(N),
				 pseudovalues = double(N))$pseudovalues
	zz <- .C(R_loess_raw,
		as.double(pseudovalues),
		as.double(x),
		as.double(weights),
		as.double(weights),
		as.integer(D),
		as.integer(N),
		as.double(span),
		as.integer(degree),
		as.integer(nonparametric),
		as.integer(order.drop.sqr),
		as.integer(sum.drop.sqr),
		as.double(span*cell),
		as.character(surf.stat),
		temp = double(N),
		parameter = integer(7),
		a = integer(max.kd),
		xi = double(max.kd),
		vert = double(2*D),
		vval = double((D+1)*max.kd),
		diagonal = double(N),
		trL = double(1),
		delta1 = double(1),
		delta2 = double(1),
		as.integer(0L))
	pseudo.resid <- pseudovalues - zz$temp
    }
    sum.squares <- if(iterations <= 1) sum(weights * fitted.residuals^2)
    else sum(weights * pseudo.resid^2)
    enp <- one.delta + 2*trace.hat.out - N
    s <- sqrt(sum.squares/one.delta)
    pars <- list(robust=robust, span=span, degree=degree, normalize=normalize,
		 parametric=parametric, drop.square=drop.square,
		 surface=surface, cell=cell, family=
		 if(iterations <= 1) "gaussian" else "symmetric",
		 iterations=iterations)
    fit <- list(n=N, fitted=z$fitted.values, residuals=fitted.residuals,
		enp=enp, s=s, one.delta=one.delta, two.delta=two.delta,
		trace.hat=trace.hat.out, divisor=divisor)
    fit$pars <- pars
    if(surface == "interpolate") fit$kd <- fit.kd
    class(fit) <- "loess"
    fit
}
environment(simpleLoess)<-environment(loess);  # to make it find private funcs
environment(loess)<-.GlobalEnv;         # to make it use our simpleLoess!
environment(loess.smooth)<-.GlobalEnv;  # to make it use our simpleLoess!

# now run the demo with four parametric variables --> total 5.> 4
triggers error:
# k>d2MAX in ehg136.  Need to recompile with increased dimensions.
# but there is no obvious d2MAX to configure that I can find...