[R] LOCFIT: What's it doing?

Jacho-Chavez,DT (pgr) D.T.Jacho-Chavez at lse.ac.uk
Thu Apr 14 10:47:19 CEST 2005


Dear R-users,

One of the main reasons I moved from GAUSS to R (as an econometrician) was the LOCFIT package for local polynomial regression. While checking my new `R code' against my former `GAUSS code', I realized that LOCFIT is not quite doing what I want. I wrote the following example script:

#-----------------------------------------------------------------------------------------------------------------
# Plain Vanilla NADARAYA-WATSON estimator (or Local Constant regression, e.g. deg=0)
# with gaussian kernel & fixed bandwidth

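mkern <- function(y, x, h) {
  n    <- length(y)
  Mx   <- matrix(x, nrow = n, ncol = n, byrow = TRUE)  # Mx[i,j] = x[j]
  My   <- matrix(y, nrow = n, ncol = n, byrow = TRUE)  # My[i,j] = y[j]
  Mxh  <- (1/h) * dnorm((x - Mx)/h)       # kernel weights K((x[i]-x[j])/h)/h
  Myxh <- My * Mxh                        # each y[j] weighted by its distance to x[i]
  yh   <- rowMeans(Myxh) / rowMeans(Mxh)  # m^(x[i]) = sum_j K_ij y[j] / sum_j K_ij
  return(yh)
}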

# Generating the design Y=m(x)+e
n <- 10
h <- 0.5
x <- rnorm(n)
y <- x + rnorm(n,mean=0,sd=0.5)

# This is what I really want!
mhat <- mkern(y,x,h)

library(locfit)
yhl.raw <- locfit(y~x,alpha=c(0,h),kern="gauss",ev="data",deg=0,link="ident")

# This is what I get with LOCFIT
print(cbind(x,mhat,residuals(yhl.raw,type="fit"),knots(yhl.raw,what="coef")))
#--------------------------------------------------------------------------------------------------------------------
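
The target of the script is the local constant estimate m^(x[i]) = sum_j K((x[i]-x[j])/h) y[j] / sum_j K((x[i]-x[j])/h), with K the standard normal density. As a minimal sketch, the same quantity can be computed one observation at a time, which gives an independent cross-check for any vectorized version. The helper name nw_loop and the seed are assumptions made for illustration; they are not part of the original script.

```r
# Hypothetical helper (not from the original script): Nadaraya-Watson
# estimate at each data point, computed one observation at a time.
nw_loop <- function(y, x, h) {
  stopifnot(length(x) == length(y), h > 0)
  vapply(seq_along(x), function(i) {
    w <- dnorm((x[i] - x) / h)   # Gaussian weights centred at x[i]
    sum(w * y) / sum(w)          # kernel-weighted mean of the responses
  }, numeric(1))
}

set.seed(1)                      # arbitrary seed, only for reproducibility
n <- 10
h <- 0.5
x <- rnorm(n)
y <- x + rnorm(n, mean = 0, sd = 0.5)

mhat_loop <- nw_loop(y, x, h)
print(cbind(x, y, mhat_loop))
```

Since every fitted value is a weighted average of the responses, it must lie between min(y) and max(y), and a very large h should pull all fitted values towards mean(y). Both properties are cheap sanity checks on any implementation.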

Questions:
1) Why do residuals(.) and knots(.) give different results? If I want m^(x[i]) at each evaluation point i=1,...,n, which one should I use? I do not want any interpolation whatsoever.
2) Why are they `close' but not equal to what I want?

I can accept differences for higher degrees and for multidimensional data at the boundary of the support (given the way the regression must be done in areas with sparse data). But why are these differences present for deg=0, inside the support as well as at the boundary? The computer would still give us a result even with a close-to-zero random denominator (admittedly, not a reliable one). Unfortunately, I cannot get access to a copy of "Loader, C. (1999) Local Regression and Likelihood, Springer" from my local library, so a small explanation or advice would be greatly appreciated.
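
On the denominator: for deg=0 with a Gaussian kernel evaluated at the observations themselves, the denominator is the kernel density estimate at x[i], and it always contains the point's own weight dnorm(0)/(n*h). The sketch below uses a made-up design with one isolated observation and a hypothetical helper nw_denominator (neither is from the script above) to show that the denominator stays bounded away from zero at the data points, even where the data are sparse.

```r
# Hypothetical helper (not part of the original post): the deg = 0
# denominator, i.e. the Gaussian kernel density estimate at each data point.
nw_denominator <- function(x, h) {
  vapply(x, function(x0) mean(dnorm((x0 - x) / h)) / h, numeric(1))
}

x <- c(-0.3, 0.0, 0.2, 0.4, 6.0)       # made-up design, last point isolated
h <- 0.5
den   <- nw_denominator(x, h)
lower <- dnorm(0) / (length(x) * h)    # contribution of the point itself

print(cbind(x, den, lower))
```

At the isolated point the denominator is essentially equal to this lower bound, so the local constant fit there reduces to that point's own y value. The bound holds only at the observations; at an arbitrary evaluation point far from all data the denominator can still underflow.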

I do not mind using an improved version of `what I want', but I would like to understand what I am doing.


Thanks in advance for your help,


David Jacho-Chávez



