[R] weighted kernel density estimation

Romain François francoisromain at free.fr
Wed Dec 22 14:57:31 CET 2004


Dear wizaRds,

I use the MASS::kde2d function to estimate density of the two first 
principal components. I do that to have a graphic visualisation of a 
"group structure" in my dataset. So far, no problem.

But i would like to estimate that density using weights according to the 
COS² values that tells me if my observation is well represented on the 
factorial plan 1-2. I would like to use (1) instead of (2) where the w_i 
depends on the COS², i hacked the MASS::kde2d function to do that.

\hat{f}_n(x) = \frac{1}{n|H|} \sum_{i=1}^n w_i \times K\left[H^{-1}(x-X_i)\right]          (1)

\hat{f}_n(x) = \frac{1}{n|H|} \sum_{i=1}^n K\left[H^{-1}(x-X_i)\right]                     (2)


I have uploaded to http://addictedtor.free.fr/testkde2dw the files :
- plan12.pdf             : my factorial plan 1-2 (size of representation 
of the observations are proportionnal to the COS²)
- persp.pdf               : the estimation without the "weights"
- perspW.pdf            : the estimation with the "weights"

My questions are : As anyone done that ? Do i have right to do that ?

Thanks a lot.


See my kde2dw function below :

kde2dw <- function (x, y, h, n = 25, lims = c(range(x), range(y)), w) 
{
# w is a weight vector, you must have sum(w)=length(x)

    nx <- length(x)
    if (length(y) != nx) 
        stop("Data vectors must be the same length")

    gx <- seq(lims[1], lims[2], length = n)
    gy <- seq(lims[3], lims[4], length = n)
    if (missing(h)) h <- c(bandwidth.nrd(x), bandwidth.nrd(y))
    if(missing(w))  w <- rep(1,nx)

    h <- h/4
    ax <- outer(gx, x, "-")/h[1] 
    ay <- outer(gy, y, "-")/h[2] 


    z <- matrix(dnorm(ax), n, nx) %*% diag(w) %*% t(matrix(dnorm(ay), n, nx)) / (nx * h[1] * h[2])

    return(list(x = gx, y = gy, z = z))
}

-- 
Romain FRANCOIS : francoisromain at free.fr
page web : http://addictedtor.free.fr/  (en construction)
06 18 39 14 69 / 01 46 80 65 60
_______________________________________________________
Etudiant en 3eme année
Institut de Statistique de l'Université de Paris (ISUP)
Filière Industrie et Services
http://www.isup.cicrp.jussieu.fr/




More information about the R-help mailing list