[R] Estimate region of highest probabilty density

Ort Christoph Christoph.Ort at eawag.ch
Wed Jun 14 13:24:11 CEST 2006


Estimate region of highest probabilty density

Dear R-community

I have data consisting of x and y. To each pair (x,y) a z value (weight) is assigned. With kde2d I can estimate the densities on a regular grid and based on this make a contour plot (not considering the z-values). According to an earlier post in the list I adjusted the kde2d to kde2d.weighted (see code below) to estimate the densities of x and y considering their weights z. There's also a piece of code with artificial data.

Now my question: Does a function exist which calculates the value of the level corresponding to a certain percentage of the volume (i.e. above this level e.g. 95% of the total volume lie, the (parameter) region of highest probability density)? 

And secondly: How is it in the case of n parameters, when I don't want to plot it anymore but estimate quantiles for each parameter considering also the weight z (to each set of parameters c(v,w,x,y) a weight z is assigned)?

Many thanks in advance, I am very grateful for any hint
Chris




# MASS::kde2d copied and modified
# ===============================

library(MASS)

kde2d.weighted <- function (x, y, w, h, n = n, lims = c(range(x), range(y))) {
  nx <- length(x)
  if (length(y) != nx) 
      stop("data vectors must be the same length")
  gx <- seq(lims[1], lims[2], length = n) # gridpoints x
  gy <- seq(lims[3], lims[4], length = n) # gridpoints y
  if (missing(h)) 
    h <- c(bandwidth.nrd(x), bandwidth.nrd(y));
  if (missing(w)) 
    w <- numeric(nx)+1;
  h <- h/4
  ax <- outer(gx, x, "-")/h[1] # distance of each point to each grid point in x-direction
  ay <- outer(gy, y, "-")/h[2] # distance of each point to each grid point in y-direction
  z <- (matrix(rep(w,n), nrow=n, ncol=nx, byrow=TRUE)*matrix(dnorm(ax), n, nx)) %*% t(matrix(dnorm(ay), n, nx))/(sum(w) * h[1] * h[2]) # z is the density
  return(list(x = gx, y = gy, z = z))
}


# Generate artificial data
# ========================

x <- runif(20000,-5,5)
y <- runif(20000,-5,5)
z <- dnorm(x, mean=0, sd=1)*dnorm(y, mean=0, sd=1)

# plot data
# =========

library(Rcmdr)
scatter3d(x=x,z=y,y=z,surface=FALSE,xlab="x",ylab="z",zlab="y",bg.col="black")

temp0 <- kde2d(x=x, y=y, n = 100, lims=c(range(x),range(y))) 
contour(x=temp0$x, y=temp0$y, z=temp0$z, xlab="x", ylab="y", main="z")

temp <- kde2d.weighted(x=x, y=y, w=z, n = 100, lims=c(range(x),range(y))) 
contour(x=temp$x, y=temp$y, z=temp$z, xlab="x", ylab="y", main="z", col="red", add=TRUE)





°°°
Christoph Ort
Eawag - aquatic research
Environmental Engineering
Überlandstrasse 133
CH-8600 Dübendorf

phone: +41-44-823-5041          
fax:   +41-44-823-5389
cell:  +41-79-218-9242
 
mailto:christoph.ort at eawag.ch

http://www.eawag.ch/~ortchris/

http://www.eawag.ch 




°°°
Christoph Ort
Eawag - aquatic research
Environmental Engineering
Überlandstrasse 133
CH-8600 Dübendorf

phone: +41-44-823-5041          
fax:   +41-44-823-5389
cell:  +41-79-218-9242
 
mailto:christoph.ort at eawag.ch

http://www.eawag.ch/~ortchris/

http://www.eawag.ch



More information about the R-help mailing list