[R] Getting a cdf equal to 1 from a variable kernel density estimation

Daniel Nordlund djnordlund at frontier.com
Thu May 28 01:10:18 CEST 2015


On 5/27/2015 2:43 AM, Lionel Delmas wrote:
> When I integrate the variable kernel density estimation of a sample from
> -inf to inf, I should get 1. But it is not what I get with my code below. I
> find a value higher than 2. How can I fix this?
>
> n<-1000
> df <- data.frame(x=unlist(lapply(1, function(i) rnorm(n, 0,sd=1))))
> df<-as.data.frame(df[order(df$x),])
> names(df)[1]<-"x"
>
> library(functional)
>
> gaussianKernel <- function(u, h) exp(-sum(u^2)/(2*h^2))
>
> densityFunction <- function(x, df, ker, h){
>      difference = t(t(df) - x)
>      W = sum(apply(difference, 1, ker, h=h))
>      W/(nrow(df)*(h^(length(df))))}
>
> myDensityFunction <- Curry(densityFunction, df=df, ker=gaussianKernel , h=2)
>
> vect<-vector()for (i in 1:length(df$x)){
> f<-myDensityFunction(df$x[i])
> vect<-c(vect,f)}
>
> plot(df$x,vect,ylim=c(0,1),xlim=c(-5,5),type="l")
>
> f <- approxfun(df$x, vect, yleft = 0, yright = 0)
> integrate(f, -Inf, Inf)
>
> Thanks
>


I ran your code.  Looking at the plot output, it is obvious that there 
is something wrong with you density estimates.  Since you are estimating 
density for standard normal random variates, the density for values near 
0 should be approximately 0.4, and for values in the tails the density 
should be "close" to 0.

You mention "variable kernel density estimation", but I don't see where 
you are varying your smoothing parameter, h, or any distance measure.  R 
provides a density function that could be used here, unless you are just 
wanting an excerise in how density estimation works (or this is a 
homework problem).

This is not my area of expertise, but the main problem(s) appears to be 
how gaussianKernel() and densityFunction() are written.  I think you 
want something like the following:

gaussianKernel <- function(u) exp(-u^2/2)/(2*pi)^.5

densityFunction <- function(x, df, ker, h){
     difference = t(t(df) - x)/h
     W = sum(apply(difference, 1, ker)) / (nrow(df)*h)
     }


If you are wanting to do density estimation for real world work, I would 
get help from someone in your local area.


Hope this is helpful,

Dan

-- 
Daniel Nordlund
Bothell, WA USA



More information about the R-help mailing list