[R] Getting a cdf equal to 1 from a variable kernel density estimation
Daniel Nordlund
djnordlund at frontier.com
Thu May 28 01:10:18 CEST 2015
On 5/27/2015 2:43 AM, Lionel Delmas wrote:
> When I integrate the variable kernel density estimation of a sample from
> -inf to inf, I should get 1. But it is not what I get with my code below. I
> find a value higher than 2. How can I fix this?
>
> n<-1000
> df <- data.frame(x=unlist(lapply(1, function(i) rnorm(n, 0,sd=1))))
> df<-as.data.frame(df[order(df$x),])
> names(df)[1]<-"x"
>
> library(functional)
>
> gaussianKernel <- function(u, h) exp(-sum(u^2)/(2*h^2))
>
> densityFunction <- function(x, df, ker, h){
> difference = t(t(df) - x)
> W = sum(apply(difference, 1, ker, h=h))
> W/(nrow(df)*(h^(length(df))))}
>
> myDensityFunction <- Curry(densityFunction, df=df, ker=gaussianKernel , h=2)
>
> vect<-vector()for (i in 1:length(df$x)){
> f<-myDensityFunction(df$x[i])
> vect<-c(vect,f)}
>
> plot(df$x,vect,ylim=c(0,1),xlim=c(-5,5),type="l")
>
> f <- approxfun(df$x, vect, yleft = 0, yright = 0)
> integrate(f, -Inf, Inf)
>
> Thanks
>
I ran your code. Looking at the plot output, it is obvious that there
is something wrong with you density estimates. Since you are estimating
density for standard normal random variates, the density for values near
0 should be approximately 0.4, and for values in the tails the density
should be "close" to 0.
You mention "variable kernel density estimation", but I don't see where
you are varying your smoothing parameter, h, or any distance measure. R
provides a density function that could be used here, unless you are just
wanting an excerise in how density estimation works (or this is a
homework problem).
This is not my area of expertise, but the main problem(s) appears to be
how gaussianKernel() and densityFunction() are written. I think you
want something like the following:
gaussianKernel <- function(u) exp(-u^2/2)/(2*pi)^.5
densityFunction <- function(x, df, ker, h){
difference = t(t(df) - x)/h
W = sum(apply(difference, 1, ker)) / (nrow(df)*h)
}
If you are wanting to do density estimation for real world work, I would
get help from someone in your local area.
Hope this is helpful,
Dan
--
Daniel Nordlund
Bothell, WA USA
More information about the R-help
mailing list