[R] Normalizing data

jim holtman jholtman at gmail.com
Thu Nov 15 23:05:06 CET 2007


I am not sure what you mean when you say it does not integrate to 1.
Here are a couple of cases, and it seems fine to me:

> x <- density(1:30)
> str(x)
List of 7
 $ x        : num [1:512] -11.0 -10.9 -10.8 -10.7 -10.6 ...
 $ y        : num [1:512] 6.66e-05 7.22e-05 7.84e-05 8.49e-05 9.20e-05 ...
 $ bw       : num 4.01
 $ n        : int 30
 $ call     : language density.default(x = 1:30)
 $ data.name: chr "1:30"
 $ has.na   : logi FALSE
 - attr(*, "class")= chr "density"
> plot(x)
> sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2)  # integrate
[1] 1.000823
> x <- density(rnorm(1000))
> sum(diff(x$x) * (head(x$y,-1) + tail(x$y,-1))/2) # integrate
[1] 1.000974
>


On Nov 15, 2007 2:47 PM, Joren Heit <jorenheit at gmail.com> wrote:
> Hello,
>
> I have a data set of about 300.000 measurements made by an STM which should
> apporximately fix a normal (Gaussian) distribution.
> I have imported the data in R and used plot(density()) to get a nice plot of
> the distribution which in fact looks like a real Gaussian.
> However, the integral over the surface is not equal to one (I know since
> some of the plots extend to numbers greater then 1). Is there a way to
> normalize the data so the density function will actualy yield the
> probability of x (a height in my case)?
> This is my code so far:
>
> #Input path
> path <- "G:\\C\\Data txt\\1au300.txt"
>
> #Dataverwerking
> data <- read.table(path, header=TRUE)
> rows <- length(data$height)
> height <- data$height[1:rows]
> dens <-density(height)
>
> mean <- mean(height)
> sd <- sd(height)
> min <- min(hnorm)
> max <- max(hnorm)
>
> #Plot
> par(new=FALSE)
> curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="", ylab="", col="white",
> lwd=2)
> points(dens, type="h", col="grey" )
> par(new=TRUE)
> curve(dnorm(x,m=mean,sd=sd),from=min,to=max, xlab="Height (nm)",
> ylab="Density", lwd=2, col="darkred")
>
>
> Thanks
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



More information about the R-help mailing list