[R] curve of density on histogram

Duncan Murdoch murdoch at stats.uwo.ca
Thu Mar 8 20:17:40 CET 2007


On 3/8/2007 11:29 AM, (Ted Harding) wrote:
> On 08-Mar-07 KOITA Lassana - STAC/ACE wrote:
>> 
>> Hi R users,
>> I would like to know why these following curve densities don't appear
>> correctly on the histograms.
>> Thank you for your help
>> 
>> 
>> 
>> library(lattice)
>> library(grid)
>> resp  <- rnorm(2000)
>> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
>> histogram(~ resp | group, col="steelblue",
>>   panel = function(x, ...){
>>     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else
>> "NA"
>>     n <- length(x)
>>     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else
>> "NA"
>>     panel.histogram(x, ...)
>>     panel.mathdensity(dmath = dnorm, col = "green",
>>                                 args = list(mean=mean(x),sd=sd(x)))
>>     panel.abline(v= mean(x), col = "red")
>>     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
>> col='yellow' )
>> 
>>     x1 <- unit(1, "npc") - unit(2, "mm")
>>     y1 <- unit(1, "npc") - unit(2, "mm")
>>     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just =
>> "right")
>>     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
>> "lines"), just = "right")
>>     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
>> unit(2, "lines"), just = "right")
>> 
>> })
> 
> The following is an approximation to what you want to do.
> However, it needs one thing to be determined, which I have
> not managed to work out how to do.
> 
> The reason for your bad density plots is that the density
> function dnorm needs to be scaled (by the number of values
> whose histogram is being lotted) and by the width of the
> intervals.
> 
> Hence I first define a function sdnorm() (for "scaled dnorm")
> and then change one line in your code, as follows:
> 
> library(lattice)
> library(grid)
> resp  <- rnorm(2000)
> group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
> #### New function sdnorm:
> sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
> histogram(~ resp | group, col="steelblue",
>   panel = function(x, ...){
>     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
>     n <- length(x)
>     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
>     panel.histogram(x, ...)
> #### Changed line:
>     panel.mathdensity(dmath = sdnorm, col = "green",
>           args = list(mean=mean(x),sd=sd(x),N=length(x),binwid=0.10))
>     panel.abline(v= mean(x), col = "red")
>     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
> col='yellow' )
> 
>     x1 <- unit(1, "npc") - unit(2, "mm")
>     y1 <- unit(1, "npc") - unit(2, "mm")
>     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
>     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
> "lines"), just = "right")
>     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
> unit(2, "lines"), just = "right")
> })
> 
> The argument N to sdnorm is readily available from the argument x,
> as N = length(x).
> 
> However, I cannot work out from the documentation for these panel
> functions how to determine the width of the histogram bins, which
> is argument binwid to sdnorm(). Hence I have simply set binwid=0.l0
> to illustrate the point, since this gives an approximately correct
> plot. But it will only be really correct when binwid can somehow
> be determined from the hosyogram being plotted, and it is this
> which I cannot see!

The breaks are one of the ... args passed to the panel function, so you 
can get the binwidth from there.  But there's another problem:  the 
panel.histogram function gives percent of total, so should integrate to 
100, not to N.  I think this version gives what is wanted:

library(lattice)
library(grid)
resp  <- rnorm(2000)
group <- sample(c("G1", "G2", "G3", "G4"), replace = TRUE, size = 1000)
#### New function sdnorm:
sdnorm <-function(x,mean=0,sd=1,N=1,binwid=1){N*binwid*dnorm(x,mean,sd)}
histogram(~ resp | group, col="steelblue",
   panel = function(x, ...){
     std <- if(length(x) > 0) format(round(sd(x), 2), nsmall = 2) else "NA"
     n <- length(x)
     m <- if(length(x) > 0) format(round(mean(x), 2), nsmall = 2) else "NA"
     panel.histogram(x, ...)


     breaks <- list(...)$breaks
     binwid <- breaks[2]-breaks[1]
     panel.mathdensity(dmath = sdnorm, col = "green",
           args = 
list(mean=mean(x),sd=sd(x),N=100,binwid=breaks[2]-breaks[1]))


     panel.abline(v= mean(x), col = "red")
     panel.abline(h=5)
     panel.xyplot(x = jitter(x),type="p",pch=20,y = rep(0, length(x)),
col='yellow' )

     x1 <- unit(1, "npc") - unit(2, "mm")
     y1 <- unit(1, "npc") - unit(2, "mm")
     grid.text(label = bquote(n == .(n)), x = x1, y = y1, just = "right")
     grid.text(label = bquote(hat(m) == .(m)), x = x1, y = y1 - unit(1,
"lines"), just = "right")
     grid.text(label = bquote(hat(s) == .(std)), x = x1, y = y1 -
unit(2, "lines"), just = "right")
})

Duncan Murdoch



More information about the R-help mailing list