[R] Histograms, density, and relative frequencies

Wed Jul 7 20:36:05 CEST 2004

On Wed, 2004-07-07 at 18:29, Bret Collier wrote:
> R-users,
>          I have been using R for about 1 year, and I have run across a 
> couple of graphics problems that I am not quite sure how to address.  I have 
> read up on the email threads regarding the differences between density and 
> relative frequencies (count/sum(count)) on the R list, and I am hoping that 
> someone could provide me with some advice/comments concerning my 
> approach.  I will admit that some of the underlying mathematics of the 
> density discussion are beyond my current understanding, but I am looking 
> into it.
> 
> I have a data set (600,000 obs) used to parameterize a probabilistic causal 
> model where each obs is a population response for one of 2 classes (either 
> regs1 or regs2).  I have been attempting to create 1 marginal probability 
> plot with 2 lines (one for each class).  Using my rather rough code, I 
> created a plot that seems to adhere to the commonly used (although from 
> what I can understand wrong) relative frequency histogram approach.
> 
> My rough code looks like this:
> 
> bk <- c(0, .05, .1, .15, .2, .25,.3, .35, 1)
> par(mfrow=c(1, 1))
> fawn1 <- hist(MFAWNRESID[regs1], plot=F, breaks=bk)
> fawn2 <- hist(MFAWNRESID[regs2], plot=F, breaks=bk)
> count1 <- fawn1$counts/sum(fawn1$counts)
> count2 <- fawn2$counts/sum(fawn2$counts)
> b <- c(0, .05, .1, .15, .2, .25, .3, .35)
> plot(count1~b,xaxt="n", xlim=c(0, .5), ylim=c(0, .40), pch=".", bty="l")	
> lines(spline(count1~b), lty=c(1), lwd=c(2), col="black")
> lines(spline(count2~b), lty=c(2), lwd=c(2), col="black")
> axis(side=1, at=c(0, .05, .1, .15, .2,  .25, .3, .35))

Have you considered density() and plot.density(), by any chance?
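
As a minimal sketch of that suggestion: the data below are simulated
stand-ins (x1 and x2 are hypothetical replacements for
MFAWNRESID[regs1] and MFAWNRESID[regs2], which are not available here),
and the from/to limits assume the response lies in [0, 1]. Note that
the y-axis is then a density (area under the curve is about 1), not a
proportion per bin.

```r
# Simulated stand-ins for the two classes (hypothetical data, not the poster's)
set.seed(1)
x1 <- rbeta(1000, 2, 15)
x2 <- rbeta(1000, 3, 15)

# Kernel density estimates, evaluated only on the assumed range [0, 1]
d1 <- density(x1, from = 0, to = 1)
d2 <- density(x2, from = 0, to = 1)

# One plot, two lines; ylim is chosen so that both curves fit
plot(d1, xlim = c(0, 0.5), ylim = range(0, d1$y, d2$y),
     lty = 1, lwd = 2, bty = "l", main = "", xlab = "MFAWNRESID")
lines(d2, lty = 2, lwd = 2)
```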

> Using the above, I get frequency values for regs1 that look like this 
> (which is the same as output for my probabilistic model):
>  > count1
> [1] 1.213378e-01 3.454324e-01 3.365343e-01 1.580839e-01 3.342101e-02
> [6] 4.698426e-03 4.488942e-04 4.322685e-05

I would tend to use the term proportion rather than frequency.

> First, count1 is the frequency of occurrence within range 0-0.05, but when 
> plotted is the value at b=0 and does not really represent the range?  Are 
> there any suggestions on a technique to approach this?

You can plot the proportions at the bin mid-points, as hist() does.
fawn1$mids gives you these values.
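
A rough sketch of plotting at the mid-points, again with simulated data
(x1 is a hypothetical stand-in for MFAWNRESID[regs1]; prop1 plays the
role of your count1). Be aware that the last bin runs from 0.35 to 1,
so its mid-point is 0.675.

```r
# Simulated stand-in for MFAWNRESID[regs1] (hypothetical data)
set.seed(1)
x1 <- rbeta(1000, 2, 15)

bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)
fawn1 <- hist(x1, plot = FALSE, breaks = bk)

# Proportion of observations falling in each bin
prop1 <- fawn1$counts / sum(fawn1$counts)

# Plot each proportion at the centre of its bin rather than at the left edge
plot(fawn1$mids, prop1, type = "b", xlim = c(0, 0.7), bty = "l",
     xlab = "MFAWNRESID (bin mid-point)", ylab = "Proportion")
```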

> Next:  Using the above code, the x-axis values end at 0.35, but the axis 
> continues (because bk ends at 1)?  While there is the chance of occurrence 
> out past .35, it is low and I want to extend the lines to about .35 and 
> clip the x-axis.  But, I have been unable to figure out how to clip.  Could 
> someone point me in the correct direction?

In your plot() call, set xlim=c(0, 0.35). If by 'clipping' you mean
truncating the distribution at 0.35, then you probably need to re-adjust
your proportions so that they sum to 1 over the retained bins.
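
The two readings of 'clipping' can be sketched as follows. The data are
simulated (x1 is a hypothetical stand-in for MFAWNRESID[regs1]), and the
names keep and prop1_trunc are made up for illustration.

```r
# Simulated stand-in for MFAWNRESID[regs1] (hypothetical data)
set.seed(1)
x1 <- rbeta(1000, 2, 15)

bk <- c(0, .05, .1, .15, .2, .25, .3, .35, 1)
fawn1 <- hist(x1, plot = FALSE, breaks = bk)
prop1 <- fawn1$counts / sum(fawn1$counts)

# Bins that lie entirely below 0.35 (drops the wide 0.35-1 bin)
keep <- fawn1$mids < 0.35

# Reading 1: only restrict the view; the proportions are unchanged
plot(fawn1$mids[keep], prop1[keep], type = "l", lwd = 2,
     xlim = c(0, 0.35), xaxs = "i", bty = "l",
     xlab = "MFAWNRESID", ylab = "Proportion")

# Reading 2: truncate at 0.35 and rescale so the kept proportions sum to 1
prop1_trunc <- prop1[keep] / sum(prop1[keep])
lines(fawn1$mids[keep], prop1_trunc, lty = 2, lwd = 2)
```

With xaxs="i" the axis stops exactly at the xlim values instead of being
padded by the default 4%.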