[R] Histogram and Density on the same graph

(Ted Harding) Ted.Harding at manchester.ac.uk
Mon Nov 30 14:55:22 CET 2009


> Trafim,
> If you are plotting more than one variable on the same plot
> e.g. by using the lines() or points() function, then the limits
> of the X and Y axes are set based on the first variable you plot.
> So, you would have to set the xlim and ylim to the limits of the
> variable with the widest range, otherwise you would sometimes
> see some data left out.

To follow up, and expand on, Hrishi's advice above:

When you create a plot using plot(), or by a function such as
hist() which (without "add=TRUE") will use plot(), by default R
chooses the limits on the X range and the Y range according to
the values encountered in whatever is being plotted. This will
be done by a rather general rule designed to achieve a "pretty"
result. Subsequent additions to the plot, using say lines() or
points(), will be made using the same limits.

Therefore, to ensure that everything that you want, which you
will add in separate stages, will be wholly visible, you first
need to ascertain what the necessary limits will be by inspecting
all of the elements to find their global minimum and maximum.
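
One way to find those limits without reading them off by eye is
to compute them from the objects themselves. The following is only
a sketch: the names d, h, xl and yl are mine, and it relies on
hist(..., plot=FALSE) returning the break points and bar heights
(as "breaks" and "density") and on density() returning its grid
and values (as "x" and "y").

```r
set.seed(12345)
x <- seq(1, 40, 1)
y <- 2*x + 1 + 5*rnorm(length(x))

d <- density(y)             # kernel density estimate: d$x, d$y
h <- hist(y, plot = FALSE)  # histogram computed but not drawn

# Global limits that cover both the bars and the curve
xl <- range(h$breaks, d$x)
yl <- c(0, max(h$density, d$y))

hist(y, freq = FALSE, xlim = xl, ylim = yl)
lines(d)
```

Anything else to be added later (further curves, points) would
need to go into the range() and max() calls in the same way.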

Going back to your original example (and using set.seed() for
a reproducible result):

  set.seed(12345)
  x <- seq(1,40,1)
  y <- 2*x+1+5*rnorm(length(x))

  hist(y,freq = FALSE)
  lines(density(y))

You will see that, although the maximum value of density(y) does
not go above the Y range already allocated for hist(y), the X
range of density(y) does go beyond the X range which was allocated.
So have a look at density(y):

  density(y)
  # Call:
  #        density.default(x = y)
  # Data: y (40 obs.);      Bandwidth 'bw' = 10.7
  #        x                 y            
  #  Min.   :-28.188   Min.   :1.297e-05  
  #  1st Qu.:  8.843   1st Qu.:1.261e-03  
  #  Median : 45.874   Median :8.337e-03  
  #  Mean   : 45.874   Mean   :6.744e-03  
  #  3rd Qu.: 82.905   3rd Qu.:1.104e-02  
  #  Max.   :119.937   Max.   :1.243e-02  

Therefore the full X range for density(y) needs (-30,120). So
start by setting a suitable xlim for the hist(y), and then put
in the lines() for density(y):

  hist(y,freq = FALSE,xlim=c(-30,120))
  lines(density(y))

Now you have the full plot of density(y), but the X-axis that is
drawn only ranges over (0,100). You can change this,
but will not find out how by looking at ?hist, since the secret
is hidden in the "..." which are described as "further arguments
and graphical parameters passed to 'plot.histogram' and thence
to 'title' and 'axis' (if 'plot=TRUE')."

So you need to look at ?plot.histogram which will in turn pass you
on to "...: further graphical parameters to 'title' and 'axis'."

At this point you are almost there, but need to realise that what
you should be looking at is ?axis. Here you find the parameter "at".
So try augmenting the hist() command by setting an "at":

  hist(y,freq = FALSE,xlim=c(-30,120),at=10*(-3:12))
  lines(density(y))

This produces an axis over (-30,120), but with a warning that
"at is not a graphical parameter" (which is a bit mysterious,
since in fact it is what does the job); but the tic-marks
are not placed at every desired value (-30, 100 and 120 are
omitted) -- I confess I do not understand why!
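
A way round this that avoids passing "at" through hist()
altogether, offered as a sketch rather than as the official route:
suppress the axes with axes=FALSE (an argument of hist()) and then
draw them yourself with axis(). The name "ticks" is just for
illustration.

```r
set.seed(12345)
x <- seq(1, 40, 1)
y <- 2*x + 1 + 5*rnorm(length(x))

ticks <- seq(-30, 120, by = 10)   # desired X tick positions

hist(y, freq = FALSE, xlim = c(-30, 120), axes = FALSE)
axis(1, at = ticks)   # X axis with explicit tick positions
axis(2)               # Y axis with default ticks
lines(density(y))
```

Note that axis() may still leave out some tick labels if they
would overlap on the device; a smaller cex.axis, or las=2, may
help there.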

I give the above explanation in detail to illustrate that, when
you go beyond basic use of R's graphical functions, you may have
to embark on a possibly lengthy search through a chain of links
in the documentation before you find what you are looking for.
R's graphics, despite apparent simplicity for default simple
usages, is in fact very complicated, and the documentation is
Byzantine!

In this particular case, Aysun's suggestion of first plotting
density(y) and then adding the histogram is simpler -- but now
bear in mind that the heights of the histogram bars will go
above the default limits set when density(y) is plotted -- see:

  plot(density(y))
  hist(y,freq=FALSE,add=TRUE)

So use "ylim" to specify this:

  plot(density(y),xlim=c(-30,120),ylim=c(0,0.015))
  hist(y,freq=FALSE,add=TRUE)
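
Instead of typing in a fixed upper limit such as 0.015, it could
be computed from the tallest element to be drawn. Again only a
sketch, using the "density" component returned by
hist(..., plot=FALSE); the names d, h and top are mine.

```r
set.seed(12345)
x <- seq(1, 40, 1)
y <- 2*x + 1 + 5*rnorm(length(x))

d <- density(y)
h <- hist(y, plot = FALSE)   # bar heights without drawing anything

# Tallest element, whether a histogram bar or the density curve
top <- max(h$density, d$y)

plot(d, ylim = c(0, top))
hist(y, freq = FALSE, add = TRUE)
```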

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Nov-09                                       Time: 13:55:18
------------------------------ XFMail ------------------------------



