[R] tapply histogram

Fri Jun 1 17:28:47 CEST 2007

On Fri, 2007-06-01 at 06:00 -0700, livia wrote:
> Dear members,
> 
> I would like to pass the histogram settings to each subset of the dataframe,
> and generate a multiple figures graph.
> 
> First, can anyone tell me how to generate a multiple figures environment? I
> am trying 
> 
> mfrow=c(2,4) and nothing appears.
> 
> Secondly, I want to pass the following function in tapply()
> 
> hist(x, freq=FALSE)
> lines(density(x), col="red")
> rug(x)
> 
> how can I manage it?
> 
> Many thanks

In this case, you would not want to use one of the *apply() family of
functions. First, it does not save you anything over an explicit loop
and second, these functions are designed to return some type of R
object, which you don't want here, since the plots are drawn only for
their side effects.

Better to use a for() loop and if you wish, encapsulate the loop in a
function. Something along the lines of the following, which actually
defines a new 'formula' method for hist() (though not fully tested):

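hist.formula <- function(formula, data, cols, rows, ...)
{
  # Build a data frame holding the response and the grouping factor.
  # '...' is not passed on here, so it stays reserved for hist() below.
  DF <- model.frame(formula, data = data)

  # Split the response (column 1) by the grouping factor (column 2)
  DF.split <- split(DF[[1]], DF[[2]])

  # mfrow expects c(number of rows, number of columns)
  par(mfrow = c(rows, cols))

  for (i in names(DF.split))
  {
    Col <- DF.split[[i]]
    hist(Col, freq = FALSE, main = i, ...)
    lines(density(Col), col = "red")
    rug(Col)
  }
}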

The function will take the formula, create a data frame composed of the
formula terms and then loop over the list of vectors created by
split(), one vector per level of the grouping factor.
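To make the two preparatory steps more concrete, here is a small sketch
of what model.frame() and split() return for the iris data. It is only
an illustration of the intermediate objects, using the same formula as
the call further down, and nothing is plotted:

```r
# Data frame with the response in column 1 and the grouping factor in column 2
DF <- model.frame(Sepal.Length ~ Species, data = iris)

# Named list with one numeric vector per level of Species
DF.split <- split(DF[[1]], DF[[2]])

str(DF.split)
```

The names of the list are the factor levels, which is why they can be
used directly as the plot titles inside the loop.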

So we call it as follows:

  hist(Sepal.Length ~ Species, data = iris, cols = 2, rows = 2)

Based upon the formula specification, you will then get a matrix of
histograms, where each will be titled with the factor level used to
split the original data frame.
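As for the multiple figures environment itself: mfrow is a graphics
parameter, so it has to be set through par() rather than assigned on its
own. A minimal sketch, assuming simulated data and a temporary PDF file
as the output device (both are just placeholders for illustration, as
are the variable names):

```r
# Send output to a temporary PDF so the sketch also runs non-interactively
pdf(file = tempfile(fileext = ".pdf"))

# A bare 'mfrow = c(2, 4)' only creates an ordinary variable called mfrow.
# par() sets the layout and invisibly returns the previous setting.
old.par <- par(mfrow = c(2, 4))   # 2 rows, 4 columns
layout.in.use <- par("mfrow")

# Fill the eight panels with histograms of simulated data
set.seed(1)
for (k in 1:8)
  hist(rnorm(100), freq = FALSE, main = paste("Panel", k))

par(old.par)                      # restore the previous layout
restored <- par("mfrow")

invisible(dev.off())
```

Saving and restoring the old setting is optional, but it keeps later
plots on the same device from being squeezed into the 2 x 4 grid.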

You could further consolidate the function by implementing an automated
means to determine the number of rows and columns required in the plot
matrix, but I'll leave that for you.
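If you want a starting point for that, one possibility is n2mfrow() in
grDevices, which suggests a c(rows, cols) layout for a given number of
plots. The following is only a sketch: histByGroup() is a made-up name,
not an existing function, and the temporary PDF device is there purely
so that the example runs without a screen device.

```r
# Hypothetical wrapper: same plots as above, layout chosen automatically
histByGroup <- function(formula, data, ...)
{
  DF <- model.frame(formula, data = data)
  DF.split <- split(DF[[1]], DF[[2]])

  # n2mfrow() returns c(rows, cols) with room for the requested plots
  dims <- grDevices::n2mfrow(length(DF.split))

  old.par <- par(mfrow = dims)
  on.exit(par(old.par))           # restore the layout when the function exits

  for (i in names(DF.split))
  {
    Col <- DF.split[[i]]
    hist(Col, freq = FALSE, main = i, ...)
    lines(density(Col), col = "red")
    rug(Col)
  }

  invisible(dims)
}

pdf(file = tempfile(fileext = ".pdf"))
dims <- histByGroup(Sepal.Length ~ Species, data = iris)
invisible(dev.off())
```

The layout picked this way is not always the one you would choose by
hand, so keeping the explicit rows and cols arguments as an override may
still be worthwhile.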

See ?model.frame and ?split

HTH,

Marc Schwartz