[R] ggplot2 / histogram / y-axis

hadley wickham h.wickham at gmail.com
Fri Jul 13 08:12:46 CEST 2007

On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
> "hadley wickham" <h.wickham at gmail.com> writes:
> > On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
> >> Is there a way in ggplot to make a histogram with the left-hand y-axis
> >> label as frequency, and a right-hand y-axis label as percentage?
> >
> > Not currently.  I did a quick exploration to see if it was feasible to
> > draw another axis on with grid, but it doesn't look like it's
> > possible:
> Thank you for trying.
> > Also how were you expecting the axes/gridlines to line up?  Would both
> > axes be labelled "nicely" (with whole numbers) and the secondary axis
> > wouldn't have gridlines; or would the second axis match the lines of
> > the primary, even though the number wouldn't be so attractive?
> I hadn't thought that far ahead.  Depending on the audience, I render
> histograms differently, and was curious if I could just put both on a
> single graph.  However, you bring up some interesting questions in
> terms of the presentation.
> On another note, and feel free to defer me to the documentation which
> I'm still in the process of reading, but will I be able to take
> advantage of some of Tufte's recommendations in terms of the typical
> histogram and/or scatterplots (pp126-134 in Visual Display of
> Quantitative Information)?
> For example, with histograms, he would eliminate the use of
> coordinate lines in favor of using a white grid to improve the
> data/ink ratio.  Likewise in scatterplots, he uses range-frames and
> dot-dash-plots.  Will I be able to use ggplot for these types of
> enhancements?
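
To make the earlier axis-alignment question concrete, here is a minimal
base R sketch. The bin counts are made up for illustration (73
observations, not taken from any real data set) and the variable names
are hypothetical. It compares two options: labelling the percentage axis
at the count axis's tick positions, or giving the percentage axis its
own "nice" breaks.

```r
# Hypothetical histogram bin counts (an assumption for illustration only)
counts <- c(2, 9, 21, 24, 13, 4)
n <- sum(counts)  # 73 observations in total

# Option 1: the percentage axis reuses the count axis's tick positions,
# so gridlines match but the percentage labels are rarely round numbers.
count_breaks <- pretty(c(0, max(counts)))
pct_at_count_breaks <- 100 * count_breaks / n

# Option 2: the percentage axis gets its own "nice" breaks, so its
# labels are round but its ticks fall at fractional counts and do not
# line up with the primary gridlines.
pct_breaks <- pretty(c(0, 100 * max(counts) / n))
count_at_pct_breaks <- pct_breaks * n / 100

print(data.frame(count = count_breaks,
                 percent = round(pct_at_count_breaks, 1)))
print(data.frame(percent = pct_breaks,
                 count = round(count_at_pct_breaks, 2)))
```

Because the total here does not divide evenly into 100, one of the two
axes ends up with unattractive labels whichever option is chosen.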

I am familiar with Tufte's suggestions, and while they do increase the
data-ink ratio, I'm not confident they actually make the plot any
better perceptually.  Displaying grid lines on _top_ of data seems
like a bad idea, and throwing away the plot frame is a bad idea
because you lose important visual reference points.  Range frames
also fail to scale to facetted plots.

If you're not already familiar with them, I strongly recommend the
following two papers which tackle similar ideas to Tufte but in a
rigorous scientific framework:

	Cleveland, William and McGill, Robert (1987). Graphical Perception:
	The Visual Decoding of Quantitative Information on Graphical
	Displays of Data. Journal of the Royal Statistical Society.
	Series A (General), 150(3), 192-229.

	Cleveland, William (1993). A model for studying display methods of
	statistical graphics. Journal of Computational and Graphical
	Statistics, 2, 323-364.

