[R] log y 'axis' of histogram

Hadley Wickham hadley at rice.edu
Mon Aug 30 20:33:45 CEST 2010


>> That doesn't justify the use of a _histogram_  - and regardless of
>
> The usage highlights meaningful characteristics of the data.
> What better justification for any method of analysis and display is
> there?

That you're displaying something that is mathematically well founded
and meaningful - but my emphasis there was on histogram.  I don't
think a histogram makes sense, but there are other ways of displaying
the same data that would (e.g. a frequency polygon, or maybe a density
plot).

>> what distributional display you use, logging the counts imposes some
>> pretty heavy restrictions on the shape of the distribution (e.g. that
>> it must not drop to zero).
>
> Does there have to be a recognized statistical distribution to use R?

My point is about the display - if your binned counts look like 1,
100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log
counts?
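
To make the problem concrete, here is a minimal base R sketch using the
counts from the example above. The variable names are only for
illustration. It shows that empty bins have no finite value on a log
scale, so any bar drawn for them has no well-defined height.

```r
# Binned counts from the example above
counts <- c(1, 100, 1000, 100, 0, 0, 10, 1000, 1000)

# Empty bins map to -Inf on a log scale
log_counts <- log10(counts)
print(log_counts)

# Indices of the bins that cannot be placed on a log axis
undrawable <- which(!is.finite(log_counts))
print(undrawable)
```

A bar also needs a baseline, and on a log axis the natural baseline of
zero is itself at -Inf, which is a second reason bars are awkward here.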

> In my case I am using R for all of the analysis and graphics in a
> new book.  This means that sometimes I have to deal with data sets
> that are more or less a jumble of numbers with patterns in a few
> places.  For instance, the numeric value of integer constants
> appearing as one operand of the binary bitwise-AND operator (see
> figure 1224.1 of www.knosof.co.uk/cbook/usefigtab.pdf, raw data
> at: www.knosof.co.uk/cbook/bandcons.hist.gz)
>
> qplot(band, binwidth=8, geom="histogram") + scale_y_log()
> does a good job of highlighting the peaks.

I couldn't find that figure, but I'd think geom = "freqpoly" would be
more appropriate.  (I'd also suggest adding a bit more space between
the data and the margins in your figures - they overlap in many
plots).
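
As a rough sketch of that suggestion, assuming ggplot2 is installed and
using made-up stand-in data (the vector `band` below is simulated, not
the data from the book), a frequency polygon on a log count axis might
look like this. It uses `scale_y_log10()` rather than the
`scale_y_log()` call in the quoted code.

```r
library(ggplot2)

# Simulated stand-in for the operand values: a uniform background
# plus a few artificial peaks. This is NOT the real data set.
set.seed(1)
band <- c(sample(0:255, 500, replace = TRUE),
          rep(c(1, 15, 255), times = c(300, 100, 200)))
df <- data.frame(band = band)

# Frequency polygon with the same bin width as the quoted histogram,
# drawn on a log10 count axis
p <- ggplot(df, aes(x = band)) +
  geom_freqpoly(binwidth = 8) +
  scale_y_log10()

print(p)
```

Bins with a count of zero still have no finite position on the log
axis, so ggplot2 may warn about them. A line has no bar baseline to
define, however, which is the main reason to prefer it here.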

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/


