[R] log y 'axis' of histogram

Derek M Jones derek at knosof.co.uk
Mon Aug 30 21:56:55 CEST 2010


Hadley,

> That you're displaying something that is mathematically well founded
> and meaningful - but my emphasis there was on histogram.  I don't
> think a histogram makes sense, but there are other ways of displaying
> the same data that would (e.g. a frequency polygon, or maybe a density
> plot)

The problem I have with geom = "freqpoly" is that it is not immediately
obvious to the casual reader of the figure that binned data has been
plotted.  The horizontal line at the top of each bar does make that
obvious.  Lots of solid black is an eyesore and using something
like fill="white" helps to solve this problem (although this
currently appears red for me, probably some configuration issue to
sort out).
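
One possible cause of the red fill, offered as a guess rather than a
diagnosis: in ggplot2 a constant placed inside aes() (or passed directly
to qplot()) is treated as data and mapped through the fill scale, so the
bars get the scale's first default colour instead of white.  A minimal
sketch, assuming ggplot2 is installed; the data frame df and the plot
names are made up for illustration:

```r
library(ggplot2)

set.seed(1)
# Hypothetical data spanning several orders of magnitude
df <- data.frame(x = rlnorm(1000, meanlog = 3, sdlog = 1.5))

# fill inside aes(): "white" is mapped through the discrete fill scale,
# so the bars take the scale's first default colour, not white
p_mapped <- ggplot(df, aes(x)) +
  geom_histogram(aes(fill = "white"), colour = "black", bins = 30)

# fill outside aes(): "white" is used as a literal colour
p_set <- ggplot(df, aes(x)) +
  geom_histogram(fill = "white", colour = "black", bins = 30)

# With qplot() the equivalent of the second form is fill = I("white")
```

Printing p_set should give white bars with black outlines, which keeps
the flat top of each bin visible without the solid black.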

I'm not sure that a histogram using variable width bins and one log
scale has any meaningful interpretation; having both axes use a log
scale might make sense with variable width bins.

>>> what distributional display you use, logging the counts imposes some
>>> pretty heavy restrictions on the shape of the distribution (e.g. that
>>> it must not drop to zero).
>>
>> Does there have to be a recognized statistical distribution to use R?
>
> My point is about the display - if your binned counts look like 1,
> 100, 1000, 100, 0, 0, 10, 1000, 1000, how do you display the log
> counts?

Many functions cannot handle log(0) so the safest thing to do is
remove 0s.  What about 1 and other values more than X orders of
magnitude less than the maximum?  This is an issue on any log-scaled
plot, and invariably such values don't appear (and neither do the log(0)
cases).
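
To make this concrete, here is a rough sketch in base R using the counts
from the quoted example.  The variable names are mine, and treating
empty bins as missing is just one option, not a recommendation:

```r
# Binned counts from the example quoted above
counts <- c(1, 100, 1000, 100, 0, 0, 10, 1000, 1000)

log10(counts)            # the two empty bins come out as -Inf

# Option 1: drop the empty bins entirely (bin positions are lost)
keep <- counts > 0
log_counts <- log10(counts[keep])

# Option 2: mark empty bins as missing so their positions are kept
log_na <- ifelse(counts > 0, log10(counts), NA)

# Bars start at 0 = log10(1), so a bin holding a single item has zero
# height and is as invisible as an empty bin
barplot(log_na, names.arg = seq_along(counts), ylab = "log10(count)")
```

In the resulting plot the first bin (count 1) and the two empty bins all
show no bar, which is the ambiguity under discussion.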

Having a scale that gets closer to zero without ever getting there
is something that has to be accepted when displaying a log scale.

Logarithms are familiar to a technical readership and using them for
data spanning several orders of magnitude can highlight meaningful
relationships.  A non-technical readership is likely to completely
misunderstand a log scale and I have no idea how to display this
kind of data to such people.

> I couldn't find that figure, but I'd think geom = "freqpoly" would be
> more appropriate.  (I'd also suggest adding a bit more space between
> the data and the margins in your figures - they overlap in many
> plots).

My mistake, I was looking at a very old printed copy.  See figure 1234.1.
These figures are from a previous book
www.knosof.co.uk/cbook
which used grap to draw all the graphs
www.lunabase.org/~faber/Vault/software/grap/
with the numbers being extracted and processed by various C programs and
awk scripts.

-- 
Derek M. Jones                         tel: +44 (0) 1252 520 667
Knowledge Software Ltd                 mailto:derek at knosof.co.uk
Source code analysis                   http://www.knosof.co.uk


