[R] Diamond graphs

Thu Aug 21 07:21:50 CEST 2003

I apologise for starting a new thread, but we had a mail problem and I
don't have the original message to refer to.

Someone mentioned the new "Diamond Graphs" invented at Johns Hopkins.
I haven't seen the August 2003 issue of The American Statistician yet,
but I _have_ read the press release.

The press release is a bit of a stunner.  I quote:
    "Who would have thought we would still be inventing
     new methods of graphing in the twenty-first century?"

A1: anyone with a functioning brain?
A2: anyone who didn't sleep through the twentieth century?

I can summarise diamond graphs this way:
    (1) Write a 2D table.                          - very old idea
    (2) Instead of numbers, put blobs of
        some kind where the size shows you
        the importance.                            - at least 3000 years old
    (3) Rotate the table widdershins 45
        degrees.                                   - swiped from 3D displays
    (4) Replace the blobs by truncated diamonds;
        height of bar or area of polygon or something
        shows value.                               - NOVELTY
    (5) Notice that it's not all that readable,
        so put the numbers back.

The fact that someone would try to patent this strikes me as outrageous;
the actual amount of novelty is so tiny.

For R, I don't think it matters, because I think that diamond graphs
are a bad idea.  Let me try to explain why.

In effect, you have a 2D bar chart, where each bar occupies a rather
small diamond-shaped cell.  The bars are sort of squashed to fit in.

ASCII graphic of a typical bar:
             |
          _______
         /       \
     __ /         \ __
        \         /
         \_______/
             |

These bars have two axes of symmetry: a vertical mirror axis and
a horizontal mirror axis.  The lines outside the hexagon above
show the symmetry axes.  I shall use horizontal and vertical coordinates
running from -1 to +1, and define the height of the bar to be the amount
that the bar extends above the horizontal: 0 <= h <= 1.  When h = 0,
no polygon is shown.  When h = 1, the polygon is a square occupying the
whole cell.  (I don't actually understand this; in the illustration in
http://www.jhu.edu/~gazette/2003/18aug03/18graph.html
there is _always_ a margin, except when the square is full.  It doesn't
really spoil my point.)

Total height of bar:		2h
Width of bar top:		2 - 2h
Total width of bar:		2
Diagonal (corner to corner):	2*sqrt(2h^2 - 2h + 1)
Area of bar:			4h - 2h^2
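
The figures above can be checked numerically.  What follows is only a
rough Python sketch under the coordinate conventions just described
(cell running from -1 to +1, bar extending h above the horizontal); the
function names are made up for illustration and have nothing to do with
the published method.  It builds the six corners of the truncated
diamond and measures them directly, so the area comes from the shoelace
formula rather than from the closed form.

```python
import math

def hexagon_vertices(h):
    """Corners of the truncated diamond for bar height h (0 < h <= 1),
    listed counter-clockwise starting from the right-hand tip."""
    return [(1.0, 0.0), (1.0 - h, h), (-(1.0 - h), h),
            (-1.0, 0.0), (-(1.0 - h), -h), (1.0 - h, -h)]

def shoelace_area(pts):
    """Area of a simple polygon from its corners (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def bar_measures(h):
    """Measure the bar directly from its corners."""
    pts = hexagon_vertices(h)
    top_right, bottom_left = pts[1], pts[4]
    return {
        "total height": pts[1][1] - pts[5][1],
        "top width": pts[1][0] - pts[2][0],
        "total width": pts[0][0] - pts[3][0],
        "diagonal": math.hypot(top_right[0] - bottom_left[0],
                               top_right[1] - bottom_left[1]),
        "area": shoelace_area(pts),
    }

# Compare the measured values with the closed forms in the table.
for h in (0.1, 0.25, 0.5, 0.75, 1.0):
    m = bar_measures(h)
    assert math.isclose(m["total height"], 2 * h)
    assert math.isclose(m["top width"], 2 - 2 * h, abs_tol=1e-12)
    assert math.isclose(m["diagonal"], 2 * math.sqrt(2 * h * h - 2 * h + 1))
    assert math.isclose(m["area"], 4 * h - 2 * h * h)
```

At h = 1 the top edge shrinks to a point and the hexagon degenerates
into the full square, with area 2.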

The information-bearing units don't just change *size*, and they don't
just *stretch* in one dimension (like normal histogram bars), they change
shape much more drastically than that.  I don't see how this can make it
easy to relate one bar to another; your impression of how *much* bigger
one bar is than another depends on which visual aspect you attend to.
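
To make that concrete, here is a small Python sketch (my own
illustration, using the formulas from the table above and the same
-1..+1 cell; the names are hypothetical) that takes two bars and
reports how much "bigger" the second is according to each visual cue.

```python
import math

def measures(h):
    """Closed-form measurements of a bar of height h, 0 < h < 1."""
    return {
        "total height": 2 * h,
        "top width": 2 - 2 * h,
        "diagonal": 2 * math.sqrt(2 * h * h - 2 * h + 1),
        "area": 4 * h - 2 * h * h,
    }

def ratios(h_small, h_big):
    """Ratio big/small for every cue; needs 0 < h_small < 1 so that
    none of the denominators is zero."""
    small, big = measures(h_small), measures(h_big)
    return {name: big[name] / small[name] for name in small}

# A value that doubles, from h = 0.2 to h = 0.4.
for name, r in ratios(0.2, 0.4).items():
    print(f"{name:>12}: x{r:.2f}")
```

For a value that doubles from h = 0.2 to h = 0.4, the height doubles,
the area grows by rather less than that, and the top width and the
diagonal actually get smaller.  Which of those the eye picks up is
anybody's guess.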

As Tufte (Visual Display of Quantitative Information) puts it:
    There are considerable ambiguities in how people perceive
    a two-dimensional surface and then convert that perception
    into a one-dimensional number.
(p71)  

Combine this with the small "dynamic range" available (because you have
lots of cells to fit in), and there doesn't seem to be any advantage
over just using discs of various sizes (which is a fairly old technique;
you'll find a similar idea in ABCs of EDA).

Oh yes, you'll find tables with entries shown by amount of ink
on page 174 of Tufte's Visual Display...

I'm assuming here that the use of truncated squares ("hexagons")
is considered important.  The Gazette web page above has some other
examples showing
(A) plain diamonds that change size, not shape
(B) "diamonds" without margins, that change shape as described above
(C) "diamonds" with margins, that change shape as described above
(D) diamonds with fixed width spanning the cell, where the height
    changes
(E) something with a rectangle, a cell, and two "bow ties", each in a cell.
    I have no idea what the different shapes mean.
Since the text says "The researcher experimented with other shapes but
found that the six-sided polygon was the only shape to represent the
outcomes equally within the grid as it expanded", I surmise that A, B,
D, and E are meant to be understood as "bad examples" that the diamonds
of C improve on.

It is not clear to me what the advantage of turning the diagram 45
degrees widdershins is supposed to be.  I'm assuming here (and I don't
even play an expert on TV) that vertical patterns are easier to grasp
than diagonal ones.  Now, the main example on that web page (and in the
PDF file you can get to from the URI posted in the original message)
can be summarised as
    Systolic >= 180	BIG
    Systolic 160..179	moderate
    Everything else	pretty much small
which is quite easy to see if "systolic" is the horizontal or vertical
axis, but when the vertical axis is "systolic + diastolic" and the
horizontal is "diastolic - systolic" (a bit of hand-waving here, because
the buckets aren't the same width, so + and - are a bit dodgy), it gets
rather harder to see.
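
The hand-waving can be spelled out with a little arithmetic.  This is
only a sketch in Python, and it rests on assumptions of mine: cells are
addressed by bucket *index* (so unequal bucket widths are ignored),
diastolic runs along the horizontal of the unrotated table and systolic
along the vertical, and the rotation is exactly 45 degrees
anticlockwise.  The function name is hypothetical.

```python
import math

def rotate_widdershins_45(x, y):
    """Rotate the point (x, y) by 45 degrees anticlockwise about the origin."""
    c = math.sqrt(0.5)  # cos(45 deg) == sin(45 deg)
    return (c * (x - y), c * (x + y))

# One row of the unrotated table: systolic bucket fixed, diastolic varying.
systolic = 3
for diastolic in range(4):
    horiz, vert = rotate_widdershins_45(diastolic, systolic)
    # horiz is proportional to diastolic - systolic,
    # vert  is proportional to diastolic + systolic.
    print(f"diastolic={diastolic}  horizontal={horiz:+.3f}  vertical={vert:.3f}")
```

A row with fixed systolic, which was a plain horizontal strip before the
rotation, now changes both screen coordinates at once as diastolic
varies, so "everything with systolic >= 180" becomes a diagonal band.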

Turn to the examples on page 174 of Tufte again, where the number of
values for one variable (6) is not the same as the number for the other (16).
Would that look good if you turned it 45 degrees?  The diamond graph appears
to rely on the two explanatory variables having nearly the same number of
values, which would seem to limit its usefulness.
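
One crude way to put a number on this is to ask how much of the page a
rotated table actually uses.  The Python sketch below assumes unit
square cells and an exact 45 degree turn (my assumptions, not anything
from Tufte or the paper; the function name is hypothetical).  A
rows-by-cols table turned 45 degrees fits in a square of side
(rows + cols)/sqrt(2), so the fraction of that square covered by cells
is 2*rows*cols/(rows + cols)^2.

```python
def bounding_square_fill(rows, cols):
    """Fraction of the bounding square covered by a rows x cols table of
    unit cells after it has been rotated by 45 degrees."""
    side = (rows + cols) / 2 ** 0.5  # side of the bounding square
    return (rows * cols) / (side * side)

for rows, cols in [(11, 11), (6, 16), (3, 20)]:
    fill = bounding_square_fill(rows, cols)
    print(f"{rows:2d} x {cols:2d}: {fill:.0%} of the bounding square used")
```

A square table never uses more than half of its bounding square once
rotated, and the fraction only goes down as the table gets more
lopsided; the 6 by 16 case comes out at about 40%.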

What would happen if we turn the diagram back so that the axes are
horizontal and vertical?  Well, with square (or rectangular) cells
we could put _several_ vertical bars in each cell, and so display
2 or 3 variables on the same 2d grid, something which would be very
hard to do in a diamond graph.

In short, it looks to me as though "diamond graphs" are something R
is better off without.