[R] diamond graphs

Tue Aug 26 04:09:45 CEST 2003

Scott Zeger <szeger at jhsph.edu> commented:
	First, "diamond graphs" were developed as part of the Multi-center Aids
	Cohort Study, a seminal study of HIV infection in the U.S. in which these
	authors have been key co-investigators. The graphs were created to better
	address a real scientific objective and that usually bodes well for their
	longer-term value.

I've invented a couple of graphic techniques myself.  They were devised
to deal with problems of actual practical interest at the time.  "That
usually bodes well for their longer-term value"?  No: these days I am
glad that I never published them, because R is chock full of *better*
methods than mine.

As yet I have not had a chance to see the actual article.  (Living in
the Southern Hemisphere has advantages, but also disadvantages, like
the time it takes periodicals to arrive.)  The one example of a diamond
graph I've seen did make a certain pattern in the data reasonably easy
to spot, but harder to spot than other graphs would have.  Amongst other
things, it would be very interesting to see some sort of 2d density
plot with log(diastolic) and log(systolic) as axes.  Perhaps this was
already done in the article.
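As a rough illustration of that 2d density plot, here is a minimal Python sketch under stated assumptions. Because the study data are not at hand, it simulates hypothetical (diastolic, systolic) pairs. It then computes a product-Gaussian kernel density estimate on log(diastolic) x log(systolic) axes, using Scott's-rule bandwidths. The function names, the simulated means and spreads, and the bandwidth rule are all illustrative choices, not anything taken from the article. The returned grid is the surface that a contour or image plot would draw.

```python
import math
import random
import statistics


def simulate_bp(n, seed=1):
    """Hypothetical synthetic (diastolic, systolic) readings in mmHg.

    Made up for illustration only; these are not study data.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        dia = max(rng.gauss(80, 10), 40.0)
        # Systolic is correlated with diastolic and kept above it.
        sys_ = max(dia + rng.gauss(45, 12), dia + 5.0)
        pairs.append((dia, sys_))
    return pairs


def linspace(lo, hi, m):
    """m evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (m - 1)
    return [lo + i * step for i in range(m)]


def gaussian_kde_2d(xs, ys, m=60):
    """Product-Gaussian kernel density estimate on an m x m grid.

    Bandwidths follow Scott's rule for two dimensions: sd * n**(-1/6).
    Returns (grid_x, grid_y, dens), where dens[j][i] is the estimated
    density at (grid_x[i], grid_y[j]).
    """
    n = len(xs)
    hx = statistics.stdev(xs) * n ** (-1 / 6)
    hy = statistics.stdev(ys) * n ** (-1 / 6)
    # Extend the grid 3 bandwidths past the data so little mass is cut off.
    grid_x = linspace(min(xs) - 3 * hx, max(xs) + 3 * hx, m)
    grid_y = linspace(min(ys) - 3 * hy, max(ys) + 3 * hy, m)
    norm = n * 2 * math.pi * hx * hy
    dens = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for x, y in zip(xs, ys):
                zx = (gx - x) / hx
                zy = (gy - y) / hy
                total += math.exp(-0.5 * (zx * zx + zy * zy))
            row.append(total / norm)
        dens.append(row)
    return grid_x, grid_y, dens


pairs = simulate_bp(200)
log_dia = [math.log(d) for d, _ in pairs]
log_sys = [math.log(s) for _, s in pairs]
gx, gy, dens = gaussian_kde_2d(log_dia, log_sys)
```

In R itself, MASS::kde2d() followed by contour() or image() is one route to the same kind of display. Note that kde2d applies its own default bandwidth rule, so its surface will differ somewhat from this sketch.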

First Lisp-Stat and now R have impressed on my mind the importance of
moving beyond paper.  The possibility of displaying the same data in
_several_ ways, simultaneously or in quick succession, means that
computer graphics can be a qualitatively different medium from paper.

Just this afternoon I was talking with a 4th-year CS student who is
working on a project to identify features that will help him detect
patterns in a certain kind of data.  Using R, I generated some
synthetic data in a couple of lines of code.  Then I plotted it several
different ways, scratched my head a bit, rummaged through a list of
smoothing functions found using help.search, tried one, plotted the
result, changed a scale factor, tried again, settled on a scale factor
that seemed to work well, and switched back to thinking about
calculations.  In about 15 minutes, there was a technique for finding
interesting change points in the data.  I confused him a bit because I
was switching plots faster than he could follow, so I spent the next 45
minutes explaining what I'd done.  The point was that *changing* plots
was qualitatively different from looking at a single plot.

Now, the data displayed in the one example in the press release seemed
to be (diastolic pressure bucket) x (systolic pressure bucket) -> count.
As noted above, that suggests a 2d density estimate as something worth
trying.  It also suggests a scatter plot (possibly with rugs along the
axes).  Most importantly, it suggests BOTH of them, and several others
as well (such as hexbin), each of which may provide some insight that
the others don't.
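The data structure described above, (diastolic pressure bucket) x (systolic pressure bucket) -> count, amounts to a simple two-way tabulation. Here is a minimal Python sketch of it. The 10 mmHg bucket widths, the function name bucket_counts, and the four readings are hypothetical and are not taken from the article or the press release.

```python
import math
from collections import Counter


def bucket_counts(pairs, dia_width=10, sys_width=10):
    """Tabulate (diastolic bucket, systolic bucket) -> count.

    Each bucket is labelled by its lower edge in mmHg. The 10 mmHg
    widths are hypothetical defaults, not the article's buckets.
    """
    counts = Counter()
    for dia, sys_ in pairs:
        key = (math.floor(dia / dia_width) * dia_width,
               math.floor(sys_ / sys_width) * sys_width)
        counts[key] += 1
    return counts


# Made-up (diastolic, systolic) readings, for illustration only.
readings = [(76, 118), (79, 121), (88, 135), (84, 139)]
table = bucket_counts(readings)
for (d_lo, s_lo), c in sorted(table.items()):
    print(f"diastolic {d_lo}+, systolic {s_lo}+: {c}")
```

Bucketing discards where each reading falls within its cell. A scatter plot with rugs, or hexbin, works from the raw pairs instead, which is one reason to keep those pairs around rather than only the table.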

It's very VERY hard for any one graph, especially one with a cramped
dynamic range, to beat that.  The real competition for the diamond graph
is not some other graph, but a wide choice of graphs that can be quickly
flicked through and creatively combined.

This also means that a new graphic technique, if it _is_ good, is even
_better_ when it can be freely and creatively combined with other graphic
techniques.  Having diamond graphs locked out of R is bad *for* diamond
graphs.

	In fact, the Johns Hopkins Department of Biostatistics faculty and
	graduates are active participants in and enthusiastic supporters of open
	source software development. For recent examples, see:
	http://www.biostat.jhsph.edu/biostat/research/software.shtml

Not only that, at least one of them, R/qtl, is an R package.