[R] OT: irises

Thomas Lumley tlumley at u.washington.edu
Mon Oct 13 01:02:28 CEST 2008

Attention conservation notice: a digression on Fisher's iris data, related 
only tangentially to R.

The package announcement for hwriter points to a web page created with the
package, http://www.ebi.ac.uk/~gpau/hwriter/
which is based on the Fisher/Anderson iris data and includes pictures.

Unfortunately, the pictures are not of the right species (two appear to be
tall bearded iris cultivars, and the third is probably either Iris ensata or
Iris sibirica).  Pictures of the right species would be very useful: Iris
setosa really is visibly different in structure, not just in color, because
it lacks visible upright 'standards'.  There are nice pictures at the Iris
Species Database: http://www.badbear.com/signa/signa.pl?Introduction

Looking for pictures, I noticed that the terminology seems to have changed
since Anderson's time: most online references that distinguish between
petals and sepals in the iris describe the standards (the upright parts) as
petals and the falls (the hanging-down parts) as sepals, so that it is the
petals of I. setosa, not the sepals, that are very short (e.g. the US Forest Service at

The other historical anomaly is that many descriptions of the data present
it as if Fisher had been interested in whether I. versicolor and I. virginica
can be separated by linear discrimination. In fact, the hypothesis was
that I. versicolor lies between the other two species and is twice as close
to I. virginica as to I. setosa.  Iris virginica has twice as many
chromosomes as I. setosa, and I. versicolor has as many as both of them put
together, so the theory was that I. versicolor would carry 4 virginica and
2 setosa alleles at each locus [RA Fisher digital archive at University of
Adelaide, http://hdl.handle.net/2440/15227].
This is a nice example of a null hypothesis value that is not zero.
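
To make the non-zero null concrete: under the 4:2 allele theory, the
versicolor mean should sit 2/3 of the way from the setosa mean to the
virginica mean, so the distance to setosa is twice the distance to
virginica. The sketch below (Python, only for illustration) estimates that
weight by projecting the versicolor mean onto the setosa-to-virginica line.
It is a simplified point estimate under my own assumptions, not Fisher's
actual procedure (which worked with discriminant functions and would need a
standard error to be a real test). The function name mixing_weight and the
mean vectors are hypothetical, not the real iris data.

```python
from typing import Sequence


def mixing_weight(setosa: Sequence[float], versicolor: Sequence[float],
                  virginica: Sequence[float]) -> float:
    """Least-squares weight lam such that
    versicolor ~= lam * virginica + (1 - lam) * setosa."""
    # Direction from the setosa mean to the virginica mean.
    d = [vg - s for vg, s in zip(virginica, setosa)]
    # Offset of the versicolor mean from the setosa mean.
    v = [vc - s for vc, s in zip(versicolor, setosa)]
    denom = sum(x * x for x in d)
    if denom == 0:
        raise ValueError("setosa and virginica means coincide")
    # Projection coefficient of v onto d.
    return sum(a * b for a, b in zip(v, d)) / denom


# Null value implied by 4 virginica : 2 setosa alleles.
NULL_WEIGHT = 2 / 3

# Hypothetical mean vectors (NOT the real iris data), constructed so that
# versicolor lies exactly 2/3 of the way from setosa to virginica.
setosa = [5.0, 3.4, 1.5, 0.2]
virginica = [6.5, 3.1, 5.4, 2.0]
versicolor = [s + NULL_WEIGHT * (vg - s) for s, vg in zip(setosa, virginica)]

lam = mixing_weight(setosa, versicolor, virginica)
print(f"estimated weight {lam:.3f}, null value {NULL_WEIGHT:.3f}")
```

The null hypothesis here is lam = 2/3 rather than lam = 0; equivalently,
the ratio of distances lam / (1 - lam) equals 2.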


Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
