# [R] father and son heights

Rolf Turner rolf at math.unb.ca
Sun Feb 15 20:30:43 CET 2004

```
Ann Loraine wrote:

> I'm looking for Pearson's father and son height data.
> ...........  It's a data set that is used to teach Pearson's
> correlation coefficient in a popular statistics textbook - "Statistics"
> by Freedman, Pisani, et al.
>
> It contains over a thousand measurements of sons' and their fathers'
> heights.
>
> I would like to find it in electronic form so that I can use it to
> prepare figures and examples for a lecture.
>
> If anyone knows where I could find it, please let me know.  I've done a
> few Google searches but haven't had any luck so far.  I also used the
> data() command to look through R's built-in data sets and couldn't find
> it.  Any suggestions would be most welcome!

I believe that you have been searching under the wrong name.  The
data are most closely associated with Galton (the bloke to whom the
word ``regression'' is due) rather than with Pearson.

A search on

Galton height

led me immediately to

http://wiener.math.csi.cuny.edu/UsingR/Data/galton.html

where the data appear to be readily available.

I ***presume*** that these are the data you seek, although there are
only 930 observations, not ``over a thousand''.  (Close, but!)

The data are given to a limited accuracy, which induces a strangely
grid-like appearance when they are plotted, but that is presumably
the nature of this data set.  They were apparently taken from a table
prepared by Galton.  Values which were originally given in Galton's
table as ``>= 73.7'' or ``<= 61.7'' are truncated to their respective
bounds.

One thing that puzzles me:  The documentation says that the data
pertain to 928 children, yet there are 930 data points. (????)
I can't find an explanation in the documentation.  Maybe I'm just
blind.  Or thick.

cheers,

Rolf Turner
rolf at math.unb.ca

```
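
The grid-like appearance and the truncation described in the message can be sketched in R. The data below are simulated stand-ins, not Galton's measurements: the column names `parent` and `child`, the means, the spreads, and the slope are assumptions made only for illustration. Only the bounds 61.7 and 73.7 and the count of 930 come from the message.

```r
# Simulated stand-in data (NOT Galton's measurements); all parameters are assumed.
set.seed(1)
n <- 930                                   # row count reported in the message
parent <- rnorm(n, mean = 68, sd = 1.8)
child  <- 68 + 0.65 * (parent - 68) + rnorm(n, sd = 2.2)

# Truncate open-ended classes to their bounds, as described for
# "<= 61.7" and ">= 73.7" in the message.
clamp <- function(x, lo = 61.7, hi = 73.7) pmin(pmax(x, lo), hi)

# Round to whole inches first to mimic limited recording accuracy, then clamp.
heights <- data.frame(parent = clamp(round(parent)),
                      child  = clamp(round(child)))

n_obs      <- nrow(heights)           # number of data points
n_distinct <- nrow(unique(heights))   # number of distinct (parent, child) pairs

# Left: raw values overplot on a grid. Right: jitter() spreads coincident points.
op <- par(mfrow = c(1, 2))
plot(heights$parent, heights$child,
     xlab = "parent", ylab = "child", main = "raw")
plot(jitter(heights$parent, factor = 2), jitter(heights$child, factor = 2),
     xlab = "parent", ylab = "child", main = "jittered")
par(op)
```

With the real file loaded into a data frame, calling `nrow()` on it would be a quick way to check the 928 versus 930 discrepancy, and comparing it with `nrow(unique(...))` shows how many points coincide on the grid.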