[R] Exploratory multivariate analysis of categorical data

Thu Jan 11 20:53:49 CET 2007

This is my first post to R-help. I am doing some research into the text
of the New Testament, specifically places where textual variation occurs
across manuscripts. (See http://purl.org/tfinney/NTText/book/index.html
for details.)

New Testament textual critics call places where the text varies
"variation units," and each state of the text in a variation unit is
called a "reading." The apparatus of a critical edition can be
transformed into a data matrix by making each witness (typically a
manuscript, but possibly an early version or a church father) an
observation (i.e. a row) and each variation unit a variable (i.e. a
column). I encode readings, which consist of words or phrases, as
numerals in the data matrix. (There are often more than two readings in
a variation unit.) I make a dissimilarity matrix by calculating the
proportion of variation units in which each pair of witnesses disagrees.
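The proportion-of-disagreement calculation can be sketched as follows. This is a minimal illustration in Python, not the author's actual procedure. It assumes readings are coded as integers and that a missing value (None here) marks a unit where a witness is not extant. That handling of gaps is an assumption, since the post does not say how lacunae are treated. The witnesses and readings below are hypothetical.

```python
from typing import List, Optional

# Rows are witnesses, columns are variation units. Readings are coded as
# integers; None marks a unit where the witness is not extant (an assumption:
# the original post does not say how gaps are handled).
Matrix = List[List[Optional[int]]]


def disagreement_matrix(data: Matrix) -> List[List[Optional[float]]]:
    """Proportion of commonly attested variation units where each pair disagrees."""
    n = len(data)
    d: List[List[Optional[float]]] = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compare only units where both witnesses have a reading.
            pairs = [(a, b) for a, b in zip(data[i], data[j])
                     if a is not None and b is not None]
            if pairs:
                # Codes are nominal labels: only equality matters, not size.
                diff = sum(1 for a, b in pairs if a != b)
                value: Optional[float] = diff / len(pairs)
            else:
                value = None  # no units in common: dissimilarity undefined
            d[i][j] = d[j][i] = value
    return d


witnesses = [
    [1, 2, 1, 3],     # hypothetical witness A
    [1, 1, 1, 3],     # hypothetical witness B
    [2, 1, None, 1],  # hypothetical witness C, lacunose at the third unit
]
for row in disagreement_matrix(witnesses):
    print(row)
```

Because the numerals are only labels, reading 2 is not "between" readings 1 and 3. A simple-matching measure like this one respects that, whereas treating the codes as numbers (e.g. with Euclidean distance) would not.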

Here is my question: Which exploratory multivariate techniques are
applicable to this kind of data matrix and this kind of dissimilarity
matrix? From reading the R docs, it seems to me that MDS (metric and
non-metric) and hierarchical clustering are appropriate, but I am not so
sure about other techniques.

Best

Tim Finney