[R] Animal Morphology: Deriving Classification Equation with Linear Discriminant Analysis (lda)

Sun May 24 22:32:06 CEST 2009

Dear Ted,

Thank you for taking the time out to help me with this analysis. I'm seeing
that I may have left out a crucial detail concerning this analysis. The ID
measurement (interpubic distance) is a new measurement that has never been
used in the field of ornithology (to my knowledge). The objective of the
paper is to demonstrate the usefulness of ID. The paper compared ID with
plumage criterion, a categorical variable at best, but under peer-review
there is a request to use other morphological data to compare/contrast ID.
Unfortunately, wing (WG) and weight (WT) were the only measurements taken in
addition to ID in this study.

The purpose of the LDA is to demonstrate the power of ID in the context of
WG and WT. I agree that WG is a terrible metric for discrimination. WT is
good, but there is significant overlap between groups. ID, however, is a
good discriminator on its own (it classified 97-100% of all individuals
based on 92.5% CI).

You pointed out that I am violating assumptions with LDA based on different
covariances between sexes (thank you... I never would have caught it). I'm
wondering how to proceed.
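
One rough way to look at the covariance issue is to compute the WT-ID
covariance matrix separately for each sex and compare the two. The sketch
below does this on made-up data. The simulated numbers and the data frame D
are assumptions for illustration only, not the measurements from this study.

```r
set.seed(1)
n <- 50

# Made-up stand-in data, NOT the study's measurements.
# Hypothetical males: WT rises slightly with ID.
ID.M <- rnorm(n, 3, 0.5)
WT.M <- 10 + 0.3 * (ID.M - 3) + rnorm(n, 0, 0.5)
# Hypothetical females: WT falls with ID.
ID.F <- rnorm(n, 8, 1.2)
WT.F <- 12 - 0.4 * (ID.F - 8) + rnorm(n, 0, 0.5)

D <- data.frame(WT  = c(WT.M, WT.F),
                ID  = c(ID.M, ID.F),
                SEX = factor(rep(c("M", "F"), each = n)))

# Covariance matrix of (WT, ID) within each sex; list elements are named
# after the factor levels "F" and "M"
covs <- lapply(split(D[, c("WT", "ID")], D$SEX), cov)
print(covs)
```

If the off-diagonal entries (or the variances) differ markedly between the
two matrices, a single pooled covariance matrix, as standard LDA assumes, is
hard to justify.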

Should I:

1) Perform linear discrimination with WT and ID, and then determine a
classification equation? And if I do, how do I derive the classification
equation (e.g. [Cj = cj0 + cjWT*WT + cjID*ID; Cj > x = male, Cj < x = female])?

2) Demonstrate that ID is important based on linear discriminant
coefficients and structure coefficients from this WG, WT, and ID LDA;
discuss the assumption violation and argue for its use as a demonstration
of variable predicting power; and NOT provide a classification equation
because we already have ID ranges and it would be inappropriate.

3) Both #1 and #2 because WT and ID provide such a good discriminating
function and use the WG, WT, and ID LDA for demonstration of variable
prediction value.

4) ??? better suggestions.
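
For reference, the classification equation asked about in option 1 could be
sketched as follows, using the usual textbook form Cj = W^{-1} Mj with
constant cj0 = -1/2 Mj' W^{-1} Mj, where W is the pooled within-group
covariance matrix. This is only an illustration under stated assumptions:
the data are simulated (not the study's measurements), priors are taken as
equal, and the common-covariance assumption is taken at face value even
though it is in doubt here. The names D, Mj, W, Cj, and cj0 are chosen for
the example.

```r
library(MASS)  # lda(), used only as a cross-check at the end

set.seed(1)
n <- 50
# Made-up stand-in data, NOT the study's measurements
D <- data.frame(
  WT  = c(rnorm(n, 10, 0.6), rnorm(n, 12, 0.6)),
  ID  = c(rnorm(n, 3, 0.6),  rnorm(n, 8, 1.2)),
  SEX = factor(rep(c("M", "F"), each = n))
)
X <- as.matrix(D[, c("WT", "ID")])

# Group mean vectors: one row per sex ("F", "M"), one column per variable
Mj <- apply(X, 2, function(v) tapply(v, D$SEX, mean))

# Pooled within-group covariance matrix W (divisor n - number of groups)
resid <- X - Mj[as.character(D$SEX), ]
W <- crossprod(resid) / (nrow(X) - nlevels(D$SEX))

# Classification function coefficients C_j = W^{-1} M_j (one row per sex)
# and constants c_j0 = -1/2 * M_j' W^{-1} M_j (equal priors assumed)
Cj <- t(solve(W, t(Mj)))
dimnames(Cj) <- dimnames(Mj)
cj0 <- -0.5 * rowSums(Cj * Mj)

# Score every bird on both functions; assign it to the higher-scoring sex
scores <- sweep(X %*% t(Cj), 2, cj0, "+")
pred <- colnames(scores)[max.col(scores, ties.method = "first")]

# With two groups the rule collapses to one equation:
# classify as male when d0 + d["WT"]*WT + d["ID"]*ID > 0
d  <- Cj["M", ] - Cj["F", ]
d0 <- cj0["M"] - cj0["F"]
print(c(constant = unname(d0), d))

# Cross-check against the class assignments from MASS::lda
fit <- lda(SEX ~ WT + ID, data = D, prior = c(0.5, 0.5))
print(mean(pred == as.character(predict(fit)$class)))
```

The last line reports the proportion of birds for which the hand-built rule
and lda() agree, which is a quick sanity check on the derivation.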

THANK YOU so much for responding and for all of your insight. I'm humbled by
your R skills... that code nearly took me all day to write (little by little
I'm learning).

Chase

Ted.Harding-2 wrote:
> 
> [Your data and output listings removed. For comments, see at end]
> 
> On 24-May-09 13:01:26, cdm wrote:
>> Fellow R Users:
>> I'm not extremely familiar with lda or R programming, but a recent
>> editorial review of a manuscript submission has prompted a crash
>> course. I am on this forum hoping I could solicit some much needed
>> advice for deriving a classification equation.
>> 
>> I have used three basic measurements in lda to predict two groups:
>> male and female. I have a working model, low Wilks' lambda, graphs,
>> coefficients, eigenvalues, etc. (see below). I adjusted the sample
>> analysis for Fisher's or Anderson's Iris data provided in the MASS
>> library for my own data.
>> 
>> My final step is simply to form the classification equation.
>> The classification equation is simply using standardized coefficients
>> to classify each group- in this case male or female. A more thorough
>> explanation is provided:
>> 
>> "For cases with an equal sample size for each group the classification
>> function coefficient (Cj) is expressed by the following equation:
>> 
>> Cj = cj0+ cj1x1+ cj2x2+...+ cjpxp
>> 
>> where Cj is the score for the jth group, j = 1 ... k, cj0 is the
>> constant for the jth group, and x = raw scores of each predictor.
>> If W = within-group variance-covariance matrix, and M = column matrix
>> of means for group j, then the constant cj0 = (-1/2)CjMj" (Julia
>> Barfield, John Poulsen, and Aaron French 
>> http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
>> 
>> I am unable to navigate this last step based on the R output I have.
>> I only have the linear discriminant coefficients for each predictor
>> that would be needed to complete this equation.
>> 
>> Please, if anybody is familiar or able to help please let me know.
>> There is a spot in the acknowledgments for you.
>> 
>> All the best,
>> Chase Mendenhall
> 
> The first thing I did was to plot your data. This indicates in the
> first place that a perfect discrimination can be obtained on the
> basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
> to WG, WT, ID, SEX):
> 
>   D0<-read.csv("horsesLDA.csv")
>   # names(D0) # "WRMA_WG"  "WRMA_WT"  "WRMA_ID"  "WRMA_SEX"
>   WG<-D0$WRMA_WG; WT<-D0$WRMA_WT;
>   ID<-D0$WRMA_ID; SEX<-D0$WRMA_SEX
> 
>   ix.M<-(SEX=="M"); ix.F<-(SEX=="F")
> 
>   ## Plot WT vs ID (M & F)
>   plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>   points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>   points(ID[ix.F],WT[ix.F],pch="+",col="red")
>   lines(ID,15.5-1.0*(ID))
> 
> and that there is a lot of possible variation in the discriminating
> line WT = 15.5-1.0*(ID)
> 
> Also, it is apparent that the covariance between WT and ID for Females
> is different from the covariance between WT and ID for Males. Hence
> the assumption (of common covariance matrix in the two groups) for
> standard LDA (which you have been applying) does not hold.
> 
> Given that the sexes can be perfectly discriminated within the data
> on the basis of the linear discriminator (WT + ID) (and others),
> the variable WG is in effect a close approximation to noise.
> 
> However, to the extent that there was a common covariance matrix
> to the two groups (in all three variables WG, WT, ID), and this
> was well estimated from the data, then inclusion of the third
> variable WG could yield a slightly improved discriminator in that
> the probability of misclassification (a rare event for such data)
> could be minimised. But it would not make much difference!
> 
> However, since that assumption does not hold, this analysis would
> not be valid.
> 
> If you plot WT vs WG, a common covariance is more plausible; but
> there is considerable overlap for these two variables:
> 
>   plot(WG,WT)
>   points(WG[ix.M],WT[ix.M],pch="+",col="blue")
>   points(WG[ix.F],WT[ix.F],pch="+",col="red")
> 
> If you plot WG vs ID, there is perhaps not much overlap, but a
> considerable difference in covariance between the two groups:
> 
>   plot(ID,WG)
>   points(ID[ix.M],WG[ix.M],pch="+",col="blue")
>   points(ID[ix.F],WG[ix.F],pch="+",col="red")
> 
> This looks better on a log scale, however:
> 
>   lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
>   ## Plot log(WG) vs log(ID) (M & F)
>   plot(lID,lWG)
>   points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
>   points(lID[ix.F],lWG[ix.F],pch="+",col="red")
> 
> and common covariance still looks good for WG vs WT:
> 
>   ## Plot log(WT) vs log(WG) (M & F)
>   plot(lWG,lWT)
>   points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
>   points(lWG[ix.F],lWT[ix.F],pch="+",col="red")
> 
> but there is no improvement for WT vs ID:
> 
>   ## Plot log(WT) vs log(ID) (M & F)
>   plot(lID,lWT)
>   points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
>   points(lID[ix.F],lWT[ix.F],pch="+",col="red")
> 
> So there is no simple road to applying a routine LDA to your data.
> 
> To take account of different covariances between the two groups,
> you would normally be looking at a quadratic discriminator. However,
> as indicated above, the fact that a linear discriminator using
> the variables ID & WT alone works so well would leave considerable
> imprecision in conclusions to be drawn from its results.
> 
> Sorry this is not the straightforward answer you were hoping for
> (which I confess I have not sought); it is simply a reaction to
> what your data say.
> 
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 24-May-09                                       Time: 20:07:43
> ------------------------------ XFMail ------------------------------
> 

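Regarding the quadratic discriminator Ted mentions above, a minimal sketch
of how LDA and QDA might be compared is given here, using leave-one-out
cross-validation from MASS. The data are simulated with a different WT-ID
covariance in each sex. They are assumptions for illustration only and are
not the study's measurements.

```r
library(MASS)  # lda(), qda()

set.seed(2)
n <- 50
# Made-up stand-in data with a different WT-ID covariance in each sex
ID.M <- rnorm(n, 3, 0.5)
WT.M <- 10 + 0.3 * (ID.M - 3) + rnorm(n, 0, 0.5)
ID.F <- rnorm(n, 8, 1.2)
WT.F <- 12 - 0.4 * (ID.F - 8) + rnorm(n, 0, 0.5)
D <- data.frame(WT  = c(WT.M, WT.F),
                ID  = c(ID.M, ID.F),
                SEX = factor(rep(c("M", "F"), each = n)))

# Leave-one-out cross-validated class assignments from each method
cv.lda <- lda(SEX ~ WT + ID, data = D, CV = TRUE)$class
cv.qda <- qda(SEX ~ WT + ID, data = D, CV = TRUE)$class

# Cross-tabulate true sex against each method's assignment
print(table(true = D$SEX, lda = cv.lda))
print(table(true = D$SEX, qda = cv.qda))
```

When the groups are already cleanly separated, both tables will tend to look
alike, which fits Ted's point that the data leave little room to distinguish
between competing discriminators.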