[R] logistic regression or discriminant analysis ?

Daniel Amorèse Daniel.Amorese at geos.unicaen.fr
Fri May 24 10:49:31 CEST 2002

On 2002.05.23 19:38, Marc Feldesman wrote:
> At 06:25 PM 5/23/2002 +0200, Daniel Amorèse wrote:
> >Hello,
> >Does logistic regression really provide better results than lda or qda ?
> >(my purpose is not classification but highlighting of discriminant
> >variables)
>
> You've asked multiple questions about "stepwise quadratic regression" and
> now "stepwise logistic regression".  You haven't really explained what you
> want to do ("highlighting of discriminant variables" isn't very
> informative).  WHY do you want to use stepwise anything?  Are you looking
> for the best set of explanatory variables to predict group membership?
> Are your data non-normal?  Do your groups have heterogeneous covariance
> structures?  Do you have vastly unequal mixture proportions?
>
> It sounds like you are interested in the function discrim(), which is part
> of S-Plus but not R.  You've complained about the lack of documentation for
> qda() in Venables and Ripley (3rd edition?) and in the R documentation, but
> you haven't told us what you were looking for.  Neither qda() nor lda() do
> "stepwise", and  AFAIK, there is no stepwise logistic regression available
> in R.  You might want multinomial logistic regression (multinom() in bundle
> VR), but that isn't stepwise either.
>
> In short, you're not getting much help because the questions you're asking
> aren't very well-formed.
Thank you for this detailed e-mail.
Let me explain my problem more precisely.

I have 2 groups (perhaps 3, if I subdivide a group into 2) of
data. Each observation is described by at least 15 variables.

What I want to do: from these 15 variables, I want to find the
subset that gives the largest distance between groups.
This is why a stepwise approach interests me. Am I wrong?
My main purpose is not to predict group membership, because
I think other parameters (not available, hard to measure) can
(slightly, I hope) modify the group membership. Thus, I am
not interested in establishing an accurate discrimination
rule. I just want to know which subset of the 15 variables
is most likely to contribute to the discrimination.
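
To make "the subset providing the largest distance between groups" concrete, here is a minimal sketch of a greedy forward search on the squared Mahalanobis distance between two group means. It is only an illustration under assumptions: two species of the built-in iris data stand in for my two groups, a pooled within-group covariance is used, and group_distance() is a helper name made up for this example.

```r
# Greedy forward selection of variables by squared Mahalanobis distance
# between two group means (pooled within-group covariance).
# Assumption: two iris species stand in for the two real groups.
dat <- subset(iris, Species != "setosa")
dat$Species <- droplevels(dat$Species)
X <- as.matrix(dat[, 1:4])
g <- dat$Species

# Hypothetical helper: squared distance between group means for a
# given set of variable names.
group_distance <- function(vars) {
  x1 <- X[g == levels(g)[1], vars, drop = FALSE]
  x2 <- X[g == levels(g)[2], vars, drop = FALSE]
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  pooled <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  # stats::mahalanobis() returns the squared distance
  mahalanobis(colMeans(x1), colMeans(x2), pooled)
}

selected <- character(0)
remaining <- colnames(X)
path <- numeric(0)
while (length(remaining) > 0) {
  # Distance obtained by adding each remaining variable in turn
  d <- sapply(remaining, function(v) group_distance(c(selected, v)))
  best <- names(which.max(d))
  selected <- c(selected, best)
  remaining <- setdiff(remaining, best)
  path <- c(path, max(d))
}
names(path) <- selected
print(round(path, 2))
```

The printed vector gives the distance reached after each variable enters. With a pooled covariance this distance can never decrease when a variable is added, so the size of each increment, not the final value, is what would have to guide the choice of where to stop.
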
What I have done: the correlation matrix tells me that many
variables are correlated. Thus, I performed an lda using only 5
variables (chosen arbitrarily among the uncorrelated
variables). The graphical output shows point clouds that are not
circular: this may suggest a difference in covariance
matrices, hence lda does not seem to be the most suitable method for
separating the groups.
Perhaps qda should be used?
Or logistic regression? (This last method seems to be the most
robust, independent of the properties of the data.)
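
The checks described above can be sketched roughly as follows. This assumes the MASS package (which provides lda() and qda()) is installed, and again uses two iris species as stand-ins for my groups; it is not my actual data or analysis.

```r
library(MASS)  # provides lda() and qda()

# Assumption: two iris species stand in for the two real groups.
dat <- subset(iris, Species != "setosa")
dat$Species <- droplevels(dat$Species)

# Correlation between variables, as a first look at redundancy
print(round(cor(dat[, 1:4]), 2))

# Per-group covariance matrices: large differences between them
# argue against the common-covariance assumption of lda()
covs <- lapply(split(dat[, 1:4], dat$Species), cov)
print(lapply(covs, round, 2))

# Leave-one-out cross-validated predictions from both methods
fit_lda <- lda(Species ~ ., data = dat, CV = TRUE)
fit_qda <- qda(Species ~ ., data = dat, CV = TRUE)
err <- c(lda = mean(fit_lda$class != dat$Species),
         qda = mean(fit_qda$class != dat$Species))
print(err)
```

If the two cross-validated error rates are close, the extra parameters of qda probably bring little, even when the covariance matrices look somewhat different.
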
I know qda(), lda() and multinom() do not perform stepwise analysis,
but I hope that some outputs from these functions can
help in selecting the most discriminating subset of variables.
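
As an example of the kind of output I have in mind, the sketch below ranks variables by the absolute size of their lda() coefficients after standardising the variables. It assumes the MASS package and uses iris as stand-in data; scaling by the overall standard deviation is a simplification chosen for this example.

```r
library(MASS)  # provides lda()

# Assumption: two iris species stand in for the two real groups.
dat <- subset(iris, Species != "setosa")
dat$Species <- droplevels(dat$Species)

# Standardise so that coefficients are comparable across variables
# (overall standard deviation used here for simplicity)
Xs <- as.data.frame(scale(dat[, 1:4]))
Xs$Species <- dat$Species

fit <- lda(Species ~ ., data = Xs)

# Coefficients of the first (here, only) linear discriminant
coefs <- fit$scaling[, 1]
ranking <- sort(abs(coefs), decreasing = TRUE)
print(round(ranking, 2))
```

With strongly correlated variables these coefficients can be unstable, so such a ranking is at best a rough guide.
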
Thanks again for your help.
D. Amorese