[R] logistic regression or discriminant analysis ?
Daniel.Amorese at geos.unicaen.fr
Fri May 24 10:49:31 CEST 2002
On 2002.05.23 19:38, Marc Feldesman wrote:
> At 06:25 PM 5/23/2002 +0200, Daniel Amorèse wrote:
> >Does logistic regression really provide better results than lda or qda ?
> >(my purpose is not classification but highlighting of discriminant
> >variables)
> You've asked multiple questions about "stepwise quadratic regression" and
> now "stepwise logistic regression". You haven't really explained what you
> want to do ("highlighting of discriminant variables" isn't very
> informative). WHY do you want to use stepwise anything? Are you looking
> for the best set of explanatory variables to predict group membership? Are
> your data non-normal? Do your groups have heterogeneous covariance
> structures? Do you have vastly unequal mixture proportions?
> It sounds like you are interested in the function discrim(), which is part
> of S-Plus but not R. You've complained about the lack of documentation for
> qda() in Venables and Ripley (3rd edition?) and in the R documentation, but
> you haven't told us what you were looking for. Neither qda() nor lda() is
> "stepwise", and AFAIK, there is no stepwise logistic regression
> in R. You might want multinomial logistic regression (multinom() in
> VR), but that isn't stepwise either.
> In short, you're not getting much help because the questions you're asking
> aren't very well-formed.
Thank you for this detailed e-mail.
Let me explain my problem more precisely.
I have 2 groups (perhaps 3, if I subdivide a group into 2) of
data. These data are described by at least 15 parameters.
What I want to do: from these 15 variables, I want to get the
subset providing the largest distance between groups.
This is why a stepwise approach interests me. AM I WRONG ?
My first purpose is not to predict group membership because
I think other parameters (not available, hard to measure) can
(slightly, I hope) modify the group membership. Thus, I am
not interested in establishing an accurate discrimination
rule. I just want to know which subset of the 15 variables is
most likely to contribute to the discrimination.
What I have done: the correlation matrix tells me that many
variables are correlated. Thus, I performed an lda using only 5
variables (chosen arbitrarily from among the uncorrelated
variables). The graphical output shows point clouds that are not
circular: this result may suggest a difference in covariance
matrices, hence lda does not seem to be the most suitable method for
Perhaps, qda should be used ?
or logistic regression ? (this last method seems to be the most
robust, independent of the data properties).
I know qda(), lda() and multinom() do not perform stepwise analysis,
but I hope that some of the outputs from these functions can
help in selecting the most discriminatory variable subset.
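
As a rough illustration of the kind of output meant here, the sketch
below inspects standardized lda() coefficients and then runs an
AIC-based stepwise selection on a logistic fit. It is only a sketch
under assumptions: the built-in iris data (restricted to two species)
stand in for the real 15-variable data set, and the use of stepAIC()
from MASS on a glm() fit is one possible choice, not something
established in this thread.

```r
# Illustrative only: iris is a stand-in for the real data (an assumption),
# restricted to two species so that there are exactly 2 groups.
library(MASS)

d <- droplevels(subset(iris, Species != "setosa"))

# Standardize the predictors so that coefficient magnitudes are comparable.
X <- as.data.frame(scale(d[, 1:4]))
X$Species <- d$Species

# Linear discriminant analysis: $scaling holds the coefficients of the
# discriminant function, one row per variable.
fit_lda <- lda(Species ~ ., data = X)
print(fit_lda$scaling)

# Logistic regression on the same two groups, followed by AIC-based
# stepwise selection (both directions) starting from the full model.
fit_glm  <- glm(Species ~ ., data = X, family = binomial)
fit_step <- stepAIC(fit_glm, direction = "both", trace = FALSE)
print(formula(fit_step))
```

Large absolute coefficients in fit_lda$scaling point to variables that
weigh heavily in the discriminant direction, and formula(fit_step) lists
the variables retained by the stepwise search. With correlated
predictors, both indications should be read with caution.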
Thanks again for your help.