[R] randomness in stepclass (klaR) or lda (MASS) ?

Eric Elguero Eric.Elguero at mpl.ird.fr
Thu Apr 29 15:01:20 CEST 2010


Hi,

A colleague ran a stepwise discriminant analysis
twice in a row and got different results, suggesting
some "stochasticity" in the algorithms involved.
I looked at her data and found a lot of collinearity,
so I reckoned that maybe "stepclass" (klaR) cannot find
a clear winner when trying to include a new variable
and makes a random choice. Is that true?
Another possibility is that "lda" (from MASS) computes
CV classification rates from a random subsample instead of
using all the data (?). That might be a sensible choice
with a very large sample.
I advised her to run the function several times and
see whether a consensus emerges, but that doesn't seem to
be the case, and besides, I would like to know what
is really going on.
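
The collinearity mentioned above can be made concrete by listing the pairs of variables with a high absolute correlation. The following is only a minimal sketch: the helper name collinear_pairs and the 0.9 cutoff are assumptions made for illustration, and the built-in iris data stand in for the real measurements, which are not available here. Pairwise correlations will not reveal every form of multicollinearity, so this is a first look rather than a diagnosis.

```r
# Hypothetical helper (not part of klaR or MASS): list the variable pairs
# whose absolute Pearson correlation exceeds a chosen cutoff.
collinear_pairs <- function(x, cutoff = 0.9) {
  r <- cor(x)
  # Keep only the upper triangle so that each pair is reported once.
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(var1 = rownames(r)[idx[, "row"]],
             var2 = colnames(r)[idx[, "col"]],
             r    = r[idx],
             stringsAsFactors = FALSE)
}

# Stand-in data: the four numeric columns of iris, NOT the data from this post.
x <- as.matrix(iris[, 1:4])
pairs_found <- collinear_pairs(x)
print(pairs_found)
```

When several candidate variables are nearly interchangeable in this sense, small changes in the estimated correctness rate may be enough to change which one is selected at a given step.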

thanks

Eric Elguero
Laboratory Genetics and Evolution of Infectious Diseases, 
Team: Genetics and Adaptation of Plasmodium
UMR 2724 CNRS-IRD,
IRD Montpellier, 
911 Avenue Agropolis, BP 64501, 
34394 Montpellier Cedex 5, 
France


> f4.U.spDA <- stepclass(f.mes, f.gp4, "lda",
+                        improvement=0.01, prior=rep(0.25,4))
 `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
89 observations of 31 variables in 4 classes; direction: both
stop criterion: improvement less than 1%.
correctness rate: 0.58333;  in: "X2";  variables (1): X2 
correctness rate: 0.66389;  in: "X9";  variables (2): X2, X9 
correctness rate: 0.69583;  in: "X27";  variables (3): X2, X9, X27 

 hr.elapsed min.elapsed sec.elapsed 
       0.00        0.00       20.77 

> f4.U.spDA <- stepclass(f.mes, f.gp4, "lda",
+                        improvement=0.01, prior=rep(0.25,4))
 `stepwise classification', using 10-fold cross-validated correctness rate of method lda'.
89 observations of 31 variables in 4 classes; direction: both
stop criterion: improvement less than 1%.
correctness rate: 0.60556;  in: "X2";  variables (1): X2 
correctness rate: 0.71806;  in: "X6";  variables (2): X2, X6 

 hr.elapsed min.elapsed sec.elapsed 
       0.00        0.00       15.14
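
The output above reports a 10-fold cross-validated correctness rate. One way to test whether random fold assignment alone could explain the run-to-run differences is to reproduce a k-fold cross-validation by hand. The sketch below is not the klaR implementation: the helper cv_rate is hypothetical, only lda() and its predict() method from MASS are used, and the iris data stand in for f.mes and f.gp4. It assumes that the folds are drawn with R's random number generator.

```r
library(MASS)  # provides lda() and its predict() method

# Hypothetical helper: k-fold cross-validated correctness rate of lda,
# with observations assigned to folds at random.
cv_rate <- function(x, grouping, fold = 10) {
  n <- nrow(x)
  # Random fold assignment: the only stochastic step in this sketch.
  folds <- sample(rep(seq_len(fold), length.out = n))
  hits <- logical(n)
  for (k in seq_len(fold)) {
    test <- folds == k
    fit  <- lda(x[!test, , drop = FALSE], grouping[!test])
    pred <- predict(fit, x[test, , drop = FALSE])$class
    hits[test] <- pred == grouping[test]
  }
  mean(hits)  # proportion of correctly classified observations
}

# Stand-in data, NOT the data from this post.
x <- as.matrix(iris[, 1:2])
g <- iris$Species

# Without a fixed seed, repeated calls may return different rates.
rates <- replicate(5, cv_rate(x, g))
print(rates)

# With the same seed, the folds and hence the rate are reproduced exactly.
set.seed(1); r1 <- cv_rate(x, g)
set.seed(1); r2 <- cv_rate(x, g)
print(identical(r1, r2))
```

If the same mechanism is at work in stepclass, calling set.seed() with the same value before each run should make the selected variables repeatable. That remains an assumption to be checked against the klaR documentation.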


