[R] randomness in stepclass (klaR) or lda (MASS) ?

Uwe Ligges ligges at statistik.tu-dortmund.de
Thu Apr 29 15:08:02 CEST 2010



On 29.04.2010 15:01, Eric Elguero wrote:
> Hi,
>
> a colleague ran a stepwise discriminant analysis
> twice in a row and got different results, suggesting
> some "stochasticity" in the algorithms involved.
> I looked at her data and found that there was a lot
> of collinearity, so that I reckoned that maybe "stepclass"
> (klaR) cannot find a clear winner when trying to include a
> new variable and makes a random choice. Is that true?

Yes, since cross-validation is involved: the folds are drawn at random,
so the estimated correctness rates differ from run to run.
If you want stable results, you could try leave-one-out or set a seed.
Anyway, if your variables are collinear, I wonder whether the stepwise
approach is the smartest solution here.
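
A minimal sketch of the seed approach. The poster's f.mes and f.gp4 are
not available, so the built-in iris data stand in for them; that
substitution is an assumption made purely for illustration. Calling
set.seed() with the same value before each run should make the random
fold assignment, and hence the selected variables, repeat exactly.

```r
library(klaR)  # provides stepclass(); attaches MASS for lda()

# iris is a stand-in for the poster's data (assumption, not the original data)
set.seed(123)  # fix the random fold assignment
run1 <- stepclass(Species ~ ., data = iris, method = "lda",
                  improvement = 0.01, output = FALSE)

set.seed(123)  # same seed, so the same folds are drawn
run2 <- stepclass(Species ~ ., data = iris, method = "lda",
                  improvement = 0.01, output = FALSE)

# The selected variables should now agree between the two runs
same_model <- identical(run1$model, run2$model)
print(run1$model)
```

stepclass() also takes a fold argument (10 in the output quoted below);
setting it to the number of observations should correspond to the
leave-one-out variant mentioned above.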



> another possibility is that "lda" (from MASS) computes
> CV classification rates from a random subsample instead of
> using all the data (?) That might be a sensible choice
> with a very large sample.
> I advised her to run the function several times and
> see if a consensus emerges, but that doesn't seem to
> be the case, and besides, I would like to know what
> really is going on.

Well, it is called cross-validation, and it is based on random sampling
unless you use k-fold CV with k = n (i.e. leave-one-out).
Again, to get reproducible results, you will need to set a seed.
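
To illustrate the difference, here is a small sketch using MASS only,
again with iris as a stand-in data set. lda(..., CV = TRUE) performs
leave-one-out cross-validation, which involves no random sampling. The
helper kfold_rate() is hypothetical, written just for this example: it
assigns observations to k folds at random, so its result changes between
runs unless a seed is set first.

```r
library(MASS)

# Leave-one-out CV: every observation is left out exactly once, nothing is random
loo1 <- lda(Species ~ ., data = iris, CV = TRUE)
loo2 <- lda(Species ~ ., data = iris, CV = TRUE)
loo_same <- identical(loo1$class, loo2$class)

# Hypothetical helper: k-fold CV correctness rate with random fold assignment
kfold_rate <- function(data, k = 10) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # random step
  hits <- 0
  for (i in seq_len(k)) {
    test <- folds == i
    fit  <- lda(Species ~ ., data = data[!test, ])
    pred <- predict(fit, data[test, ])$class
    hits <- hits + sum(pred == data$Species[test])
  }
  hits / nrow(data)  # share of correctly classified observations
}

# Same seed before each call, so the same folds and the same rate
set.seed(1); r1 <- kfold_rate(iris)
set.seed(1); r2 <- kfold_rate(iris)
```

Without the set.seed() calls, r1 and r2 are not guaranteed to agree,
which matches the differing correctness rates for X2 in the two quoted
runs.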


If the results are that unstable: Do you really have a sufficient number 
of observations for your classification problem?

Uwe Ligges




>
> thanks
>
> Eric Elguero
> Laboratory Genetics and Evolution of Infectious Diseases,
> Team: Genetics and Adaptation of Plasmodium
> UMR 2724 CNRS-IRD,
> IRD Montpellier,
> 911 Avenue Agropolis, BP 64501,
> 34394 Montpellier Cedex 5,
> France
>
>
>> f4.U.spDA<- stepclass(f.mes, f.gp4,
> "lda",improvement=0.01,prior=rep(0.25,4))
>   `stepwise classification', using 10-fold cross-validated correctness
> rate of method lda'.
> 89 observations of 31 variables in 4 classes; direction: both
> stop criterion: improvement less than 1%.
> correctness rate: 0.58333;  in: "X2";  variables (1): X2
> correctness rate: 0.66389;  in: "X9";  variables (2): X2, X9
> correctness rate: 0.69583;  in: "X27";  variables (3): X2, X9, X27
>
>   hr.elapsed min.elapsed sec.elapsed
>         0.00        0.00       20.77
>
>> f4.U.spDA<- stepclass(f.mes, f.gp4,
> "lda",improvement=0.01,prior=rep(0.25,4))
>   `stepwise classification', using 10-fold cross-validated correctness
> rate of method lda'.
> 89 observations of 31 variables in 4 classes; direction: both
> stop criterion: improvement less than 1%.
> correctness rate: 0.60556;  in: "X2";  variables (1): X2
> correctness rate: 0.71806;  in: "X6";  variables (2): X2, X6
>
>   hr.elapsed min.elapsed sec.elapsed
>         0.00        0.00       15.14
>


