[R] Stepwise Discriminant... in R

Bill.Venables at csiro.au
Mon Mar 14 02:00:27 CET 2011


If you want to do a stepwise selection there is a function in the klaR package to do it.  This is not what you are asking for, though.  You want a way of finding the successive error rates as additional variables are added in the forward selection process.  As far as I can see you have to do this yourself and it is a mildly interesting little exercise in R programming.  Here is one possible way to do it.

First you need a couple of functions:

##############
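## Generic: apparent (resubstitution) error rate of a fitted classifier
errorRate <- function(object, ...) {
  if(!require(MASS)) stop("you need the MASS package installed")
  UseMethod("errorRate")
}

## Method for lda fits.  'type' is passed to predict.lda() as its 'method'
## argument ("plug-in", "predictive" or "debiased"); predict.lda() has no
## 'type' argument and would silently ignore one.
errorRate.lda <- function(object, data = eval.parent(object$call$data),
                          type = "plug-in", ...) {
  pred <- predict(object, data, method = type)$class
  actu <- eval(formula(object)[[2]], data)   # observed classes
  conf <- table(pred, actu)                  # confusion matrix
  1 - sum(diag(conf))/sum(conf)              # proportion misclassified
}

## Error rates for the full model and for each model obtained by dropping
## the last term in turn, returned in forward-selection order.
eRates <- function(object, data = eval.parent(object$call$data),
                   type = "plug-in") {
  f <- formula(object)
  r <- data.frame(formula = deparse(f[[3]]),
                  Error = errorRate(object, data, type = type))
  while(length(f[[3]]) > 1) {
    f[[3]] <- f[[3]][[2]]                  # drop the last term
    object$call$formula <- f
    object <- update(object, data = data)  # refit the smaller model
    r <- rbind(data.frame(formula = deparse(f[[3]]),
                          Error = errorRate(object, data, type = type)),
               r)                          # prepend: row 1 is the first step
  }
  r
}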
##############
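
The loop in eRates relies on how R stores a formula: the right-hand side of y ~ a + b + c is the nested call (a + b) + c, so element [[2]] of it is the same expression with the last term removed.  A minimal sketch of just that step, using a made-up formula and a hypothetical variable name 'steps' (neither is part of the functions above):

```r
# A formula is a call: f[[2]] is the response, f[[3]] the right-hand side.
f <- LANDUSE ~ Ba + Bi + Al

steps <- deparse(f[[3]])            # start with the full right-hand side
while (length(f[[3]]) > 1) {        # a single variable name has length 1
  f[[3]] <- f[[3]][[2]]             # keep the left operand of the last "+"
  steps <- c(deparse(f[[3]]), steps)
}
print(steps)
```

This yields "Ba", "Ba + Bi" and "Ba + Bi + Al", in that order.  The trick assumes the right-hand side is a sum of plain variable names, as in the example here; a term such as log(Ba) would itself be peeled apart.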

(I have made errorRate generic as it is potentially a generic sort of operation.)
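
As an illustration of what the generic buys you, a method for another class of fit can be added without touching the existing code.  The sketch below is not from the original post: errorRate.qda is a hypothetical method for MASS::qda fits, the generic is repeated only so the block runs on its own, and the built-in iris data stand in for real data.

```r
library(MASS)  # provides qda() and its predict() method

# Generic, repeated here so the sketch is self-contained
errorRate <- function(object, ...) UseMethod("errorRate")

# Hypothetical method for qda fits (an assumption, not part of the post)
errorRate.qda <- function(object, data, ...) {
  pred <- predict(object, data)$class          # predicted classes
  actu <- eval(formula(object)[[2]], data)     # observed classes
  mean(as.character(pred) != as.character(actu))  # proportion misclassified
}

qfit <- qda(Species ~ ., data = iris)
qda_err <- errorRate(qfit, iris)
print(qda_err)
```

The call errorRate(qfit, iris) dispatches on the class of qfit, so code that loops over models does not need to know which kind of fit it holds.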
Now look at your trivial example (extended a bit):

##############
require(klaR)
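QRBdfa <-
    ## factor() keeps LANDUSE a factor in R >= 4.0.0, where data.frame()
    ## no longer converts character vectors by default
    data.frame(LANDUSE = factor(sample(c("A", "B", "C"), 270, replace = TRUE)),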
               Al = runif(270, 0, 125),
               Sb = runif(270, 0, 1),
               Ba = runif(270, 0, 235),
               Bi = runif(270, 0, 0.11),
               Cr = runif(270, 0, 65))

gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 1) ## NB large 'niveau'
##############

To use the functions you need an lda fit with the same formula as for the gw object and the same data argument as in the original call.  (If you try to do this the way suggested in the help file for greedy.wilks the functions to be used here will not work. No dollars in formulae is always a good rule to follow.)
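
A small illustration of why bare names plus a data argument are safer than dollars in a formula.  This is a separate demonstration on the built-in iris data, not necessarily the exact failure referred to above; the object names bad, good and newrows are made up for the sketch.

```r
library(MASS)

# Formula with dollars: the variables are tied to the 'iris' object itself
bad  <- lda(iris$Species ~ iris$Sepal.Length + iris$Petal.Length)
# Same model with bare names and a data argument
good <- lda(Species ~ Sepal.Length + Petal.Length, data = iris)

newrows <- iris[c(1, 51, 101), ]    # three rows to classify

n_good <- length(predict(good, newrows)$class)
n_bad  <- length(predict(bad,  newrows)$class)
print(c(new_rows = nrow(newrows), good = n_good, bad = n_bad))
```

With bare names, predict() looks the variables up in the new data and returns one class per new row.  With dollars, the terms still point at iris, so the new data are not what gets classified.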

The way greedy.wilks is written makes this a bit tricky, but unless you want to just type it in, here is a partly automated way of doing it:

##############
require(MASS)
fit <- do.call(lda, list(formula = formula(gw_obj),
                         data = quote(QRBdfa)))
##############
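
The quote() in that call matters: it makes the stored call refer to the data by name instead of embedding a copy of the data frame.  A minimal sketch of the difference, using iris and two hypothetical object names:

```r
library(MASS)

f <- Species ~ Sepal.Length + Petal.Length

# data passed as a quoted symbol: the call records the name 'iris'
fit_quoted <- do.call(lda, list(formula = f, data = quote(iris)))
# data passed as a value: the call records the whole data frame
fit_value  <- do.call(lda, list(formula = f, data = iris))

print(class(fit_quoted$call$data))
print(class(fit_value$call$data))
```

Keeping the symbol keeps the stored call short and readable, and it is the symbol that eval.parent(object$call$data) looks up when the data argument is left at its default.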

To use the functions:

> errorRate(fit)  ## for one error rate
[1] 0.5962963
> eRates(fit)     ## for a sequence of error rates.
                 formula     Error
1                     Ba 0.6148148
2                Ba + Bi 0.6296296
3           Ba + Bi + Al 0.6074074
4      Ba + Bi + Al + Cr 0.5740741
5 Ba + Bi + Al + Cr + Sb 0.5962963

Since this example uses very artificial random data, the output will be different every time you re-create the data.  Note also that the error rates are not necessarily monotonically decreasing.
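
If you want the simulated data, and hence the error rates, to be repeatable, one option is to fix the random seed before creating the data.  A minimal sketch, in which the seed value 2011 is arbitrary and only two of the columns are simulated:

```r
set.seed(2011)   # arbitrary seed, chosen only for illustration
a <- data.frame(LANDUSE = factor(sample(c("A", "B", "C"), 270, replace = TRUE)),
                Al = runif(270, 0, 125))

set.seed(2011)   # same seed again
b <- data.frame(LANDUSE = factor(sample(c("A", "B", "C"), 270, replace = TRUE)),
                Al = runif(270, 0, 125))

print(identical(a, b))  # same seed, same simulated data
```

A given seed can still produce different samples under different versions of R, since the default sampling algorithm changed in R 3.6.0.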

Bill Venables.



-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ty Smith
Sent: Monday, 14 March 2011 3:51 AM
To: r-help at r-project.org
Subject: [R] Stepwise Discriminant... in R

Hello R list,

I'm looking to do some stepwise discriminant function analysis (DFA) based
on the minimization of Wilks' lambda in R to end up with a composite
signature (of metals "Al","Sb","Bi","Cr","Ba") capable of discriminating
100% of the source factors (LANDUSE: "A","B","C").

The Wilks' lambda portion seems straightforward. I am using the following:

gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 0.1)
gw_obj

Thus determining the stepwise order of metals. But I can't seem to figure out
how to coerce the DFA to give me an output with the % of factors which each
successive metal (variable) correctly classifies (discriminates). e.g.

Step    Metal    % correctly classified
1       Al        25
2       Sb        75
3       Bi        89
4       Cr       100

I've worked up a trivial example below. Can anyone offer any suggestions on
how I might go about doing this in R?

I am working in a MAC OS environment with a current version of R.

Many thanks in advance!

Tyler

#Example
library(scatterplot3d)
library(klaR)

Al <-runif(27, 0, 125)
QRBdfa <- as.data.frame(Al)
QRBdfa$LANDUSE <- factor(c("A","A","A","B","B","B","C","C","C"))
QRBdfa$Sb <- runif(27, 0, 1)
QRBdfa$Ba <- runif(27, 0, 235)
QRBdfa$Bi <- runif(27, 0, 0.11)
QRBdfa$Cr <- runif(27, 0, 65)


gw_obj <- greedy.wilks(LANDUSE ~ ., data = QRBdfa, niveau = 0.1)
gw_obj


fit <- lda(LANDUSE ~ Al + Sb + Bi + Cr + Ba, data = QRBdfa)
