[R] Recursive Feature Elimination with SVM

Bert Gunter bgunter.4567 at gmail.com
Wed Jan 2 17:18:13 CET 2019


Note: **NOT** reproducible (only you have "data.csv").

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Tue, Jan 1, 2019 at 11:14 PM Priyanka Purkayastha <
ppurkayastha2010 at gmail.com> wrote:

> This is the code I tried,
>
> library(e1071)
> library(caret)
> library(ROCR)
>
> data <- read.csv("data.csv", header = TRUE)
> set.seed(998)
>
> inTraining <- createDataPartition(data$Class, p = .70, list = FALSE)
> training <- data[ inTraining,]
> testing  <- data[-inTraining,]
>
> while(length(data)>0){
>
> ## Building the model ####
> svm.model <- svm(Class ~ ., data = training, cross = 10, metric = "ROC",
>                  type = "eps-regression", kernel = "linear",
>                  na.action = na.omit, probability = TRUE)
> print(svm.model)
>
>
> ###### auc  measure #######
>
> #prediction and ROC
> svm.model$index
> svm.pred <- predict(svm.model, testing, probability = TRUE)
>
> #calculating auc
> c <- as.numeric(svm.pred)
> c = c - 1
> pred <- prediction(c, testing$Class)
> perf <- performance(pred,"tpr","fpr")
> plot(perf,fpr.stop=0.1)
> auc <- performance(pred, measure = "auc")
> auc <- auc@y.values[[1]]
> print(length(data))
> print(auc)
>
> #compute the weight vector
> w = t(svm.model$coefs)%*%svm.model$SV
>
> #compute ranking criteria
> weight_matrix = w * w
>
> #rank the features
> w_transpose <- t(weight_matrix)
> w2 <- as.matrix(w_transpose[order(w_transpose[,1], decreasing = FALSE),])
> a <- as.matrix(w2[which(w2 == max(w2)),]) # to get the rows with minimum values
> row.names(a) -> remove
> training<- data[,setdiff(colnames(data),remove)]
> }
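
As written, the quoted loop cannot terminate: length(data) is the number of columns of data, and no columns are ever removed from data itself (only training is reassigned, and from data rather than from the training split). Below is a minimal sketch of a terminating backward-elimination (SVM-RFE) loop. It is a sketch only: it assumes training$Class is a two-level factor, every other column of training is numeric, and all names not taken from the post are illustrative.

    ## Sketch: a terminating SVM-RFE loop over the training split (assumptions above)
    library(e1071)

    features  <- setdiff(colnames(training), "Class")
    elimOrder <- character(0)              # features in the order they are dropped

    while (length(features) > 1) {
      fit <- svm(x = training[, features, drop = FALSE],
                 y = training$Class,
                 type = "C-classification", kernel = "linear", scale = FALSE)

      ## linear-SVM weight vector and squared-weight ranking criterion
      w    <- t(fit$coefs) %*% fit$SV
      crit <- as.vector(w * w)

      ## SVM-RFE drops the feature with the SMALLEST criterion
      ## (the posted code instead removes the one matching max(w2))
      worst     <- features[which.min(crit)]
      elimOrder <- c(elimOrder, worst)
      features  <- setdiff(features, worst)
    }

    ranked <- c(features, rev(elimOrder))   # best-ranked feature first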
>
> On Wed, Jan 2, 2019 at 11:18 AM David Winsemius <dwinsemius at comcast.net>
> wrote:
>
> >
> > On 1/1/19 5:31 PM, Priyanka Purkayastha wrote:
> > > Thank you, David. I tried the same: I gave x as the data matrix and y
> > > as the class label, but it returned an empty "featureRankedList". I
> > > get no output when I try the code.
> >
> >
> > If you want people to spend time on this, you should post a reproducible
> > example. See the Posting Guide ... and learn to post in plain text.
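
Since "data.csv" is not available to anyone else, a reproducible stand-in can be built from a dataset that ships with R. The sketch below uses iris reduced to two classes purely as an illustration; only the column name Class and set.seed(998) come from the posted code, everything else is an assumption.

    ## Reproducible stand-in for the unavailable "data.csv" (illustrative only)
    library(caret)

    set.seed(998)
    data <- droplevels(subset(iris, Species != "setosa"))
    names(data)[names(data) == "Species"] <- "Class"

    inTraining <- createDataPartition(data$Class, p = 0.70, list = FALSE)
    training   <- data[ inTraining, ]
    testing    <- data[-inTraining, ]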
> >
> >
> > --
> >
> > David
> >
> > >
> > > On Tue, 1 Jan 2019 at 11:42 PM, David Winsemius
> > > <dwinsemius at comcast.net> wrote:
> > >
> > >
> > >     On 1/1/19 4:40 AM, Priyanka Purkayastha wrote:
> > >     > I have a dataset (data) with 700 rows and 7000 columns. I am
> > >     > trying to do recursive feature selection with the SVM model. A
> > >     > quick Google search helped me find code for a recursive search
> > >     > with SVM. However, I am unable to understand the first part of
> > >     > the code: how do I introduce my dataset into the code?
> > >
> > >
> > >     Generally the "labels" are given to such a machine learning
> > >     device as the y argument, while the "features" are passed as a
> > >     matrix to the x argument.
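
With the data frame read from the CSV, that convention might look like the sketch below. It assumes the label column is named Class, as in the posted code, and that every other column is numeric.

    y <- as.factor(data$Class)                              # labels
    x <- as.matrix(data[, setdiff(names(data), "Class")])   # features only, no label column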
> > >
> > >
> > >     --
> > >
> > >     David.
> > >
> > >     >
> > >     > Suppose the dataset is a matrix named data. Please give me an
> > >     > example of recursive feature selection with SVM. Below is the
> > >     > code I got for recursive feature search.
> > >     >
> > >     >      svmrfeFeatureRanking = function(x,y){
> > >     >
> > >     >      #Checking for the variables
> > >     >      stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)
> > >     >
> > >     >      n = ncol(x)
> > >     >      survivingFeaturesIndexes = seq_len(n)
> > >     >      featureRankedList = vector(length=n)
> > >     >      rankedFeatureIndex = n
> > >     >
> > >     >      while(length(survivingFeaturesIndexes)>0){
> > >     >      #train the support vector machine
> > >     >      svmModel = svm(x[, survivingFeaturesIndexes], y, cost = 10,
> > >     >                     cachesize = 500, scale = FALSE,
> > >     >                     type = "C-classification", kernel = "linear")
> > >     >
> > >     >      #compute the weight vector
> > >     >      w = t(svmModel$coefs)%*%svmModel$SV
> > >     >
> > >     >      #compute ranking criteria
> > >     >      rankingCriteria = w * w
> > >     >
> > >     >      #rank the features
> > >     >      ranking = sort(rankingCriteria, index.return = TRUE)$ix
> > >     >
> > >     >      #update feature ranked list
> > >     >      featureRankedList[rankedFeatureIndex] = survivingFeaturesIndexes[ranking[1]]
> > >     >      rankedFeatureIndex = rankedFeatureIndex - 1
> > >     >
> > >     >      #eliminate the feature with smallest ranking criterion
> > >     >      (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
> > >     >      return (featureRankedList)}
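
An illustrative call of the function above, assuming x is a numeric matrix holding only the feature columns and y is the class factor: the returned vector contains column indices of x, ordered from the best-ranked feature (position 1, eliminated last) to the worst (position n, eliminated first).

    ranking <- svmrfeFeatureRanking(x, y)
    colnames(x)[head(ranking, 10)]   # names of the ten top-ranked features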
> > >     >
> > >     >
> > >     >
> > >     > I tried taking the idea from the above code and incorporating
> > >     > it in my own code, as shown below:
> > >     >
> > >     >      library(e1071)
> > >     >      library(caret)
> > >     >
> > >     >      data<- read.csv("matrix.csv", header = TRUE)
> > >     >
> > >     >      x <- data
> > >     >      y <- as.factor(data$Class)
> > >     >
> > >     >      svmrfeFeatureRanking = function(x,y){
> > >     >
> > >     >        #Checking for the variables
> > >     >        stopifnot(!is.null(x) == TRUE, !is.null(y) == TRUE)
> > >     >
> > >     >        n = ncol(x)
> > >     >        survivingFeaturesIndexes = seq_len(n)
> > >     >        featureRankedList = vector(length=n)
> > >     >        rankedFeatureIndex = n
> > >     >
> > >     >        while(length(survivingFeaturesIndexes)>0){
> > >     >          #train the support vector machine
> > >     >          svmModel = svm(x[, survivingFeaturesIndexes], y,
> > >     >                         cross = 10, cost = 10,
> > >     >                         type = "C-classification", kernel = "linear")
> > >     >
> > >     >          #compute the weight vector
> > >     >          w = t(svmModel$coefs)%*%svmModel$SV
> > >     >
> > >     >          #compute ranking criteria
> > >     >          rankingCriteria = w * w
> > >     >
> > >     >          #rank the features
> > >     >          ranking = sort(rankingCriteria, index.return = TRUE)$ix
> > >     >
> > >     >          #update feature ranked list
> > >     >          featureRankedList[rankedFeatureIndex] = survivingFeaturesIndexes[ranking[1]]
> > >     >          rankedFeatureIndex = rankedFeatureIndex - 1
> > >     >
> > >     >          #eliminate the feature with smallest ranking criterion
> > >     >          (survivingFeaturesIndexes = survivingFeaturesIndexes[-ranking[1]])}
> > >     >
> > >     >        return (featureRankedList)}
> > >     >
> > >     > But I could not get past the "update feature ranked list"
> > >     > stage. Please guide me.
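
The "update feature ranked list" step only records, at the current end of featureRankedList, the surviving feature whose squared weight is smallest, and then drops it from survivingFeaturesIndexes. A toy trace of that step, with made-up numbers:

    w               <- matrix(c(-0.3, 0.02, 0.5), nrow = 1)    # made-up weights
    rankingCriteria <- w * w                                    # 0.09 0.0004 0.25
    ranking         <- sort(rankingCriteria, index.return = TRUE)$ix
    ranking                                                     # 2 1 3: feature 2 is weakest
    ## featureRankedList[rankedFeatureIndex] = survivingFeaturesIndexes[ranking[1]]
    ## survivingFeaturesIndexes              = survivingFeaturesIndexes[-ranking[1]]

If that line is never reached, the failure is more likely earlier: in the posted setup x <- data still contains the Class column itself, and svm() needs a numeric feature matrix, so x <- as.matrix(data[, setdiff(names(data), "Class")]) is probably what was intended.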
> > >     >
> > >
> >
>
>
> --
> Regards,
>
> Priyanka Purkayastha, M.Tech, Ph.D.,
> SERB National Postdoctoral Researcher
> Genomics and Systems Biology Lab,
> Department of Chemical Engineering,
> Indian Institute of Technology Bombay (IITB),
> Powai, Mumbai- 400076
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
