[R] training svm

Max Kuhn mxkuhn at gmail.com
Fri Mar 7 21:26:29 CET 2008


Also, see the nearZeroVar function in the caret package.
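
A minimal sketch of how that might look, next to the base R check quoted below. The data frame x and its column names are made up for illustration, and the caret call is guarded because the package may not be installed. With its default settings (freqCut = 95/5, uniqueCut = 10), nearZeroVar returns the indices of the columns it flags, which include any column with a single unique value.

```r
# Hypothetical toy data: 'const' holds the same value in every row.
x <- data.frame(
  a     = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7, 2.9, 3.3, 1.1, 4.0),
  const = rep(7, 10),
  b     = 1:10
)

# Base R: flag columns with at most one distinct non-missing value.
allsame <- sapply(x, function(y) length(unique(y[!is.na(y)])) <= 1)

# drop = FALSE keeps a data frame even if only one column survives.
x_base <- x[, !allsame, drop = FALSE]
print(names(x_base))

# caret: nearZeroVar() returns integer indices of the flagged columns.
if (requireNamespace("caret", quietly = TRUE)) {
  nzv <- caret::nearZeroVar(x)
  # Guard against an empty result: x[, -integer(0)] would select no columns.
  x_nzv <- if (length(nzv) > 0) x[, -nzv, drop = FALSE] else x
  print(names(x_nzv))
}
```

Calling nearZeroVar(x, saveMetrics = TRUE) instead returns a data frame of per-column metrics, which can help when choosing the thresholds.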

Max

On Fri, Mar 7, 2008 at 7:41 AM, Charilaos Skiadas <cskiadas at gmail.com> wrote:
>
> On Mar 7, 2008, at 2:17 AM, Oldrich Kruza wrote:
>
>  > Hello Soumyadeep,
>  >
>  > if you store the data in a tabular file, then I suggest using standard
>  > text-editing tools like cut (say your file is called data.csv, fields
>  > are separated with commas and you want to get rid of the third and
>  > sixth column):
>  >
>  > $ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
>  >
>  > If you're not in a Unix environment but have Perl, then you may use a
>  > script like:
>  >
>  >  open SRC, "data.csv" or die("couldn't open source");
>  >  open DST, ">data_cut.csv" or die("couldn't open destination");
>  >  while (<SRC>) {
>  >      chomp;
>  >      @fields = split /,/;     # replace the comma with the delimiter you use
>  >      splice @fields, 5, 1;    # remove the sixth column first (indices are zero-based)
>  >      splice @fields, 2, 1;    # then the third; this order keeps the indices from shifting
>  >      print DST join(",", @fields), "\n";
>  >  }
>  >
>  > If you need to do the selection within R, then you can do it by
>  > indexing the data structure. Suppose you have the data in a data.frame
>  > called data. Then:
>  >
>  >> data <- data[, -c(3, 6)]
>  >
>  > might do the trick (but since I'm not much of an R hacker, this is
>  > without guarantee). I think it might be better however to do the
>  > preprocessing before the data get into R because then you avoid
>  > loading the columns to discard into memory.
>
>  I am guessing that the data is already in R, so it should be easier
>  to do it in R, especially if he doesn't know which columns are the
>  ones with all identical values. For instance, suppose the data set is
>  called x. Then the following would return TRUE for the columns that
>  have all values the same:
>
>  allsame <- sapply(x,function(y) length(table(y))==1)
>
>  and then the following will take them out
>
>  newdata <- x[, !allsame, drop = FALSE]
>
>  > Hope this helps
>  > ~ Oldrich
>
>  Haris Skiadas
>  Department of Mathematics and Computer Science
>  Hanover College
>



-- 

Max


