[R] training svm

Oldrich Kruza sixtease at gmail.com
Fri Mar 7 08:17:31 CET 2008


Hello Soumyadeep,

if you store the data in a tabular text file, I suggest using standard
command-line tools such as cut. Say your file is called data.csv, the
fields are separated by commas, and you want to get rid of the third
and sixth columns:

$ cut --complement --delimiter="," --fields=3,6 < data.csv > data_cut.csv
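
Note that --complement is a GNU extension, so the cut shipped with BSD or
macOS may not accept it. A minimal awk sketch that should do the same job,
assuming plain comma-separated data without quoted fields (the sample
data.csv written by printf below is made up purely for illustration):

```shell
# Hypothetical sample input, created only so the example is self-contained.
printf 'a,b,c,d,e,f,g\n1,2,3,4,5,6,7\n' > data.csv

# Rebuild each line from every field except the third and the sixth.
awk -F',' '{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == 3 || i == 6) continue   # columns to drop (awk fields are 1-based)
        out = out sep $i
        sep = ","
    }
    print out
}' data.csv > data_cut.csv
```

To drop other columns, change the numbers in the if condition.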

If you're not in a Unix environment but have Perl, you can use a
script like:

 open SRC, "<", "data.csv" or die("couldn't open source: $!");
 open DST, ">", "data_cut.csv" or die("couldn't open destination: $!");
 while (<SRC>) {
     chomp;
     @fields = split /,/, $_, -1;  # replace the comma with the delimiter you use
     # Remove the higher index first so the lower one does not shift.
     splice @fields, 5, 1;    # get rid of sixth column (indices are zero-based)
     splice @fields, 2, 1;    # get rid of third column
     print DST join(",", @fields), "\n";
 }
 close DST or die("couldn't close destination: $!");

If you need to do the selection within R, then you can do it by
indexing the data structure. Suppose you have the data in a data.frame
called data. Then:

> data <- data[,-6]
> data <- data[,-3]

should do the trick (but since I'm not much of an R hacker, I can't
guarantee it). I think it is better, however, to do the preprocessing
before the data get into R, because then the columns to be discarded
are never loaded into memory.
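
Since the columns in question are the ones that hold a single value in
every row, they could also be found and dropped automatically before R
sees the file. A rough sketch with awk, assuming a header line,
comma-separated fields without quoting, and at least two data rows
(features.csv and features_varying.csv are made-up names, and the sample
data is invented for illustration):

```shell
# Hypothetical sample: F2 holds the same value in every row.
printf 'F1,F2,F3\n1,0,5\n2,0,5\n3,0,7\n' > features.csv

# The file is read twice. Pass 1 (NR == FNR) remembers the first data row
# and flags every column that later shows a different value. Pass 2 prints
# only the flagged columns, header included.
awk -F',' '
NR == FNR {
    if (FNR == 1) next                                    # skip the header
    if (FNR == 2) { for (i = 1; i <= NF; i++) first[i] = $i; next }
    for (i = 1; i <= NF; i++) if ($i != first[i]) varies[i] = 1
    next
}
{
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) if (i in varies) { out = out sep $i; sep = "," }
    print out
}' features.csv features.csv > features_varying.csv
```

If every column turns out to be constant, the output consists of empty
lines only, so it is worth looking at the result before loading it.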

Hope this helps
~ Oldrich

On Fri, Mar 7, 2008 at 7:55 AM, Soumyadeep nandi
<soumyadeep_nandi at yahoo.com> wrote:
> Thanks Oldrich,
>  Actually, I was not sure whether I could remove these columns and
> still build the model. Thanks a lot for your kind suggestion. Could
> you tell me if there is any function to remove these columns from the
> data matrix?
>
>  With best regards,
>  Soumyadeep
>
>
> Oldrich Kruza <sixtease at gmail.com> wrote:
>  A rather technical workaround I see could be adding a row with a
> different value. But if a column only ever has one value, then it
> contributes nothing to the model and I see no reason why it would have
> to be kept.
> ~ Oldrich Kruza
>
> On Fri, Mar 7, 2008 at 6:45 AM, Soumyadeep nandi
>  wrote:
> > What should I do if I need to train svm() with data that have the same
> > value across all rows in some columns? These must be the important
> > features of the class, and we can't exclude these columns when
> > building models.
> >
> > The error I am getting is:
> > Error in predict.svm(ret, xhold) : Model is empty!
> > In addition: Warning message:
> > In svm.default(datatrain, classtrain) :
> > Variable(s) 'F112' and 'F113'.... [... truncated]
> >
> > Is there any way to overcome this problem? Any suggestions would be highly
> > helpful.
> >
> > Regards
> > Soumyadeep
> >
> >
>
>
>


