[R] Nominal variables in SVM?

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Aug 12 22:55:09 CEST 2009


Hi,

On Aug 12, 2009, at 2:53 PM, Noah Silverman wrote:

> Hi,
>
> The answers to my previous question about nominal variables have led
> me to a more important question.
>
> What is the "best practice" way to feed a nominal variable to an SVM?
>
> For example:
> color = ("red", "blue", "green")
>
> I could translate that into an index so I wind up with
> color= (1,2,3)
>
> But my concern is that the SVM will now think that the values are  
> numeric in "range" and not discrete conditions.
>
> Another thought would be to create 3 binary variables from the  
> single color variable, so I have:
>
> red = (0,1)
> blue = (0,1)
> green = (0,1)
>
> An example fed to the SVM would have one positive and two negative
> values to indicate the color value,
> i.e. for a blue example:
> red = 0, blue = 1, green = 0

Do it this way.

So, if the features for your examples were color and height, your
"feature matrix" for N examples would be N x 4: three 0/1 indicator
columns for color (red, blue, green) plus one column for height:

0,1,0,15  # blue object, height 15
1,0,0,10  # red object, height 10
0,0,1,5   # green object, height 5
...
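
In R you don't have to build those indicator columns by hand. Here is a
minimal sketch, assuming the data sits in a data frame with a factor
column for color; the names `examples`, `indicators` and `features`, and
the three rows of data, are made up for illustration. It uses
model.matrix() from base R with the intercept dropped so that every
level gets its own 0/1 column:

```r
# Hypothetical example data: names and values are illustrative only.
examples <- data.frame(
  color  = factor(c("blue", "red", "green"),
                  levels = c("red", "blue", "green")),
  height = c(15, 10, 5)
)

# "~ color - 1" drops the intercept, so each level gets its own 0/1
# column instead of one level being absorbed as the baseline.
indicators <- model.matrix(~ color - 1, data = examples)
colnames(indicators) <- levels(examples$color)

# Bind the three indicator columns to the numeric feature: N x 4.
features <- cbind(indicators, height = examples$height)
print(features)
```

The first row of `features` should be 0, 1, 0, 15, i.e. the blue object
with height 15 from the matrix above.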

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



