[BioC] Support vector regression
guest at bioconductor.org
Mon Mar 31 18:06:57 CEST 2014
For convenience's sake, I am using example data to ask this question: QSAR.XLS [http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/qsar.zip]
Taking the donors from the dataset as predictor variables and Activity as the response variable, I would like to perform a support vector regression using both linear and non-linear kernels.
In my case, I would like to find which of the predictors (out of the 20 donors) best explain the activity (response), so I did the following:
library(e1071)  # provides svm()
fit <- svm(activity ~ ., data = qsar, kernel = "linear", type = "eps-regression")
svm(formula = activity ~ ., data = qsar, kernel = "linear", type = "eps-regression")
Number of Support Vectors: 66
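For a linear kernel, one common heuristic for judging predictors is to look at the weight vector of the fitted model, which can be rebuilt from the support vectors and their coefficients. The following is only a minimal sketch under stated assumptions: `qsar_demo` is synthetic stand-in data (not the real QSAR file), the column names `x1` to `x20` are hypothetical, and ranking by absolute weight is a heuristic rather than a formal test of importance. It applies to the linear kernel only.

```r
library(e1071)

# Synthetic stand-in for the qsar table (hypothetical data, not the real file):
# 20 numeric predictors, with activity driven mainly by x1 and x2.
set.seed(1)
n <- 100
qsar_demo <- as.data.frame(matrix(rnorm(n * 20), nrow = n))
names(qsar_demo) <- paste0("x", 1:20)
qsar_demo$activity <- 3 * qsar_demo$x1 - 2 * qsar_demo$x2 + rnorm(n, sd = 0.5)

fit_demo <- svm(activity ~ ., data = qsar_demo,
                kernel = "linear", type = "eps-regression")

# For a linear kernel the weight vector is w = sum_i coef_i * SV_i.
# svm() scales the data by default, so these weights refer to the
# scaled variables and are comparable across predictors.
w <- drop(t(fit_demo$coefs) %*% fit_demo$SV)

# Rank predictors by absolute weight, largest first.
ranking <- sort(abs(w), decreasing = TRUE)
print(head(ranking, 5))
```

With non-linear kernels there is no such weight vector in the input space, so a different approach (for example, refitting with predictors left out and comparing the error) would be needed.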
How can I now determine which of the 20 predictors best explain the activity, and how do I get the R-squared values? Also, if I try several kernels, is it possible to present the results as in the figure from the scikit-learn (Python) SVR regression example? I think the models would be easy to compare that way. The example is at http://scikit-learn.org/0.11/auto_examples/svm/plot_svm_regression.html.
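To make the question concrete, here is a rough sketch of the kind of comparison meant: fit several kernels, compute R-squared as 1 - SS_res / SS_tot, and overlay the fitted curves on one plot. The data frame `demo` and the helper `r_squared` are hypothetical and use a single made-up predictor so that the fits can be drawn as curves. The R-squared here is computed on the training data, so it is optimistic; a held-out set or cross-validation would give a fairer figure.

```r
library(e1071)

# Hypothetical one-predictor data, so the fits can be drawn as curves.
set.seed(1)
x <- sort(runif(80, 0, 5))
demo <- data.frame(x = x, y = sin(x) + rnorm(80, sd = 0.1))

# Fit one eps-regression model per kernel (default hyperparameters).
kernels <- c("linear", "polynomial", "radial")
fits <- lapply(kernels, function(k)
  svm(y ~ x, data = demo, kernel = k, type = "eps-regression"))
names(fits) <- kernels

# Training-set R-squared: 1 - SS_res / SS_tot.
r_squared <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}
r2 <- sapply(fits, function(f) r_squared(demo$y, predict(f, demo)))
print(round(r2, 3))

# Overlay the fitted curves on the data, one colour per kernel.
plot(demo$x, demo$y, pch = 16, col = "grey40", xlab = "x", ylab = "y")
for (i in seq_along(fits)) {
  lines(demo$x, predict(fits[[i]], demo), col = i + 1, lwd = 2)
}
legend("topright", legend = kernels, col = seq_along(kernels) + 1, lwd = 2)
```

With 20 predictors, as in the QSAR data, the curves cannot be drawn against a single x axis; plotting predicted against observed activity for each kernel would be one possible substitute.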
I found several good tutorials for classification but none for regression, so I tried to follow the tutorial at http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/en_Tanagra_Support_Vector_Regression.pdf, but I did not understand it very well.
Could anyone please explain to me how this is done?
-- output of sessionInfo():
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
 LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
 LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 kernlab_0.9-19 e1071_1.6-3
loaded via a namespace (and not attached):
 class_7.3-9 tools_3.0.3
Sent via the guest posting facility at bioconductor.org.