[BioC] package for predicting a continuous variable from more than one continuous predictor variable

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Sep 9 16:45:32 CEST 2009


Hi,

On Sep 9, 2009, at 10:38 AM, shirley zhang wrote:

> Hi Steve,
>
> Thanks for your explanation and suggestions. I didn't know SVM could
> also be used for regression, since I have only used it for classification.

Yeah, no problem. It's pretty straightforward to wire up an SVM for
regression -- you'll have to run it a few times with different values of
"epsilon" (like you would for C (or nu) in SVM classification).

If you're interested in some details/theory, here's a "brief tutorial"  
on support vector regression by Alex Smola and Bernhard Scholkopf:
http://eprints.pascal-network.org/archive/00002057/01/SmoSch03b.pdf
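
To make that a bit more concrete, here's a rough, untested sketch with
e1071 (the toy data and the epsilon/cost values are made up just to show
the moving parts; tune.svm does the cross-validated grid search):

library(e1071)

## toy data: 50 samples x 20 "genes", continuous outcome
set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50)
y <- x[, 1] - 2 * x[, 2] + rnorm(50, sd = 0.5)

## "eps-regression" is what svm() defaults to when y is numeric
fit <- svm(x, y, type = "eps-regression", kernel = "linear",
           cost = 1, epsilon = 0.1)
preds <- predict(fit, x)

## cross-validated grid search over cost and epsilon
tuned <- tune.svm(x, y, cost = 2^(-2:2),
                  epsilon = c(0.05, 0.1, 0.5))
summary(tuned)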

Let us know if you need help (but maybe R-help might be more  
appropriate?).

> I will try those methods you suggested. Do you have any experience  
> with CART?

Nope, I've never used CART before, sorry.

-steve

>
> Thanks again,
> Shirley
>
> On Wed, Sep 9, 2009 at 10:26 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi Shirley,
>>
>> On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:
>>
>>> Thanks Steve.
>>>
>>> Sorry that I did not make myself clear. I am trying to build a
>>> biomarker from gene expression microarray data. What I am doing is
>>> similar to the weighted-voting algorithm or SVM. But the  
>>> difference is
>>> that the outcome is a continuous variable instead of a categorical
>>> variable.  It is a regression problem, and I want to know which
>>> package would be best for this purpose. How about CART?
>>
>> I don't know if there's such a thing as "best"(?) What yardstick
>> would you use to measure that?
>>
>> For instance, you mention "it" is similar to an SVM (how?), but SVMs
>> can also be used for regression, not just classification (doable from
>> both e1071 and kernlab). How about going that route? As usual,
>> interpretation of the model might be challenging, though (which might
>> be why you're avoiding it for biomarker discovery?)
>>
>> You also mention weighted-voting:
>>
>>  * how about boosted regression models?
>>     http://cran.r-project.org/web/packages/gbm/index.html
>>
>>  * Also related to boosting: bagging & random forests (both can be
>>    used for regression):
>>     http://cran.r-project.org/web/packages/randomForest/index.html
>>     http://cran.r-project.org/web/packages/ipred/index.html
>>
>> I think boosting/bagging/random forests tend to lead to more
>> interpretable models, so maybe that's better for you?
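
To make the randomForest route concrete, here is a minimal, untested
sketch on made-up data -- the importance() ranking is where most of the
interpretability comes from:

library(randomForest)

set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50,
            dimnames = list(NULL, paste("gene", 1:20, sep = "")))
y <- x[, 1] - 2 * x[, 2] + rnorm(50, sd = 0.5)

## regression mode kicks in automatically when y is numeric
rf <- randomForest(x, y, ntree = 500, importance = TRUE)
print(rf)             # out-of-bag MSE and % variance explained
head(importance(rf))  # which predictors the forest leans on
varImpPlot(rf)
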
>>
>> There are also several penalized regression packages (also good for
>> interpretability); for instance, glmnet is great:
>> http://cran.r-project.org/web/packages/glmnet/index.html
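
Again just an untested sketch on made-up data: with alpha = 1 (the
lasso) the nonzero coefficients give you a small gene signature, and
cv.glmnet picks lambda by cross-validation:

library(glmnet)

set.seed(1)
x <- matrix(rnorm(50 * 20), nrow = 50)
y <- x[, 1] - 2 * x[, 2] + rnorm(50, sd = 0.5)

## family = "gaussian" for a continuous outcome, alpha = 1 for the lasso
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1)
coef(cvfit, s = "lambda.min")              # nonzero rows = selected genes
predict(cvfit, newx = x, s = "lambda.min")
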
>>
>> Maybe you have some info about the grouping of your predictors? Try
>> grouped lasso:
>> http://cran.r-project.org/web/packages/grplasso/index.html
>>
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  |  Memorial Sloan-Kettering Cancer Center
>>  |  Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>>

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


