[BioC] package for predicting a continuous variable from more than one continuous predictor variable

shirley zhang shirley0818 at gmail.com
Wed Sep 9 17:57:38 CEST 2009


Thanks a lot.  Shirley

On Wed, Sep 9, 2009 at 10:45 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Sep 9, 2009, at 10:38 AM, shirley zhang wrote:
>
>> Hi Steve,
>>
>> Thanks for your explanation and suggestions. I didn't know that SVMs can
>> also be used for regression, since I have only used them for classification.
>
> Yeah, no problem. It's pretty straightforward to wire up an SVM for
> regression -- you'll have to run it a few times with different values of
> "epsilon" (like you would for C (or nu) in SVM classification).
>
> If you're interested in some details/theory, here's a "brief tutorial" on
> support vector regression by Alex Smola and Bernhard Scholkopf:
> http://eprints.pascal-network.org/archive/00002057/01/SmoSch03b.pdf
>
> Let us know if you need help (but maybe R-help might be more appropriate?).
>
>> I will try those methods you suggested. Do you have any experience with
>> CART?
>
> Nope, I've never used CART before, sorry.
>
> -steve
>
>>
>> Thanks again,
>> Shirley
>>
>> On Wed, Sep 9, 2009 at 10:26 AM, Steve Lianoglou
>> <mailinglist.honeypot at gmail.com> wrote:
>>>
>>> Hi Shirley,
>>>
>>> On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:
>>>
>>>> Thanks Steve.
>>>>
>>>> Sorry that I did not make myself clear. I am trying to build a
>>>> biomarker from gene expression microarray data. What I am doing is
>>>> similar to the weighted-voting algorithm or SVM. But the difference is
>>>> that the outcome is a continuous variable instead of a categorical
>>>> variable.  It is a regression problem, so I want to know which
>>>> package is best for this purpose. How about CART?
>>>
>>> I don't know if there's such a thing as "best"(?) What yardstick would you
>>> use to measure that?
>>>
>>> For instance, you mention "it" is similar to an SVM (how?), but SVMs can
>>> also be used for regression, not just classification (doable from both
>>> e1071 and kernlab). How about going that route? As usual, interpretation
>>> of the model might be challenging, though (which might be why you're
>>> avoiding it for biomarker discovery?)
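>>> For example, a quick sketch with either package (toy data below; the cost
>>> and epsilon values are just starting points you'd still want to tune):
>>>
>>>   ## toy data: 50 samples x 20 predictors, continuous outcome
>>>   set.seed(1)
>>>   x <- matrix(rnorm(50 * 20), nrow = 50)
>>>   y <- x[, 1] + rnorm(50)
>>>
>>>   ## e1071: epsilon-regression SVM
>>>   library(e1071)
>>>   fit1 <- svm(x, y, type = "eps-regression", kernel = "radial",
>>>               cost = 1, epsilon = 0.1)
>>>
>>>   ## kernlab: same idea via ksvm
>>>   library(kernlab)
>>>   fit2 <- ksvm(x, y, type = "eps-svr", kernel = "rbfdot",
>>>                C = 1, epsilon = 0.1)
>>>   preds <- predict(fit2, x)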
>>>
>>> You also mention weighted-voting:
>>>
>>>  * how about boosted regression models?
>>>    http://cran.r-project.org/web/packages/gbm/index.html
>>>
>>>  * Also related to boosting: bagging & random forests (both can be used
>>>    for regression):
>>>    http://cran.r-project.org/web/packages/randomForest/index.html
>>>    http://cran.r-project.org/web/packages/ipred/index.html
>>>
>>> I think boosting/bagging/random-forests tend to lead to more
>>> interpretable models, so maybe that's better for you?
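>>> For instance, a random forest regression sketch (toy data again; ntree
>>> and mtry are left at values you'd want to check on your own data):
>>>
>>>   library(randomForest)
>>>
>>>   ## toy data: 50 samples x 20 "genes", continuous outcome
>>>   set.seed(1)
>>>   x <- matrix(rnorm(50 * 20), nrow = 50)
>>>   colnames(x) <- paste("gene", 1:20, sep = "")
>>>   y <- x[, 1] + rnorm(50)
>>>
>>>   rf <- randomForest(x, y, ntree = 500, importance = TRUE)
>>>   ## variable importance is where much of the interpretability comes from
>>>   head(importance(rf))
>>>   preds <- predict(rf, x)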
>>>
>>> There are also several penalized regression packages (also good for
>>> interpretability); for instance, glmnet is great:
>>> http://cran.r-project.org/web/packages/glmnet/index.html
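>>> Something like this (a minimal sketch; x must be a numeric matrix, and the
>>> default gaussian family with alpha = 1 gives you a plain lasso regression):
>>>
>>>   library(glmnet)
>>>
>>>   ## toy data: 50 samples x 20 "genes", continuous outcome
>>>   set.seed(1)
>>>   x <- matrix(rnorm(50 * 20), nrow = 50)
>>>   y <- x[, 1] - x[, 2] + rnorm(50)
>>>
>>>   ## cross-validation picks the penalty; then look at the nonzero genes
>>>   cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1)
>>>   coef(cvfit, s = "lambda.min")
>>>   preds <- predict(cvfit, newx = x, s = "lambda.min")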
>>>
>>> Maybe you have some info about the grouping of your predictors? Try
>>> grouped lasso:
>>> http://cran.r-project.org/web/packages/grplasso/index.html
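>>> Roughly along these lines, if I remember the grplasso interface right
>>> (do check ?grplasso -- the index vector assigns each column of x to a
>>> group, with NA for unpenalized columns such as the intercept):
>>>
>>>   library(grplasso)
>>>
>>>   ## toy data: 20 predictors falling into 4 groups of 5, plus an intercept
>>>   set.seed(1)
>>>   x <- cbind(1, matrix(rnorm(50 * 20), nrow = 50))  ## first column = intercept
>>>   y <- x[, 2] + x[, 3] + rnorm(50)
>>>   index <- c(NA, rep(1:4, each = 5))                ## NA = don't penalize
>>>
>>>   lmax <- lambdamax(x, y, index = index, model = LinReg())
>>>   fit  <- grplasso(x, y, index = index, model = LinReg(),
>>>                    lambda = lmax * 0.5^(0:5))
>>>   fit$coefficients   ## one column of coefficients per lambda value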
>>>
>>>
>>> -steve
>>>
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  |  Memorial Sloan-Kettering Cancer Center
>>>  |  Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>>
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  |  Memorial Sloan-Kettering Cancer Center
>  |  Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
>



-- 
Xiaoling (Shirley) Zhang

Ph.D. Candidate in Bioinformatics
Boston University, Boston, MA
Tel: (857) 233-9862	
Email: zhangxl at bu.edu


