[BioC] package of predicting a continuous variable from more than one continuous predictor variables

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Sep 9 16:26:58 CEST 2009

Hi Shirley,

On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:

> Thanks Steve.
> Sorry that I did not make myself clear. I am trying to build a
> biomarker from gene expression microarray data. What I am doing is
> similar to the weighted-voting algorithm or SVM. But the difference is
> that the outcome is a continuous variable instead of a categorical
> variable.  It is a regression problem, but I want to know which
> package is best for this purpose? How about CART?

I don't know if there's such a thing as "best". What yardstick would
you use to measure that?

For instance, you mention "it" is similar to an SVM (how?), but SVMs
can also be used for regression, not just classification (doable with
both e1071 and kernlab). How about going that route? As usual,
interpretation of the model might be challenging, though (which might
be why you're avoiding it for biomarker discovery?).
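
A minimal sketch of the SVM regression route with e1071, just to show
the shape of the call. The data are simulated and the sizes, the linear
kernel, and the train/test split are my assumptions, not anything from
your experiment:

```r
library(e1071)

set.seed(1)
n <- 60; p <- 200                         # 60 samples, 200 simulated "genes"
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] - x[, 2] + rnorm(n, sd = 0.5)   # continuous outcome

train <- 1:40
test  <- 41:60

# type = "eps-regression" fits a regression SVM instead of a classifier
fit  <- svm(x[train, ], y[train], type = "eps-regression", kernel = "linear")
pred <- predict(fit, x[test, ])

# Held-out RMSE, as one possible yardstick for comparing methods
rmse <- sqrt(mean((pred - y[test])^2))
rmse
```

With real data you would want proper cross-validation and some tuning
of cost/epsilon (e1071 has tune() for that) rather than a single split.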

You also mention weighted-voting:

   * How about boosted regression models?

   * Also related to boosting: bagging & random forests (both can be
used for regression).

I think boosting/bagging/random forests tend to lead to more
interpretable models than an SVM, so maybe that's better for you?
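
As a rough illustration of the random forest option (boosting is not
shown here), this sketch uses the randomForest package on simulated
data. The gene names and dimensions are made up for the example. The
variable importance table is the part that helps with interpretation:

```r
library(randomForest)

set.seed(1)
n <- 60; p <- 200
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", 1:p)))  # hypothetical names
y <- 2 * x[, 1] - x[, 2] + rnorm(n, sd = 0.5)

# A numeric response makes randomForest() grow regression trees
rf <- randomForest(x, y, ntree = 500, importance = TRUE)

# Permutation importance (%IncMSE): how much out-of-bag error grows
# when a predictor's values are shuffled
imp <- importance(rf, type = 1)
head(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE], 10)
```

Printing rf also reports the out-of-bag mean squared error, which gives
a quick error estimate without a separate test set.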

There are also several penalized regression packages (also good for
interpretability); for instance, glmnet is great.
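
Here is a small sketch of a lasso fit with glmnet, again on simulated
data with invented gene names. Choosing alpha = 1 (pure lasso) and
picking lambda by cross-validation are my assumptions; an elastic net
(0 < alpha < 1) may suit correlated genes better:

```r
library(glmnet)

set.seed(1)
n <- 60; p <- 200
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", 1:p)))  # hypothetical names
y <- 2 * x[, 1] - x[, 2] + rnorm(n, sd = 0.5)

# 10-fold cross-validation over the lambda path; alpha = 1 is the lasso
cvfit <- cv.glmnet(x, y, family = "gaussian", alpha = 1)

# Coefficients at the lambda with the lowest CV error; most are exactly 0
b <- as.matrix(coef(cvfit, s = "lambda.min"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected
```

The genes left with non-zero coefficients are the candidate biomarker,
and predict(cvfit, newx = ..., s = "lambda.min") gives predictions for
new samples.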

Maybe you have some info about the grouping of your predictors? Try
the grouped lasso.
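
To give an idea of what a grouped lasso fit looks like, here is a
sketch using the grpreg package. That package choice is my assumption
for illustration (there are other implementations), and the grouping
below is invented: 10 groups of 4 simulated genes each, standing in for
something like pathway membership:

```r
library(grpreg)

set.seed(1)
n <- 60; p <- 40
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("gene", 1:p)))  # hypothetical names
group <- rep(1:10, each = 4)          # hypothetical grouping of predictors

# Only the first group truly affects the outcome
y <- as.numeric(x[, 1:4] %*% c(1, -1, 0.5, 0.5) + rnorm(n, sd = 0.5))

# Group lasso penalty: all coefficients in a group enter or leave together
cvfit <- cv.grpreg(x, y, group = group, penalty = "grLasso")

b <- coef(cvfit)                      # coefficients at the CV-chosen lambda
active_groups <- unique(group[b[-1] != 0])   # b[1] is the intercept
active_groups
```

Selection then happens at the level of whole groups rather than
individual genes, which can be easier to interpret when the groups mean
something biologically.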


Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
