[BioC] package of predicting a continuous variable from more than one continuous predictor variables
mailinglist.honeypot at gmail.com
Wed Sep 9 16:26:58 CEST 2009
On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:
> Thanks Steve.
> Sorry that I did not make myself clear. I am trying to build a
> biomarker from gene expression microarray data. What I am doing is
> similar to the weighted-voting algorithm or SVM. But the difference is
> that the outcome is a continuous variable instead of a categorical
> variable. It is a regression problem, but I want to know which
> package is best for this purpose? How about CART?
I don't know if there's such a thing as "best". What yardstick would
you use to measure that?
For instance, you mention "it" is similar to an SVM (how?), but SVMs
can also be used for regression, not just classification (doable from
both e1071 and kernlab). How about going that route? As usual,
interpretation of the model might be challenging, though (which might
be why you're avoiding it for biomarker discovery?).
You also mention weighted-voting:
* How about boosted regression models?
* Also related to boosting: bagging and random forests (both can be
  used for regression).
I think boosting/bagging/random forests tend to lead to more
interpretable models than SVMs, so maybe that's better for you?
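
To make the interpretability point a bit more concrete, here is a rough
sketch of bagging for a continuous outcome, written in plain Python
rather than with any of the R packages. It is not how randomForest or
any boosting package works internally. It fits one-split regression
"stumps" on bootstrap resamples and counts how often each predictor is
chosen for the split, which is one crude way to see which predictors
matter. The function names (best_stump, bagged_stumps, predict) and the
toy data are made up for illustration.

```python
import random


def best_stump(X, y, rows):
    """Return (feature, threshold, left_mean, right_mean) for the single
    split that minimizes squared error on the given (bootstrap) rows."""
    best = None
    for j in range(len(X[0])):
        vals = sorted(set(X[i][j] for i in rows))
        # Candidate thresholds are midpoints between distinct values.
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            ml = sum(left) / len(left)
            mr = sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]


def bagged_stumps(X, y, n_trees=50, seed=0):
    """Fit one stump per bootstrap resample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(best_stump(X, y, rows))
    return stumps


def predict(stumps, x):
    """Average the predictions of all stumps (the 'bagging' step)."""
    total = sum(ml if x[j] <= t else mr for j, t, ml, mr in stumps)
    return total / len(stumps)


# Toy data (an assumption for illustration): 30 samples, 5 predictors,
# and only predictor 2 actually drives the continuous outcome.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(30)]
y = [3.0 * row[2] + rng.gauss(0, 0.2) for row in X]

stumps = bagged_stumps(X, y)

# How often each predictor was picked for the split.
counts = [0] * 5
for j, _, _, _ in stumps:
    counts[j] += 1
print("split counts per predictor:", counts)

hi = predict(stumps, [0, 0, 0.9, 0, 0])
lo = predict(stumps, [0, 0, -0.9, 0, 0])
print("prediction at x2=0.9:", hi, " at x2=-0.9:", lo)
```

Real tree ensembles grow deeper trees and use better importance
measures than a raw split count, so treat the counts as a toy
diagnostic only.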
There are also several penalized regression packages (also good for
interpretability); for instance, glmnet is great.
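
For a feel of why penalized regression helps with interpretability,
here is a minimal sketch of the lasso (L1 penalty) solved by cyclic
coordinate descent, in plain Python. It is not glmnet's code or API.
It assumes centered predictors and outcome with no intercept, and the
names soft_threshold and lasso_cd, the toy data, and the penalty value
are all made up for illustration. The point is that the penalty sets
the coefficients of uninformative predictors exactly to zero, leaving a
short list of candidate markers.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; anything within [-gamma, gamma] becomes 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent. Assumes X columns and y are already centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual, i.e. the
            # residual with predictor j's current contribution added back.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


# Toy data (an assumption for illustration): 60 samples, 10 "genes",
# and only genes 0 and 3 influence the continuous outcome.
rng = random.Random(0)
n, p = 60, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.1) for row in X]

# Center each column and the outcome so no intercept is needed.
col_means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
y = [v - y_mean for v in y]

beta = lasso_cd(X, y, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected predictors:", selected)
print("coefficients:", [round(b, 2) for b in beta])
```

In practice the penalty strength would be chosen by cross-validation
rather than fixed by hand, and the surviving coefficients are shrunk
toward zero, so they understate the true effect sizes.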
Maybe you have some info about the grouping of your predictors? Try
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact