[BioC] package of predicting a continuous variable from more than one continuous predictor variables
Zeljko Debeljak
zeljko.debeljak at gmail.com
Thu Sep 10 14:49:19 CEST 2009
Dear Shirley,
have you tried random forests for the task you described?
Their prediction quality is similar to that of SVMs, and there is
(almost) no need to adjust the method's parameters. The default
parameter values can be used for a variety of different problems
without adjustment, subsetting, etc.
Unfortunately, in the SVM case you need to optimize at least 2-3
parameters. This is time-consuming and can lead to severe
overfitting. You could also try backpropagation neural networks,
but there the tuning problem is even more pronounced.
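
A minimal sketch of that comparison in R, assuming the randomForest and
e1071 packages are installed. The simulated matrix, its dimensions, the
train/test split and the parameter grid are placeholders made up for
illustration; they are not taken from the data discussed in this thread.
With this grid, the SVM search fits 4 x 3 x 3 = 36 parameter combinations,
each cross-validated, while the random forest is fitted once with defaults.

```r
# Sketch only: simulated data stand in for an expression matrix.
library(randomForest)  # random forest regression
library(e1071)         # svm() and tune.svm()

set.seed(1)
n <- 60; p <- 200      # hypothetical sizes: 60 samples, 200 genes
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
y <- x[, 1] - 2 * x[, 2] + rnorm(n, sd = 0.5)  # continuous outcome

train <- sample(n, 40)
test  <- setdiff(seq_len(n), train)

# Random forest regression with the package defaults (no tuning).
rf      <- randomForest(x[train, ], y[train])
rf_pred <- predict(rf, x[test, ])

# Epsilon-SVR: cost, gamma and epsilon are searched by cross-validation
# on the training samples only, so the test samples stay untouched.
svr_tune <- tune.svm(x[train, ], y[train], type = "eps-regression",
                     cost = 10^(-1:2), gamma = 10^(-4:-2),
                     epsilon = c(0.01, 0.1, 0.5))
svr_pred <- predict(svr_tune$best.model, x[test, ])

# Root mean squared error on the held-out samples.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
print(c(rf = rmse(y[test], rf_pred), svr = rmse(y[test], svr_pred)))
```

Keeping the test samples out of the tuning step matters here: if the grid
search sees them, the reported SVM error is optimistic, which is one form
of the overfitting mentioned above.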
Zeljko Debeljak, PhD
CROATIA
2009/9/9, shirley zhang <shirley0818 at gmail.com>:
> Thanks a lot. Shirley
>
> On Wed, Sep 9, 2009 at 10:45 AM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> On Sep 9, 2009, at 10:38 AM, shirley zhang wrote:
>>
>>> Hi Steve,
>>>
>>> Thanks for your explanation and suggestions. I didn't know SVMs can also
>>> be used for regression, since I have only used them for classification.
>>
>> Yeah, no problem. It's pretty straightforward to wire up an SVM for
>> regression -- you'll have to run it a few times with different values of
>> "epsilon" (like you would for the C (or nu) in svm-classification).
>>
>> If you're interested in some details/theory, here's a "brief tutorial" on
>> support vector regression by Alex Smola and Bernhard Scholkopf:
>> http://eprints.pascal-network.org/archive/00002057/01/SmoSch03b.pdf
>>
>> Let us know if you need help (but maybe R-help might be more
>> appropriate?).
>>
>>> I will try those methods you suggested. Do you have any experience with
>>> CART?
>>
>> Nope, I've never used CART before, sorry.
>>
>> -steve
>>
>>>
>>> Thanks again,
>>> Shirley
>>>
>>> On Wed, Sep 9, 2009 at 10:26 AM, Steve Lianoglou
>>> <mailinglist.honeypot at gmail.com> wrote:
>>>>
>>>> Hi Shirley,
>>>>
>>>> On Sep 9, 2009, at 10:10 AM, shirley zhang wrote:
>>>>
>>>>> Thanks Steve.
>>>>>
>>>>> Sorry that I did not make myself clear. I am trying to build a
>>>>> biomarker from gene expression microarray data. What I am doing is
>>>>> similar to the weighted-voting algorithm or SVM. But the difference is
>>>>> that the outcome is a continuous variable instead of a categorical
>>>>> variable. It is a regression problem, but I want to know which
>>>>> package is best for this purpose? How about CART?
>>>>
>>>> I don't know if there's such a thing as "best"(?) What yardstick would
>>>> you use to measure that?
>>>>
>>>> For instance, you mention "it" is similar to an SVM (how?), but SVMs
>>>> can also be used for regression, not just classification (doable from
>>>> both e1071 and kernlab). How about going that route? As usual,
>>>> interpretation of the model might be challenging, though (which might
>>>> be why you're avoiding it for biomarker discovery?)
>>>>
>>>> You also mention weighted-voting:
>>>>
>>>> * how about boosted regression models?
>>>> http://cran.r-project.org/web/packages/gbm/index.html
>>>>
>>>> * Also related to boosting: bagging & random forests (both can be
>>>> used for regression):
>>>> http://cran.r-project.org/web/packages/randomForest/index.html
>>>> http://cran.r-project.org/web/packages/ipred/index.html
>>>>
>>>> I think boosting/bagging/random-forests tend to lead to more
>>>> interpretable models, so maybe that's better for you?
>>>>
>>>> There are also several penalized regression packages (also good for
>>>> interpretability); for instance, glmnet is great:
>>>> http://cran.r-project.org/web/packages/glmnet/index.html
>>>>
>>>> Maybe you have some info about the grouping of your predictors? Try
>>>> grouped lasso:
>>>> http://cran.r-project.org/web/packages/grplasso/index.html
>>>>
>>>>
>>>> -steve
>>>>
>>>> --
>>>> Steve Lianoglou
>>>> Graduate Student: Computational Systems Biology
>>>> | Memorial Sloan-Kettering Cancer Center
>>>> | Weill Medical College of Cornell University
>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>
>>>>
>>
>
>
>
> --
> Xiaoling (Shirley) Zhang
>
> Ph.D. Candidate in Bioinformatics
> Boston University, Boston, MA
> Tel: (857) 233-9862
> Email: zhangxl at bu.edu
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>