[BioC] Classification

Fri Jun 24 19:26:50 CEST 2011

The standard MASS package includes the "polr" function to perform 
ordinal regression.  After running polr to fit the base model with all 
parameters, you can pass the result through the "step" function, which 
uses AIC to select the best set of predictors.
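
As a rough sketch of what that could look like: the data frame below is
simulated, and the gene names, effect sizes and cut points are made up
purely for illustration. It uses stepAIC, the MASS variant of step, 
because the polr help page demonstrates that combination.

```r
library(MASS)  # provides polr() and stepAIC()

set.seed(1)
n <- 150
# Simulated stand-in for the PCR data: five hypothetical genes
dataf <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                              dimnames = list(NULL, paste0("gene", 1:5))))
# In this toy example only gene1 and gene2 drive the latent severity score
latent <- 1.5 * dataf$gene1 - 1.0 * dataf$gene2 + rnorm(n)
dataf$Group <- cut(latent, breaks = c(-Inf, -0.5, 0.8, Inf),
                   labels = c("healthy", "moderate", "severe"),
                   ordered_result = TRUE)

# Full proportional-odds model; Hess = TRUE keeps the Hessian for summary()
full <- polr(Group ~ ., data = dataf, Hess = TRUE)

# Stepwise selection by AIC, starting from the full model
best <- stepAIC(full, trace = 0)
print(formula(best))
```

The response has to be an ordered factor, otherwise polr has no way of 
knowing that "moderate" sits between "healthy" and "severe".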

     Kevin

On 6/24/2011 10:38 AM, Tim Triche, Jr. wrote:
> You have an ordinal response, so you might consider an ordered probit model
> with interaction terms and a penalized likelihood fit, and determine the
> best penalty by cross-validation.  I don't recall whether CMA supports
> ordered probit models, but it's probably the best approach, and you could
> just brute-force it -- you've only got 120 different models to fit under
> this scheme.  At the very least, CMA would generate the cross-validation
> sets for you.
>
> You might also want to consider recursively fitting a shrunken LDA model
> (diseased/healthy, moderate/severe) and see how that compares to an ordinal
> model.  Regardless, cross-validation is the obvious answer to how to pick
> one.
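
A minimal sketch of the brute-force idea, under several assumptions: the
data are simulated, the gene names are hypothetical, and plain (not
shrunken) LDA with leave-one-out cross-validation stands in for the
penalized ordinal model, simply because MASS provides it directly. It
covers main-effect subsets only, with no interaction terms.

```r
library(MASS)  # lda() with CV = TRUE returns leave-one-out predictions

set.seed(2)
n <- 120
genes <- paste0("gene", 1:5)  # hypothetical gene names
dataf <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                              dimnames = list(NULL, genes)))
# Toy signal: only gene1 and gene3 carry information about the group
score <- 2 * dataf$gene1 + 1.5 * dataf$gene3 + rnorm(n)
dataf$Group <- cut(score, breaks = quantile(score, c(0, 1/3, 2/3, 1)),
                   labels = c("healthy", "moderate", "severe"),
                   include.lowest = TRUE)

# All 2^5 - 1 = 31 non-empty subsets of the five genes
subsets <- unlist(lapply(seq_along(genes),
                         function(k) combn(genes, k, simplify = FALSE)),
                  recursive = FALSE)

# Leave-one-out accuracy of an LDA classifier built on one subset
loo_accuracy <- function(vars) {
  fit <- lda(reformulate(vars, response = "Group"), data = dataf, CV = TRUE)
  mean(fit$class == dataf$Group)
}

acc <- vapply(subsets, loo_accuracy, numeric(1))
ranking <- data.frame(genes = vapply(subsets, paste, character(1),
                                     collapse = "+"),
                      accuracy = acc)
ranking <- ranking[order(-ranking$accuracy), ]
print(head(ranking, 5))
```

Because every subset is scored on the same leave-one-out splits, the
ranking does not change from run to run the way a stepwise search with
random folds can.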
>
> Hope this helps,
> -t
>
> On Fri, Jun 24, 2011 at 8:24 AM, David martin<vilanew at gmail.com>  wrote:
>
>> Thanks.
>> It is not binary, since I have three categories and 5 genes. I have tried LDA
>> and stepclass:
>>
>> # Stepwise LDA; stepclass() comes from the klaR package
>> library(klaR)
>> disc <- stepclass(Group ~ ., data = dataf, method = "lda", improvement = 0.001)
>>
>> where Group contains my three categories ("healthy", "moderate disease",
>> "severe disease") and dataf the PCR values for my 5 genes.
>>
>> The problem I have is that stepclass generates a different signature each
>> time (as it randomly picks a gene to start with?). This is OK for me, but
>> how many times do I need to run stepclass to find the genes most likely
>> to classify my groups? Do I need to run stepclass in a loop?
>>
>> thanks
>>
>>
>>
>> On 06/24/2011 05:17 PM, Kevin R. Coombes wrote:
>>
>>> .. and probably should ...
>>>
>>> For a binary classification with only a few predictors, you can, for
>>> example, use logistic regression with some standard criterion like AIC,
>>> BIC, or Bayesian model averaging to decide which predictors should be
>>> retained.
>>>
>>> Kevin
>>>
>>> On 6/23/2011 6:10 PM, Moshe Olshansky wrote:
>>>
>>>> If you have just 5 genes and a decent number of samples you can use any of
>>>> the "conventional" (i.e. not high throughput) methods like LDA, trees,
>>>> Random Forest, SVM, etc.
>>>>
>>>>> I will have a look at both packages. It's PCR data, by the way.
>>>>> Thanks
>>>>>
>>>>> On 06/23/2011 05:56 PM, Tim Triche, Jr. wrote:
>>>>>
>>>>>> or CMA, which is perhaps a more systematic approach for classification.
>>>>>> (the package name stands for Classification of MicroArrays) Very well
>>>>>> thought out.
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 23, 2011 at 8:02 AM, Sean Davis<sdavis2 at mail.nih.gov> wrote:
>>>>>>
>>>>>>> On Thu, Jun 23, 2011 at 10:58 AM, David martin<vilanew at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I have 5 genes of interest. I would like to know which combination(s) of
>>>>>>>> genes gives the best disease separation. Which test could I use in my
>>>>>>>> training set to see which combination is the best classifier between my
>>>>>>>> diseased and my healthy populations?
>>>>>>>>
>>>>>>>> Thanks for any comment or test that could be useful to answer that
>>>>>>>> question.
>>>>>>>
>>>>>>> Check out the MLInterfaces package. It should give you some ideas on
>>>>>>> where to start.
>>>>>>>
>>>>>>> Sean
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>>
>
>