[BioC] RandomForest, supervised machine learning and uncertainty

January Weiner january.weiner at mpiib-berlin.mpg.de
Wed Dec 8 14:01:37 CET 2010


Thank you, Vincent, for the answer.

> task, but if I read you correctly you are addressing the extension of
> the decision task from two classes to two classes plus "doubt".  This

Yes, although I have more than two classes and would like to stick to
random forests: say, extend the RF decision task from N classes to
N + 1 classes. The problem is well described in the discussion of the
"safety threshold" in Ripley's book.

The simple solution is to define a "doubt function" d on the votes
matrix from the RF, such as the one I mentioned earlier, and then plot
the size of the "doubt class" and the error rate in the remaining
classes against d. That would help in making a decision, or might
itself count as a result for my study.
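
A rough sketch of that plot in base R, under stated assumptions: the
doubt rule used here (call "doubt" when the winning vote fraction is
below a threshold d) is only one possible choice, and the votes matrix,
the true labels and the function names are made up for illustration.
With the randomForest package, the `votes` component of the fitted
object (OOB vote fractions) would take the place of the toy matrix.

```r
# Toy votes matrix: rows = samples, columns = classes, entries = vote fractions.
# Hypothetical data; with randomForest one would use rf$votes instead.
votes <- matrix(c(0.90, 0.05, 0.05,
                  0.40, 0.35, 0.25,
                  0.10, 0.80, 0.10,
                  0.30, 0.30, 0.40,
                  0.50, 0.45, 0.05,
                  0.05, 0.15, 0.80),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, c("A", "B", "C")))
truth <- c("A", "B", "B", "C", "B", "C")

# Assumed doubt rule: assign "doubt" when the winning vote fraction is below d.
predict_with_doubt <- function(votes, d) {
  top  <- apply(votes, 1, max)
  pred <- colnames(votes)[max.col(votes, ties.method = "first")]
  pred[top < d] <- "doubt"
  pred
}

# For each threshold: size of the doubt class (as a fraction of all samples)
# and error rate among the samples that were still classified.
doubt_curve <- function(votes, truth, thresholds) {
  res <- t(sapply(thresholds, function(d) {
    pred <- predict_with_doubt(votes, d)
    kept <- pred != "doubt"
    c(d     = d,
      doubt = mean(!kept),
      error = if (any(kept)) mean(pred[kept] != truth[kept]) else NA)
  }))
  as.data.frame(res)
}

dc <- doubt_curve(votes, truth, seq(0.35, 0.95, by = 0.1))

# Both curves against the threshold d.
matplot(dc$d, dc[, c("doubt", "error")], type = "b", pch = 1:2,
        xlab = "threshold d", ylab = "fraction of samples")
legend("topleft", legend = c("doubt class size", "error rate (classified)"),
       pch = 1:2, col = 1:2, lty = 1:2)
```

Raising d moves more samples into the doubt class and, one hopes,
lowers the error rate among the rest; the plot shows where that
trade-off becomes acceptable.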


@Sean Davis:

> I'll just add here that when thinking about biomarker selection and clinical prediction,
> one must be aware of the often imbalanced costs (to the patient) of misclassification
> (which could include the "unclassified" cases), depending on the actual details of
> the clinical scenario.

This is precisely why I would like to consider the "doubt class". The
cost of an unclassified result is definitely different from (and most
likely lower than) the cost of a false negative.
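
To make the cost argument concrete, here is a small base R sketch. All
numbers in it are hypothetical: the cost matrix is invented, the class
names are placeholders, and treating RF vote fractions as class
probabilities is itself an assumption (they are not calibrated).

```r
# Vote fractions for three samples (rows) over two classes; hypothetical data,
# treated here as rough class probabilities.
votes <- matrix(c(0.95, 0.05,
                  0.60, 0.40,
                  0.15, 0.85),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("healthy", "sick")))

# Hypothetical costs: rows = true class, columns = decision.
# A false negative (sick called healthy) costs 10, a false positive costs 2,
# and leaving a sample unclassified costs 1 whatever the truth is.
cost <- matrix(c( 0, 2, 1,
                 10, 0, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("healthy", "sick"),
                               c("healthy", "sick", "doubt")))

# Expected cost of each decision for each sample, then take the cheapest.
expected <- votes %*% cost
decision <- colnames(cost)[max.col(-expected, ties.method = "first")]
decision
```

With these made-up costs the middle sample (60% of votes for "healthy")
ends up in the doubt class, because calling it healthy risks the
expensive false negative.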

Cheers,
j.



> is discussed at some length in Ripley's "Pattern Recognition and
> Neural Networks" book; see the comments on the "error-reject" curve on
> p20 and on "safety threshold" concept on p22.
>
> The MLInterfaces vignette has an application (that, as written, turns
> out to be nugatory) just at the end of the vignette -- the doubt
> interval is too narrow to capture any classification for the data in
> use.  If you change the code to
>
> douPred[smallDou(0.35, 0.65)] <- "doubt"
>
> one prediction is converted to "doubt".  This issue deserves more attention.



-- 
-------- Dr. January Weiner 3 --------------------------------------
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web   : www.mpiib-berlin.mpg.de
Tel     : +49-30-28460514


