[BioC] Fwd: How to decide which distance metric to use for microarray data clustering?

Peng Yu pengyu.ut at gmail.com
Wed Oct 7 17:54:08 CEST 2009


On Wed, Oct 7, 2009 at 10:04 AM, Sean Davis <seandavi at gmail.com> wrote:
>
>
> On Wed, Oct 7, 2009 at 10:49 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>
>> Besides the distance metrics, there are other things that may also be
>> important. For example, multiple probesets can map to the same gene. I can
>> do clustering on probeset values or on averaged probeset values of
>> genes. What factors should I consider when I make this kind of
>> decision?
>>
>
> It is generally best not to average probes.  You could choose one to be
> representative of each gene, but averaging is not the best way to go.

Is there any justification why it is not good to average probes?
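
To make the question concrete, here is a minimal sketch (in Python, with made-up log2 values) of one possible concern. It assumes, purely for illustration, that two probesets for the same gene behave differently, for instance because one of them hybridizes poorly. This is my guess at a reason, not something stated in the reply above. The probeset names and the "pick the most variable probeset" rule are hypothetical choices for the example.

```python
from statistics import mean, pvariance

# Made-up log2 expression values across four samples for two probesets
# annotated to the same (hypothetical) gene.
probesets = {
    "probe_a": [6.0, 6.1, 9.0, 9.2],  # responds to the condition change
    "probe_b": [7.5, 7.4, 7.6, 7.5],  # nearly flat across samples
}


def average_profile(profiles):
    """Average the probesets sample by sample."""
    return [mean(values) for values in zip(*profiles.values())]


def most_variable_probe(profiles):
    """Return the name of the probeset with the largest variance across samples."""
    return max(profiles, key=lambda name: pvariance(profiles[name]))


averaged = average_profile(probesets)
representative = most_variable_probe(probesets)

range_probe_a = max(probesets["probe_a"]) - min(probesets["probe_a"])
range_averaged = max(averaged) - min(averaged)

print("averaged profile:", [round(v, 2) for v in averaged])
print("representative probeset:", representative)
print("range of probe_a:", round(range_probe_a, 2))
print("range of averaged profile:", round(range_averaged, 2))
```

In this toy case the averaged profile has a smaller dynamic range than the responsive probeset, because the flat probeset dilutes it. Whether that happens in real data depends on the probesets involved.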

>> bioDist says something about two popular metrics, but the description
>> is very condensed. I am wondering whether there are some more detailed
>> comparisons between metrics.
>
> Often, the metrics produce highly compatible pictures of the data.  The
> actual metric you will use may be directed somewhat by the goals of the
> analysis but, at least for hierarchical clustering, I think it is difficult
> to argue for one "best" or "recommended" metric.
>
> In practice, you may want to try a few to see how they behave on your data.

If different metrics give different results, how do I decide
which one to use?
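
As an illustration of how two metrics can disagree, here is a minimal sketch in Python with made-up expression profiles. It compares Euclidean distance with a correlation-based distance, taken here to be 1 minus the Pearson correlation. The gene names, the values, and the helper functions are all hypothetical; this is not code from bioDist or from the book chapter.

```python
import math

# Made-up expression profiles over four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [10.0, 20.0, 30.0, 40.0],  # same shape as g1, much larger magnitude
    "g3": [1.5, 2.5, 2.5, 3.5],      # similar magnitude to g1, different shape
}


def euclidean(x, y):
    """Euclidean distance: sensitive to absolute expression level."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation: sensitive to profile shape, not magnitude."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def nearest(name, data, metric):
    """Return the other gene closest to `name` under the given metric."""
    others = [other for other in data if other != name]
    return min(others, key=lambda other: metric(data[name], data[other]))


print("nearest to g1 (Euclidean):  ", nearest("g1", profiles, euclidean))
print("nearest to g1 (correlation):", nearest("g1", profiles, correlation_distance))
```

In this toy example the two metrics pick different nearest neighbours for g1: Euclidean distance favours the gene with similar magnitude, while the correlation-based distance favours the gene with the same shape. So the choice seems to come down to whether magnitude or shape matters for the analysis.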

>> On Wed, Oct 7, 2009 at 12:35 AM, Tim Triche <tim.triche at gmail.com> wrote:
>> > look at the bioDist package for some suggestions.
>> >
>> > the metric to use depends on your task.
>> >
>> >
>> > On Tue, Oct 6, 2009 at 8:52 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I am looking for the most appropriate distance metric for
>> >> clustering a set of microarray data. I read Chapter 12 of
>> >> Bioinformatics and Computational Biology Solutions Using R and
>> >> Bioconductor, but I'm still not clear what the general guideline is
>> >> for choosing an appropriate distance metric out of the many listed in
>> >> that chapter. Could somebody let me know how to choose an appropriate
>> >> distance metric?
>> >>
>> >> Regards,
>> >> Peng
>> >>
>> >> _______________________________________________
>> >> Bioconductor mailing list
>> >> Bioconductor at stat.math.ethz.ch
>> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >> Search the archives:
>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>> >
>> >
>> > --
>> > Statisticians, like artists, have a bad habit of falling in love with
>> > their models.
>> > --George Box
>> >
>>


