[BioC] scholarly reference for "don't draw PCA/heatmap dendrograms on DEGs"
lwaldron.research at gmail.com
Mon Dec 9 16:56:33 CET 2013
These papers don't show clustered heatmaps, but show the inflation of
classification accuracy and survival discrimination in simulated
no-signal data when using differentially expressed genes only. So if
you consider your clustering as the classifier, they may be relevant:
Simon RM, Subramanian J, Li M-C, Menezes S. Using cross-validation to
evaluate predictive accuracy of survival risk classifiers based on
high-dimensional data. Brief Bioinform. 2011 May 15;12(3):203–14.
Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of
DNA microarray data for diagnostic and prognostic classification. J
Natl Cancer Inst. 2003 Jan 1;95(1):14–8.
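
As a rough illustration of the kind of inflation described above (not a reproduction of the analyses in either paper), here is a minimal Python sketch on pure-noise data. It compares leave-one-out cross-validation when the top genes are picked once using all samples against redoing the selection inside every fold. The nearest-centroid classifier, the Welch t statistic, and all names and parameter values (`run`, `n_top`, the seed, etc.) are assumptions made for the example.

```python
import random

def t_stat(row, labels):
    """Absolute Welch t statistic for one gene, given 0/1 sample labels."""
    a = [x for x, l in zip(row, labels) if l == 0]
    b = [x for x, l in zip(row, labels) if l == 1]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return abs(ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

def top_genes(data, labels, keep, n_top):
    """Indices of the n_top genes with largest |t|, using only samples in `keep`."""
    sub_labels = [labels[s] for s in keep]
    scores = [t_stat([row[s] for s in keep], sub_labels) for row in data]
    return sorted(range(len(data)), key=lambda g: scores[g], reverse=True)[:n_top]

def predict(data, labels, train, test, genes):
    """Nearest-centroid prediction (squared Euclidean distance) for one held-out sample."""
    dists = []
    for cls in (0, 1):
        members = [s for s in train if labels[s] == cls]
        d = 0.0
        for g in genes:
            centroid = sum(data[g][s] for s in members) / len(members)
            d += (data[g][test] - centroid) ** 2
        dists.append(d)
    return 0 if dists[0] <= dists[1] else 1

def loocv_accuracy(data, labels, n_top, select_inside):
    """Leave-one-out accuracy; gene selection is redone per fold only if select_inside."""
    samples = list(range(len(labels)))
    genes_once = top_genes(data, labels, samples, n_top)  # sees every sample
    correct = 0
    for test in samples:
        train = [s for s in samples if s != test]
        genes = top_genes(data, labels, train, n_top) if select_inside else genes_once
        correct += predict(data, labels, train, test, genes) == labels[test]
    return correct / len(samples)

def run(n_genes=1000, n_per_group=10, n_top=20, seed=2):
    """Simulate IID N(0, 1) 'expression' with arbitrary labels; return both accuracies."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    data = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_genes)]
    outside = loocv_accuracy(data, labels, n_top, select_inside=False)
    inside = loocv_accuracy(data, labels, n_top, select_inside=True)
    return outside, inside
```

Calling `run()` returns the two accuracies as `(selection outside CV, selection inside CV)`. Since the data contain no signal, the first should come out close to 1 and the second around chance or below, which is the circularity at issue in this thread.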
On Mon, Dec 9, 2013 at 9:05 AM, Kevin Coombes <kevin.r.coombes at gmail.com> wrote:
> I don't have a good reference either.
> But you can easily simulate matrices full of IID standard normal data, pick
> the "most differentially expressed" and show that this noise/nonsense
> perfectly separates any two "groups" that you want to pretend are present in
> the data.
> -- Kevin
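
A minimal sketch of the simulation Kevin describes, in plain Python: fill a matrix with IID standard normal values, assign samples to two arbitrary "groups", keep the genes with the largest |t|, and then check how well an unsupervised split of the samples recovers those groups. The split by the sign of the first principal component stands in for reading a PCA plot or dendrogram; that choice, and all names and parameter values (`simulate`, `n_top`, the seed), are assumptions for illustration.

```python
import random

def welch_t(a, b):
    """Welch two-sample t statistic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / (va / len(a) + vb / len(b)) ** 0.5

def pc1_split(rows, n_iter=100, seed=0):
    """Split samples by the sign of their PC1 score (power iteration; labels unused)."""
    n = len(rows[0])
    centered = []
    for r in rows:
        m = sum(r) / n
        centered.append([x - m for x in r])  # center each gene across samples
    # Sample-by-sample Gram matrix; its leading eigenvector gives the PC1 scores.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        v = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [1 if x > 0 else 0 for x in v]

def agreement(split, labels):
    """Fraction of samples on the 'right' side, ignoring which side is called 0 or 1."""
    same = sum(s == l for s, l in zip(split, labels)) / len(labels)
    return max(same, 1 - same)

def simulate(n_genes=2000, n_per_group=10, n_top=50, seed=1):
    """Return (agreement using top-|t| genes, agreement using random genes)."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    rng.shuffle(labels)  # any two "groups" you want to pretend are present
    data = [[rng.gauss(0, 1) for _ in labels] for _ in range(n_genes)]

    def abs_t(row):
        a = [x for x, l in zip(row, labels) if l == 0]
        b = [x for x, l in zip(row, labels) if l == 1]
        return abs(welch_t(a, b))

    top = sorted(data, key=abs_t, reverse=True)[:n_top]  # "DEGs" from pure noise
    rand = rng.sample(data, n_top)                       # same number of unselected genes
    return agreement(pc1_split(top), labels), agreement(pc1_split(rand), labels)
```

`simulate()` returns the agreement for the selected genes and for the random genes, in that order. The random-gene figure is the baseline: it shows what the same number of genes gives when the labels played no part in choosing them.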
> On 12/9/2013 8:55 AM, Lorena Pantano wrote:
>> I don't have any reference to give you.
>> But in my experience you don't necessarily get a good heatmap
>> separated by the two conditions even when you use only DE genes. Probably
>> because the DE results are often not strong enough to separate the
>> two groups, or because there is a systematic outlier in your data
>> and you get DE genes that are not true, or for some other reason.
>> I can say that I have done more than 50 DE analyses, and only once did I get
>> a clear heatmap showing two groups. So, I guess there is something there.
>> Your initiative is very interesting.
>> On Mon, Dec 9, 2013 at 2:19 PM, Aaron Mackey <ajmackey at gmail.com> wrote:
>>> A colleague of mine is skeptical of my assertion that drawing
>>> PCA plots and/or clustered heatmaps based only on differentially
>>> expressed genes (DEGs) is a circular, self-fulfilling prophecy -- they assert that
>>> there's no guarantee samples will cluster by condition (despite the fact
>>> that the condition is exactly what drives selection of DEGs), and so
>>> it is valid to use the observed clustering as further "evidence" of the condition
>>> effects. Rather than spend more time trying to explain statistical
>>> concepts, I was hoping to checkmate the argument with a nice Nature
>>> review or somesuch. Any pointers?
>>> Thanks in advance,
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> Search the archives: