[BioC] Re: Fwd: Statistical approach to compare differentially expressed gene lists

Francois Pepin fpepin at cs.mcgill.ca
Tue Dec 29 17:49:51 CET 2009


Dear Qinghua,

I am not sure I would call those differences "very impressive". As 
your samples have different numbers of X and Y chromosomes, I would 
definitely expect many genes on those chromosomes to be differentially 
expressed. After all, no genes on Y should be expressed in any of the 
females, right?

The fact that you can trivially discriminate between the two groups 
suggests that the classification task itself is not difficult at all, 
so a perfect AUC says little about the quality of either list.

As Wolfgang is saying, you need another criterion to define what is a 
"good list". No statistical tests are going to tell you that one list is 
better than the other.
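
To make the ROC idea Wolfgang mentions concrete, here is a minimal sketch in Python. It assumes you have one score per gene from each preprocessing (for example an absolute test statistic) and a reference set of genes you are willing to treat as truly differential. The gene names are only illustrative, the scores are invented toy numbers, and the function name roc_auc is my own; none of this comes from your data.

```python
def roc_auc(scores, reference):
    """Rank-based AUC (Mann-Whitney form): the probability that a reference
    gene outscores a non-reference gene, counting ties as one half."""
    pos = [s for g, s in scores.items() if g in reference]
    neg = [s for g, s in scores.items() if g not in reference]
    if not pos or not neg:
        raise ValueError("need genes both inside and outside the reference set")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy input: gene -> score under each preprocessing.
reference = {"XIST", "RPS4Y1", "DDX3Y"}  # assumed "truly differential" genes
scores_rma = {"XIST": 9.1, "RPS4Y1": 8.4, "DDX3Y": 2.0, "ACTB": 2.5, "GAPDH": 0.3}
scores_combat = {"XIST": 9.5, "RPS4Y1": 8.9, "DDX3Y": 4.2, "ACTB": 2.1, "GAPDH": 0.4}

auc_rma = roc_auc(scores_rma, reference)
auc_combat = roc_auc(scores_combat, reference)
print(f"AUC (RMA only):     {auc_rma:.3f}")
print(f"AUC (RMA + ComBat): {auc_combat:.3f}")
```

The comparison is only as good as the reference set, and it ranks all genes rather than depending on where each list was cut off.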

That being said, one of your lists is larger, so it likely 
contains more of the true differences between your groups. On the other 
hand, it could be giving you more false positives. You could look at the 
extra genes and see if they make sense. In this case, if many of the 
extra genes are on the X and Y chromosomes, they are likely to be truly 
differentially expressed.
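
A rough sketch of that check in Python, under stated assumptions: you have both lists as gene identifiers, a gene-to-chromosome mapping, and the full set of genes tested (the universe), which must contain every listed gene. The function names and the toy genes g1 to g10 are hypothetical. The hypergeometric tail probability is only a plausibility check on the extra genes, not a test of which list is better.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which lie on a sex chromosome."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def extra_gene_enrichment(list_a, list_b, chrom_of, universe):
    """Return the genes only in list_b, those of them on X or Y, and the
    hypergeometric tail probability of seeing at least that many."""
    extra = set(list_b) - set(list_a)
    sex_linked = {g for g in universe if chrom_of.get(g) in ("X", "Y")}
    hits = extra & sex_linked
    p = hypergeom_sf(len(hits), len(universe), len(sex_linked), len(extra))
    return sorted(extra), sorted(hits), p


# Hypothetical toy input: 10 tested genes, 3 of them on X or Y.
universe = [f"g{i}" for i in range(1, 11)]
chrom_of = {"g1": "X", "g2": "Y", "g3": "X"}
chrom_of.update({g: "1" for g in universe if g not in chrom_of})
list_a = ["g1", "g4"]
list_b = ["g1", "g2", "g3", "g4", "g5"]

extra, hits, p_value = extra_gene_enrichment(list_a, list_b, chrom_of, universe)
print(extra, hits, round(p_value, 4))
```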

Keep in mind that you have a large overlap between the lists, so it will 
be more difficult to choose between them but it also matters much less 
which one you choose.
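
To put numbers on the overlap, a small Python sketch follows. The list sizes (995 and 2400) are the ones from your first message; the shared count of 900 is an assumption, a lower bound taken from "more than nine hundred", and the function name overlap_summary is my own.

```python
def overlap_summary(n_a, n_b, n_shared):
    """Basic overlap figures for two gene lists, given their sizes and
    the number of genes they share."""
    union = n_a + n_b - n_shared
    return {
        "only_a": n_a - n_shared,
        "only_b": n_b - n_shared,
        "jaccard": n_shared / union,
        "frac_a_in_b": n_shared / n_a,
        "frac_b_in_a": n_shared / n_b,
    }


# 900 is an assumed lower bound; the exact shared count is not given.
summary = overlap_summary(995, 2400, 900)
for name, value in summary.items():
    print(name, round(value, 3))
```

With these figures nearly all of list A sits inside list B, so the practical choice comes down to how far you trust the genes found only in list B.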

It would be very convenient if there were a simple test that told us 
which method is best for an analysis, but in general no such test exists.

Francois

On 12/29/2009 04:31 AM, qinghua xu wrote:
> Dear Wolfgang,
>
> It is really nice, and a surprise, to have your attention! Thank you!
>
> I am sorry that the question was too vague. In detail, we would like to study gene expression profiles in human peripheral blood and identify DEGs (differentially expressed genes) between males and females. As I mentioned in my previous email, the raw data were preprocessed in two ways: in one, simply by RMA; in the other, after RMA, the expression data were further adjusted by ComBat (http://statistics.byu.edu/johnson/ComBat/) to remove potential batch effects. The dataset was relatively small, comprising 12 males and 12 females. In the end, we obtained two DEG lists by SAM at FDR=0.05. The basic idea is to show that, by removing potential batch effects, we are able to extract more information from the gene expression data about the difference between males and females in peripheral blood. On the other hand, we would also like to check whether the additional batch effect adjustment introduces artificial DEGs.
>
> Based on the preliminary results, we observe that the differences between males and females in peripheral blood are very impressive, especially for X- and Y-chromosome-specific genes. Hence, when we plotted ROC curves for both methods, both DEG lists easily reached the maximum AUC=1. The same holds for the hierarchical clustering heatmap: both DEG lists achieved perfect discrimination.
>
> Thanks again!
>
> Qinghua
>
>
>
> ________________________________
> From: Wolfgang Huber<whuber at embl.de>
>
> Cc: bioconductor<bioconductor at stat.math.ethz.ch>; qinghua.xu at as.biomerieux.com
> Sent: 2009/12/28 (Mon) 4:56:18 PM
> Subject: Re: [BioC] Fwd: Statistical approach to compare differentially expressed gene lists
>
> Dear Qinghua
>
> I am afraid your question may be too vague. You will need to define more
> precisely what you mean by "better". Then, it should be straightforward
> to compute a quantitative criterion. It wouldn't be wise to wait for
> someone else to define what is "better" for you.
>
> Also, for any analysis method I know of, gene lists depend in a trivial
> manner on a cut-off (e.g. for p-value, score...), and if you want to do
> something more meaningful than exegesis of someone's cut-off choice,
> then I'd suggest plotting ROC curves for both methods, using a reference
> set of genes that is enriched for "truly differentially expressed".
>
> Best wishes
>     Wolfgang
>
>
>> Dear all,
>>
>> I have identified two lists of differentially expressed genes from the
>> same expression data, treated with different normalisation
>> methods. List A contains 995 genes and list B contains 2400 genes.
>> More than nine hundred genes are shared between the two lists,
>> so most of the genes in list A are also included in list B. The idea
>> is to check whether list B is better than list A.
>>
>> In addition to visualisation approaches (like a hierarchical clustering
>> heatmap) or biological interpretation, I am wondering whether there is
>> any other statistical approach available to compare two lists of
>> differentially expressed genes.
>>
>> I would appreciate any advice, or pointers to any references for
>> this!
>>
>> Bests, Qinghua
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>>
>> _______________________________________________ Bioconductor mailing
>> list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>> archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>


