[BioC] edgeR and tagwise dispersion: overcorrection for multiple tests?

Martin Morgan mtmorgan at fhcrc.org
Fri Jul 13 15:21:14 CEST 2012


On 07/12/2012 11:34 PM, Gordon K Smyth wrote:
> Dear Alessandro,
>
> I haven't seen the MDS plots (because attachments are not distributed to
> the list), but don't see anything surprising in what you have reported.

Actually, some attachments are distributed (this was a recent realization
on our part, too!). The posting guide

   http://bioconductor.org/help/mailing-list/posting-guide/

now says "The following attachment types are accepted: png, pdf, 
rda/Rdata. Total message size cannot exceed 1MB".

Martin


>
> If you compare one group (all C) vs only those members of the other
> group that are most different to it (1R+3R), naturally you will find
> lots of DE genes.
>
> Best wishes
> Gordon
>
>> Date: Thu, 12 Jul 2012 10:48:01 +0200
>> From: "alessandro.guffanti at genomnia.com"
>>     <alessandro.guffanti at genomnia.com>
>> To: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: Re: [BioC] edgeR and tagwise dispersion: overcorrection for
>>     multiple tests?
>>
>> Dear colleagues, good morning. I am back to an old issue because I am
>> now much more certain of what I see, and I begin to wonder whether this
>> is due to biology rather than to analytical tools or strategies.
>>
>> => Here is my sessionInfo() to begin with:
>>
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-pc-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices datasets  utils     methods base
>>
>> other attached packages:
>> [1] edgeR_2.6.7       limma_3.12.1      R.utils_1.12.1 R.oo_1.9.8
>> [5] R.methodsS3_1.4.2
>>
>> => the experiment description: RNA from five samples and five controls,
>> mice, homogeneous stimulus, brain tissue, SAGE with SOLiD with a good
>> mapping in the UTR (checked also with genome-wide mapping). Tags have
>> been selected with the following parameters: only in UTR; unique mapping;
>> only one mismatch; begin with CATG, hence quite stringent. The samples
>> are tagged {1 to 5}R for the stimulus and {1 to 5}C for the control.
>>
>> => MDS plot and simple pairwise regression analysis of the tag counts
>> (R vs C, R vs R and C vs C) reveal a clear division of the R samples
>> into two groups: {1R, 3R} and {2R, 4R, 5R}. In addition, one C sample (3C)
>> overlaps with two R samples and is removed from the comparisons.
>>
>> => three DEG calculations were performed:
>> (A) all C vs all R;
>> (B) all C minus 3C vs {1R, 3R};
>> (C) all C minus 3C vs {2R, 4R, 5R}
>>
>> => tagwise dispersion; normalization factors calculated on the
>> libraries; filtering by minimal CPM in samples leaves between 6000 and
>> 7000 genes for each comparison.
>>
>> => results which make me wonder about what is happening in the R
>> (experiment) samples:
>>
>> Comparison A (ALL vs ALL): TWO genes with significant FDR (BH-corrected
>> p-value, as I understand it)
>> Comparison B (ALL-3C vs 1R,3R): 2099 genes with significant FDR (!)
>> Comparison C (ALL-3C vs 2R,4R,5R): 20 genes with significant FDR
>>
>> Now, excuse my ignorance, but this is a rather strong effect of the
>> subsetting of one of the two comparison datasets on the FDR, which I
>> did not find in many other similar analyses. In fact, when I first
>> mailed the list, I was talking about 'overcorrection for multiple tests'.
>>
>> Is there any reasonable explanation (apart from {1R,3R} and {2R,4R,5R}
>> being totally different samples, which I exclude) for this? Maybe a
>> strong dependency between the genes involved in the response to the
>> stimulus in the two R subgroups?
>>
>> I include below the three MDS plots. Thanks for any answer, and again
>> excuse me; maybe there is a trivial reason for this (such as number of
>> samples), but it is a unique situation among my many SAGE
>> experiments analyzed with edgeR.
>>
>> Kind regards,
>>
>> Alessandro
>>
>> --
>>
>> Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
>>  Via Nerviano, 31 - 20020 Lainate, Milano, Italy
>>     Ph: +39-0293305.702 Fax: +39-0293305.777
>>             http://www.genomnia.com
>> "When you're curious, you find lots of interesting things to do."
>> (Walt Disney)
>>
>
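
A note on the "significant FDR" counts quoted above: they rest on Benjamini-Hochberg (BH) adjusted p-values, and a BH-adjusted value depends on the whole set of p-values in the comparison, not only on the gene's own p-value. The following is a minimal sketch of that behaviour, written in Python rather than R. The function names bh_adjust and count_significant and the toy p-values are illustrative assumptions, not taken from the thread and not edgeR's own code.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indices of the p-values, sorted from largest to smallest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        # Scale by n / rank, then enforce monotonicity from the top down.
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(pvalues, fdr=0.05):
    """Count tests whose BH-adjusted p-value is at or below the FDR cutoff."""
    return sum(q <= fdr for q in bh_adjust(pvalues))


# Toy data (hypothetical): 100 tests in each scenario.
one_modest_hit = [0.01] + [0.5] * 99        # a single p = 0.01
many_modest_hits = [0.01] * 40 + [0.5] * 60  # forty tests with p = 0.01

print(count_significant(one_modest_hit))
print(count_significant(many_modest_hits))
```

In this toy setting a raw p-value of 0.01 is not called when it stands alone among 100 tests, but all forty such p-values are called when they occur together. A comparison with many small p-values can therefore yield far more genes below the FDR cutoff than one with few, without any change in how the correction is applied.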


