[BioC] edgeR and tagwise dispersion: overcorrection for multiple tests?

alessandro.guffanti at genomnia.com alessandro.guffanti at genomnia.com
Thu Jul 12 10:48:01 CEST 2012


Dear colleagues, good morning. I am returning to an old issue because I am
now much more certain of what I see, and I am beginning to wonder whether
it is due to biology rather than to analytical tools or strategies.

=> Here is my sessionInfo() to begin with:

R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods base

other attached packages:
[1] edgeR_2.6.7       limma_3.12.1      R.utils_1.12.1 R.oo_1.9.8
[5] R.methodsS3_1.4.2

=> The experiment: RNA from five stimulated samples and five controls
(mice, homogeneous stimulus, brain tissue), SAGE on SOLiD with good mapping
in the UTR (also checked with genome-wide mapping). Tags were selected
with quite stringent criteria: in the UTR only; unique mapping; only one
mismatch; beginning with CATG. The samples are labelled {1 to 5}R for the
stimulus and {1 to 5}C for the control.

=> MDS plots and simple pairwise regression analysis of the tag counts
(R vs C, R vs R and C vs C) reveal a clear division of the R samples into
two groups: {1R, 3R} and {2R, 4R, 5R}. In addition, one C sample (3C)
overlaps with two R samples and was removed from the comparisons.

=> Three DEG calculations were performed:
(A) all C vs all R;
(B) all C minus 3C vs {1R, 3R};
(C) all C minus 3C vs {2R, 4R, 5R}.

=> Settings: tagwise dispersion; normalization factors calculated for the
libraries; filtering by minimal CPM across samples leaves between 6000 and
7000 genes for each comparison.
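
As a rough illustration of the CPM filtering step (a minimal Python sketch, not the edgeR code actually used): the function name `filter_by_cpm`, the cutoffs `min_cpm` and `min_samples`, and the toy counts are all hypothetical, since the exact thresholds are not given here.

```python
def filter_by_cpm(counts, lib_sizes, min_cpm=1.0, min_samples=3):
    """Return indices of tags whose CPM reaches min_cpm in at least
    min_samples libraries.

    counts    : list of rows, one row per tag, one column per library
    lib_sizes : total mapped tags per library (same order as the columns)
    The default cutoffs are illustrative assumptions only.
    """
    keep = []
    for idx, row in enumerate(counts):
        # counts per million = raw count scaled by library size
        cpms = [c * 1e6 / n for c, n in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpms) >= min_samples:
            keep.append(idx)
    return keep


# Toy data: 3 tags x 4 libraries, each library with 2 million tags (made up)
toy_counts = [
    [10, 12, 8, 9],   # well expressed in every library
    [0, 1, 0, 3],     # reaches 1 CPM in only one library
    [2, 2, 2, 0],     # exactly 1 CPM in three libraries
]
toy_lib_sizes = [2_000_000] * 4
kept = filter_by_cpm(toy_counts, toy_lib_sizes)
```

Tags that fail the filter never enter the test, so the cutoff also sets how many tests the FDR adjustment has to account for.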

=> Results, which make me wonder what is happening in the R (experiment)
samples:

Comparison A (all vs all): TWO genes with significant FDR (BH-corrected
p-value, as I understand it)
Comparison B (all minus 3C vs 1R, 3R): 2099 genes with significant FDR (!)
Comparison C (all minus 3C vs 2R, 4R, 5R): 20 genes with significant FDR
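
For reference, the BH (Benjamini-Hochberg) adjustment behind the FDR values can be sketched as below. This is a minimal Python illustration, not edgeR's own code; the function name `bh_adjust` and the p-values are made up for the example. In R the same adjustment is available as `p.adjust(p, method = "BH")`.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for ten tags (not taken from the data above)
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
fdr = bh_adjust(example_p)
n_significant = sum(q < 0.05 for q in fdr)
```

Each adjusted value depends on the whole ranked list of p-values, so the number of tags passing an FDR cutoff can move a lot when the raw p-values shift as a group.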

Now, excuse my ignorance, but subsetting one of the two groups being
compared has a rather strong effect on the FDR, which I have not seen in
many other similar analyses. In fact, when I first mailed the list, I was
talking about 'overcorrection for multiple tests'.

Is there any reasonable explanation for this (apart from {1R, 3R} and
{2R, 4R, 5R} being totally different samples, which I exclude)? Maybe a
strong dependency between the genes involved in the response to the
stimulus in the two R subgroups?

I include below the three MDS plots. Thanks for any answer, and again
excuse me: maybe there is a trivial reason for this (such as the number of
samples), but it is a unique situation among the many SAGE experiments I
have analyzed with edgeR.

Kind regards,

Alessandro

-- 

Alessandro Guffanti - Head, Bioinformatics, Genomnia srl
  Via Nerviano, 31 - 20020 Lainate, Milano, Italy
     Ph: +39-0293305.702 Fax: +39-0293305.777
             http://www.genomnia.com
"When you're curious, you find lots of interesting things to do."
(Walt Disney)


