[BioC] ask for your help about some questions of using edgeR

Tue Sep 20 02:02:50 CEST 2011

Dear Zhenling,

I'll try to answer your questions as best I can.  But the bottom line is 
that there may simply be only a few differentially expressed miRs between 
your two groups.

> Date: Sun, 18 Sep 2011 14:32:12 -0600
> From: Zhenling Peng <zhenling at ualberta.ca>
> To: bioconductor at r-project.org
> Subject: [BioC] ask for your help about some questions of using edgeR
>
> Dear Sir/Madam,
>
> Would you mind me disturbing you for a few minutes?
>
> I know edgeR can do comparison among multiple libraries. I did have 4 
> libraries of miRNA, where two of them are normal (library1 and 
> library2), the other two are abnormal (library3 and library4). And I'm 
> trying to get the miRNA which expressed differently between the two 
> groups: normal and abnormal. I utilized common dispersion, and the 
> results are displayed at the end. When using edgeR, I have some 
> questions as follows,
>
> 1) according to the results got from *summary(decideTestsDGE(de.com, 
> p.value = 0.05))*, it seems that there is only 1 miRNA: RNA_de is 
> expressed differently if I take 0.05 as the cut-off of the 
> *adjust.p.val* (please see *line 20* and *line 12*).

This of course may simply be correct.  There may be only a few 
differentially expressed miRs in your data.  The common dispersion 
estimate appears to be reasonably small.

There is no reason why you need to restrict to a FDR (adjusted p-value) of 
less than 0.05.  You could easily examine the top three miRs.

There is something wrong with your topTags() output.  This function does 
not print column headings in the format that you have presented.  You must 
be using other R code that you haven't shown.

> While *line 16* indicates that there is a constriction on the counts in 
> libraries. In fact, the counts in library 1 and 2 are not consistently 
> greater than those in library 3 and 4. So I wondered how the package 
> edgeR deals with this kind of inconsistent data. Is it still reliable to 
> determine that the RNA_de is differently expressed between two groups? 
> When the edgeR makes a comparison among multiple libraries, does the 
> edgeR merge the libraries within the same group together before the 
> comparison or not? if it does, how does it merge these libraries 
> together?
>
> I tried to find the answer in the manual and the two papers to be 
> implemented into edgeR, but I failed finally as my limited knowledge on 
> statistics.

No, it doesn't simply merge libraries together.  The whole point of having 
biological replicate libraries is to measure the variation between them. 
This gives an estimate of the underlying biological variation, against 
which significance is assessed.

Think of it like a two-sample t-test, where you assess the separation of 
two groups relative to variation within the two groups.  You do not 
require observations to all be the same, and you don't merge them 
together.  With common dispersion, edgeR is analogous to two-sample 
t-tests where the same standard deviation estimate is used for every miR. 
With tagwise dispersion, edgeR is analogous to two-sample t-tests with 
different standard deviations estimated for each miR.

> So could you please give me a favor to explain those questions for me?2) 
> according to the norm.factor for each library, and the value of the 
> common.dispersion (*line 1-4*,* line10* and *11*), I think it's OK to 
> use common dispersion for our data. What do you think about it? 
> Actually, I'm not sure *when we can use * *common.dispersion*.

I generally recommend tagwise, if you have any reasonable number of 
replicates.  The choice has nothing to do with the norm.factors.

> And I tried tagwise dispersion with prion.n = 10 (20/(4-2)) also, but 
> the results are very similar to the results got from common.dispersion. 
> In addition, I made the pair-wise comparison between two libraries (one 
> is normal, the other is abnormal), but the the value of the 
> *common.dispersion*approximates to 0 (1.0E-16). So I doubt the 
> reliability of the results when using common dispersion to make 
> pair-wise comparison between libraries.

Well, if you try to do a comparison without replicates, it is impossible 
to estimated biological variability, so edgeR puts it to zero.  You are 
correct that this is not reliable, but the problem is with the lack of 
replicates, not with the use of common dispersion.

> In general, if we have multiple libraries and we only want to find the 
> differently expressed miRNA/genes among groups, is their any other 
> comparison except pair-wise comparison between groups? Would you mind 
> giving me some suggestions?

I can't see what is wrong with the pair-wise comparison.  You only have 
two groups, so there is nothing else sensible that could be done.

> 3) From *line 19*, we can see that the miRNA---RNA_ad shows consistent 
> and significant difference between groups, this is consistent with the 
> P-value for this miRNA in *line 15*.  While the ajust.p.val is equal to 
> 1 (*line 15*, this ajust.p.val is based on the method "BH"). So I 
> wondered whether we can use the P-value to analyze differential 
> expression or not. If not, do we have to use ajust.p.val to make 
> analysis? Is it always more reliable to use ajust.p.val instead of 
> P-value to make analysis? Could you please give me some advice?

Well, if you simply use unadusted p-values, you would find about 5% of the 
miRs to be signficantly different even from random data.  You will be 
guaranteed to find statistical significant differences even when there are 
none.  This would not be considered defensible in a scientific journal.

Best wishes
Gordon

> I'm very sorry to disturb you with so many naive questions. But I really
> confused about the unsatisfied results after using edgeR, and I'm not sure
> if I made any mistakes when using edgeR. Could you please give me a favor?
> Would you mind giving me any advice about my questions? Thank you very much!
>
> Best wishes,
> Zhenling Peng
> University of Alberta
>
>> dim(d)
> [1] 658   4
>> d
> An object of class "DGEList"
> $samples
>               files         group      lib.size         norm.factors
> library1  library1    normal    3184257      1.0025762
> .............................................line 1
> library2  library2    normal    2825126      0.9873665
> .............................................line 2
> library3  library3  abnormal  3955120      1.0476795
> .............................................line 3
> library4  library4  abnormal  4692333      0.9642192
> .............................................line 4
> $counts
>                  library1   library2   library3   library4
> RNA_aa    302440   274619   479102   403961
> ..............................................line 5
> RNA_ab    298597   283090   206178   331258
> ..............................................line 6
> RNA_ac    293963   218700   508874   573151
> ..............................................line 7
> RNA_bd    199355   168652   180300   262947
> ..............................................line 8
> RNA_ae    149868   131546   177037   201992
> ..............................................line 9
> 653 more rows ...
>
>> d <- estimateCommonDisp(d)
>> d$common.dispersion
> [1] 0.03584351
>         ...............................................line 10
>> sqrt(d$common.dispersion)
> [1] 0.1893238
>         ...............................................line 11
>
>> de.com <- exactTest(d)
> Comparison of groups:  normal - abnormal
>> topTags(de.com, n=5)
> Comparison of groups: normal-abnormal
>                   logConc        logFC                P.Value
>     adjust.p.val
>
> RNA_de    -15.851627    1.7031386        5.969173e-07    0.0003927716
>         .............line 12
> RNA_ea    -16.861371    1.4352602        2.465201e-04    0.0554942582
>         .............line 13
> RNA_cf     -34.679446    30.6732173      2.833211e-03     0.3225116431
>           .............line 14
> RNA_ad    -3.690956      0.6652757        1.554169e-02     1.0000000000
>          ..............line 15

This is not correct output from topTags().

>> detags.com <- rownames(topTags(de.com, n=5)$table)
>> d$counts[detags.com, ]
>                library1    library2     library3      library4
> RNA_de   153        * 35 *             *62  *           18
>          ...............................................line 16
> RNA_ea   49           34              28             16
>        ...............................................line 17
> RNA_cf    5             4                0                0
>            ...............................................line 18
> RNA_ad   293963   218700      508874      573151
> ...............................................line 19
>
>> summary(decideTestsDGE(de.com, p.value = 0.05))
> ...............................................line 20
>   [,1]
> -1    0
> 0   657
> 1     1

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}