[BioC] edgeR package : question about 'exactTest' results

Rory Stark Rory.Stark at cancer.org.uk
Thu Jan 26 12:57:47 CET 2012


Hi helen-

Addressing first the use of pre-defined regions instead of peak callers,
this is possible with DiffBind. The easiest way is to write out the
regions in a single file (chr, start, end, 1) and specify the same file
for all the samples. Alternatively, if you are creating the region file
dynamically, you can create a single "dummy" peak file and specify it for
each sample, e.g.

chr1   1   5   1
chr1   10  15  1

Then, in the call to dba.count, specify your dynamically computed set of
regions as the "peaks" parameter, and DiffBind will use those regions. We
do this all the time, e.g. get all the transcription start sites, then
take windows 1000bp upstream and 500bp downstream of each TSS, and pass
that in to dba.count -- no peak calling needed.
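
As a rough sketch of the TSS-window step, using base R only: the `tss` table, its column names, and the output file name below are made up for illustration, and in practice the positions would come from whatever annotation you use. It builds windows 1000bp upstream and 500bp downstream of each TSS, mirrored for minus-strand genes, and writes them in the four-column layout described above.

```r
# Hypothetical TSS table (illustrative values, not real annotation).
tss <- data.frame(
  chr    = c("chr1", "chr1", "chr2"),
  pos    = c(5000, 300, 12000),
  strand = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

upstream   <- 1000
downstream <- 500

# "Upstream" is relative to the direction of transcription, so the
# window is mirrored for genes on the minus strand.
plus <- tss$strand == "+"
regions <- data.frame(
  chr   = tss$chr,
  start = ifelse(plus, tss$pos - upstream, tss$pos - downstream),
  end   = ifelse(plus, tss$pos + downstream, tss$pos + upstream),
  score = 1
)

# Clamp windows that would run off the start of the chromosome.
regions$start <- pmax(regions$start, 1)

# Write the (chr, start, end, 1) layout, tab-separated, no header.
out_file <- file.path(tempdir(), "tss_windows.txt")
write.table(regions, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting set of regions is the kind of thing that would then be supplied as the "peaks" parameter.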

The alternative normalization issue is more difficult. I assume you are
quantile normalizing the read counts for all your regions prior to sending
them to edgeR (by creating a DGEList). Do you do the TMM normalization
after that (calcNormFactors) or somehow fake out the normalization
factors? Or use another normalization method? edgeR includes an
"upperquartile" normalization option; is that not sufficient? I'd really
like to understand what you do, as we are working on the next version of
DiffBind and providing more flexibility in normalization is an area we are
looking at.
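
To be clear about what I mean by quantile normalizing the counts, here is a minimal base R sketch of the general idea. It is not limma's implementation (other implementations may treat ties differently), and the function name and the small count matrix are made up for illustration. Each sample's sorted counts are replaced by the mean of the sorted counts across samples, so every sample ends up with the same distribution.

```r
# Minimal quantile normalization sketch (illustrative, not limma's code).
quantile_normalize <- function(counts) {
  # Rank of each region within its sample; ties broken by order of appearance.
  ranks <- apply(counts, 2, rank, ties.method = "first")
  # Sort each sample, then average across samples at each rank.
  sorted <- apply(counts, 2, sort)
  target <- rowMeans(sorted)
  # Replace each count by the target value at its rank.
  normalized <- apply(ranks, 2, function(r) target[r])
  dimnames(normalized) <- dimnames(counts)
  normalized
}

# Hypothetical read counts: 4 regions (rows) by 3 samples (columns).
counts <- matrix(c(5, 2, 3, 4,
                   4, 1, 4, 2,
                   3, 4, 6, 8),
                 nrow = 4,
                 dimnames = list(paste0("region", 1:4), c("s1", "s2", "s3")))

norm_counts <- quantile_normalize(counts)
```

Note that the result is generally no longer integer-valued, which is part of why I am asking how you then hand it to edgeR.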

Finally, replicates. While DiffBind does let you run with no replicates
(it just gives you a nasty warning), you really, really, really need
replicates for this kind of analysis. Really. edgeR needs replicates to
compute dispersion for the negative binomial -- without replicates it just
sets the dispersion to zero. Even assuming that technical (sequencing)
effects are negligible, the biological variation and experimental variance
in ChIP efficiency are such that you can have no confidence that any
differential binding you find is meaningful without replicates. ChIP,
thanks in part to its use of antibodies, has greater experimental
variation than RNA expression assays, and a differential expression
analysis without replicates would be laughed out of any peer-reviewed
journal.
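
To make the dispersion point concrete: in the negative binomial model the variance of a count with mean mu and dispersion phi is mu + phi * mu^2, so a dispersion of zero collapses to the Poisson variance. A small base R sketch, where the mean of 100 reads and the dispersion of 0.1 are illustrative values I picked, not estimates from any data set:

```r
# Negative binomial variance as a function of mean and dispersion.
nb_variance <- function(mu, dispersion) mu + dispersion * mu^2

mu <- 100                                       # illustrative mean read count
var_poisson <- nb_variance(mu, dispersion = 0)    # dispersion zero: Poisson case
var_nb      <- nb_variance(mu, dispersion = 0.1)  # assumed dispersion of 0.1

# How much wider the spread is once the dispersion term is included.
sd_ratio <- sqrt(var_nb / var_poisson)
```

With no replicates there is nothing to estimate that second term from, so any test run with it set to zero will look far more confident than the data justify.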

We need to get away from it being acceptable to consider a comparative
ChIP-seq analysis without replicates -- even "just to see if there's
anything there". We need to encourage the lab scientists to price this in;
here at CRI we basically refuse to do anything but the simplest
analysis on non-replicated ChIPs.

Cheers-
Rory

On 26/01/2012 10:44, "NEIL-BERNET Helen" <helen.neil-bernet at cea.fr> wrote:

>Rory,
>
>Thanks for your answer.
>I've already tried DiffBind; it gave me roughly the same results as edgeR,
>fortunately!
>Indeed, it's easier for ChIP-seq, but I couldn't address 2 things:
>
>1) I need to use quantile normalization (with the limma package) prior to
>differential analysis, and I'm not sure it's possible to use my own
>normalized counts directly. Moreover, I don't use peak calling, because I
>study specific regions, for example promoter-proximal regions, so I don't
>have any affinity score.
>
>2) I couldn't find a way to analyze my data without replicates, and that's
>a critical point for ChIP-seq: most of the time, we don't have any
>replicates!
>If you can answer these questions, I would really like to work with
>DiffBind instead of edgeR!!
>
>Thank you very much,
>
>helen
>
>
>
>On 25/01/12 16:33, "Rory Stark" <Rory.Stark at cancer.org.uk> wrote:
>
>>Hi helen-
>>
>>After calling exactTest, you can call topTags with the result. topTags
>>takes a parameter, n, which is the number of results to return -- if you
>>set it to the total number of sites you are comparing, you will get them
>>all back with their associated stats. I end up doing something like:
>>
>>> results = topTags(de$db,nrow(db$counts))$table
>>
>>
>>to get everything back.
>>
>>If you are using edgeR to do differential binding analysis of ChIP-seq
>>data, you may also want to check out the DiffBind package, which adds
>>functions to make this sort of thing easier.
>>
>>Cheers-
>>Rory
>>
>>On 25/01/2012 15:14, "NEIL-BERNET Helen" <helen.neil-bernet at cea.fr>
>>wrote:
>>
>>>Dear all,
>>>
>>>I use edgeR for differential analysis of ChIP-seq densities.
>>>I would like to know how to export the results from 'exactTest',
>>>including p-values and FDR.
>>>Using 'topTags', I only get the first 10 results, with p-values and
>>>FDR.
>>>But when I do 'write.table(mytest$table, ...)', it gives me the
>>>complete set of p-values, but not the FDR.
>>>
>>>Thanks for your help,
>>>
>>>Best regards,
>>>
>>>helen
>>>
>>>
>>>
>>>
>>
>>
>>NOTICE AND DISCLAIMER
>>This e-mail (including any attachments) is intended for the above-named
>>person(s). If you are not the intended recipient, notify the sender
>>immediately, delete this email from your system and do not disclose or
>>use for any purpose.
>>
>>We may monitor all incoming and outgoing emails in line with current
>>legislation. We have taken steps to ensure that this email and
>>attachments are free from any virus, but it remains your responsibility
>>to ensure that viruses do not adversely affect you.
>>Cancer Research UK
>>Registered in England and Wales
>>Company Registered Number: 4325234.
>>Registered Charity Number: 1089464 and Scotland SC041666
>>Registered Office Address: Angel Building, 407 St John Street, London
>>EC1V 4AD.
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor

