[BioC] ChIPpeakAnno-makeVennDiagram question

Wed Aug 21 19:55:32 CEST 2013

Dear Camila,

For the ChIPpeakAnno paper, I have the peaks at all p-values, not only the
significant ones.

If you do not have this kind of information, then the most conservative
estimate of totalTest would be (peak1 + peak2) - peaks.1and2, i.e. the number
of distinct peaks across both sets. The smaller the totalTest you set, the
higher the p-value is, so the p-value you obtain this way will be the most
conservative estimate. If this approach is too conservative for your purpose,
then you could set totalTest to the number of potential binding sites in the
genome, found by searching for the motif (which could be obtained from the
literature or from your peak sets). If accessibility data is available, you
could also limit the search to those accessible regions.
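
A minimal sketch of how the p-value responds to totalTest, assuming the
hypergeometric test takes the same form as the phyper() call shown in the
warning message further down this thread. The peak counts are the ones from
the question below; the overlap count of 3000 and the multipliers are
hypothetical, chosen only for illustration. overlap_pvalue is a helper defined
here, not a ChIPpeakAnno function.

```r
# Hypergeometric overlap test, written to mirror the phyper() call that
# appears in the warning message quoted later in this thread.
overlap_pvalue <- function(p1, p2, p1.and.p2, totalTest) {
  # totalTest has to cover every distinct peak in the two sets.
  stopifnot(totalTest >= p1 + p2 - p1.and.p2)
  phyper(p1.and.p2 - 1, p2, totalTest - p2, p1, lower.tail = FALSE)
}

p1 <- 7602          # peaks in set 1 (from the question below)
p2 <- 25335         # peaks in set 2 (from the question below)
p1.and.p2 <- 3000   # HYPOTHETICAL overlap count, for illustration only

# Most conservative choice: the number of distinct peaks in either set.
conservative <- p1 + p2 - p1.and.p2

# A few larger candidates; the multipliers are arbitrary assumptions.
candidates <- ceiling(conservative * c(1, 1.5, 2, 5))
pvals <- sapply(candidates,
                function(n) overlap_pvalue(p1, p2, p1.and.p2, n))

print(data.frame(totalTest = candidates, p.value = pvals))
```

When totalTest equals the number of distinct peaks, the overlap cannot be
smaller than the observed one, so the p-value is 1. Larger values of totalTest
can only lower it, which is why the choice should be justified rather than
arbitrary.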

Hope this helps.

Best regards,

Julie 

On 8/21/13 12:39 PM, "Camila Lopez-anido" <lopezanido at wisc.edu> wrote:

> Thanks! 
> 
> I'm still considering different systematic approaches to determine the
> "totalTest" value, which I take to be the total number of possible
> transcription factor (TF) binding events in the rat genome. I saw that you
> previously said that one could add up the total number of peaks with
> p-value=1 or FDR=1
> (http://permalink.gmane.org/gmane.science.biology.informatics.conductor/30115),
> but I don't have access to the raw data.
> 
> 
> One idea I had was to *assume* that the total number of peaks with p-value=1
> is 1.5X the number called significant.
> So, if peak1 set = 7602 peaks, and peak2 set = 25335 peaks, then could I use
> totalTest = (7602 + 25335) * 1.5 = 32937 * 1.5 = 49405.5
> 
> 
> If I use totalTest = (peak1 + peak2) * 1.5, then I could apply this systematic
> approach to make different comparisons between various datasets. This seems
> like a better idea than arbitrarily picking a value greater than the largest
> peak data set. Does this make sense, or would you recommend a different
> approach/assumptions depending on the size of the genome (or area of open
> chromatin accessible for TF binding)?
> 
> 
> Also, how was the value 1580 chosen for totalTest in the Zhu (2010)
> ChIPpeakAnno manuscript? Was a systematic approach used, similar to the one
> I tried to come up with above?
> 
> 
> Best, 
> Camila 
> 
> 
> 
> On 08/15/13, "Zhu, Lihua (Julie)" wrote:
>> Camila,
>> 
>> totalTest needs to be at least as large as the larger of the two peak sets,
>> CNSallPeaks_rd and OPColig2Peaks_rd. For example, if there are 1000 peaks in
>> CNSallPeaks_rd and 2000 peaks in OPColig2Peaks_rd, then totalTest needs to
>> be >= 2000.
>> 
>> For details, please refer to an old post at
>> http://grokbase.com/t/r/bioconductor/123cz0jc0b/bioc-question-about-makevenndiagram
>> Thanks!
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 8/15/13 6:08 PM, "Camila Lopez-anido" <lopezanido at wisc.edu> wrote:
>> 
>>> Hi, 
>>> 
>>> I'm a graduate student at UW-Madison, and I'm getting an error message
>>> when I use makeVennDiagram(). I was wondering if there is a simple solution?
>>> I saw that someone else had asked about something related (similar message)
>>> online, but I couldn't find how they fixed the problem.
>>> 
>>> 
>>> I input two RangedData objects, on which I had successfully run
>>> findOverlappingPeaks():
>>> 
>>> 
>>>> makeVennDiagram(RangedDataList(CNSallPeaks_rd, OPColig2Peaks_rd),
>>>> NameOfPeaks=c("CNSall","OPColig2"), maxgap=0, minoverlap=1, totalTest=100,
>>>> cex=1, counts.col="red", useFeature=FALSE)
>>> 
>>> 
>>> #Error in seq.default(cnt, length.out = counts[i]) :
>>> #length must be non-negative number
>>> #In addition: Warning message:
>>> #In phyper(p1.and.p2 - 1, p2, totalTest - p2, p1, lower.tail = FALSE, :
>>> #NaNs produced
>>> 
>>> 
>>> 
>>> Is there a simple solution to this?
>>> 
>>> 
>>> Thanks, 
>>> Camila Lopez-Anido