[BioC] ChIPpeakAnno

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Fri Jun 21 01:09:38 CEST 2013


Ann,

Please see my response below. Thanks!

Best regards,

Julie


On 6/20/13 10:47 AM, "Ann Mongan" <amongan at quanticel.com> wrote:

> Hi Julie,
> Thank you very much for your prompt response. I was sort of guessing that was
> the case.  However
> 1) for my second call, I specified PeakLocForDistance="middle",
> FeatureLocForDistance="TSS". I would have thought that only peaks within 5 Kb
> of the start position of the feature would be returned? Since that is not the
> case, how do my two calls differ?


Your first call finds overlapping features, that is, features separated from
the peak by a gap of no more than maxgap.

Your second call finds both nearest features and overlapping features.
If there are no overlapping features, you will still obtain the nearest
features, with the distance calculated as PeakLocForDistance -
FeatureLocForDistance.

Therefore, the features from your first call are a subset of the features
from your second call.
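
To make the difference between gap and distance concrete, here is a minimal
base R sketch. It is a simplified model, not ChIPpeakAnno's own code: the
helper name peak_feature_distances is hypothetical, and it assumes a feature
on the + strand, so that the TSS is the feature start. It reproduces the toy
example quoted further down in this thread and the first row of the
annotatePeakInBatch output (peak 33, NM_152486, SAMD11).

```r
# Simplified model in base R; not ChIPpeakAnno's actual implementation.
# Assumption: the feature is on the "+" strand, so TSS == feature_start.
peak_feature_distances <- function(peak_start, peak_end,
                                   feature_start, feature_end,
                                   peak_loc = c("start", "middle")) {
  peak_loc <- match.arg(peak_loc)
  # Reference position on the peak used for the signed distance
  peak_pos <- if (peak_loc == "middle") (peak_start + peak_end) / 2 else peak_start
  list(
    # Signed distance from the chosen peak position to the TSS
    distancetoFeature = peak_pos - feature_start,
    # Smallest absolute distance between any pair of range ends
    shortestDistance = min(abs(peak_start - feature_start),
                           abs(peak_start - feature_end),
                           abs(peak_end - feature_start),
                           abs(peak_end - feature_end)),
    # Number of bases separating the two ranges; 0 when they overlap
    gap = max(0, max(peak_start, feature_start) - min(peak_end, feature_end) - 1)
  )
}

# Toy example: peak chr1:1000-1600, feature chr1:300-2000
toy <- peak_feature_distances(1000, 1600, 300, 2000, peak_loc = "start")

# Peak 33 against NM_152486 (SAMD11), coordinates taken from the output below
samd11 <- peak_feature_distances(870589, 871263, 861120, 879961,
                                 peak_loc = "middle")
print(toy)
print(samd11)
```

In both cases the gap is 0 because the peak lies inside the feature, so
maxgap=5000 does not exclude the pair even though the distances exceed 5000.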



> 2) since I'm only interested in peaks within 5 Kb from TSS, I suppose I could
> just filter out my previous result instead of running it again, right?  For
> future runs, should I just create a list features that is only 1 bp at the
> TSS?  I'm using refseq start site as TSS, would that be your recommendation,
> too?

Yes, you could just filter the results from your second call.

I would recommend using the whole feature ranges and filtering later. Creating
a list of features that are only 1 bp wide at the TSS might still require you
to filter out very wide peaks. Imagine the TSS landing inside a wide peak. For
narrow peaks, please feel free to use this trick.
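
A minimal sketch of the filtering step in base R, assuming the annotated
result has already been converted with as.data.frame() as in your example.
The data frame here is built by hand: the first three rows use the
distancetoFeature values from your output, and the last row is hypothetical,
added only so that the filter keeps something.

```r
# Hand-built stand-in for as.data.frame(p); only the columns needed here.
annotated <- data.frame(
  peak = c("33", "34", "35", "hypothetical"),
  # First three values come from the output in this thread;
  # the fourth is a made-up peak 1200 bp upstream of a TSS.
  distancetoFeature = c(9806, 10513, 12658, -1200),
  stringsAsFactors = FALSE
)

# Keep peaks whose chosen location is within 5 Kb of the TSS, either side
near_tss <- annotated[abs(annotated$distancetoFeature) <= 5000, ]
print(near_tss)
```

Using abs() keeps peaks on both sides of the TSS; drop it if only one side
is of interest.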


> 3) by default, since the overlap calculation is bidirectional, wouldn't this
> also cover cases for bidirectional promoters?

There is actually a function for this purpose. Please type ?peaksNearBDP
after loading ChIPpeakAnno

> 
> Lastly, I don't know the email for the bioconductor list, is there a specific
> list for this package?

The bioconductor list is bioconductor at r-project.org (
http://www.bioconductor.org/help/mailing-list/). FYI, I will be away for two
weeks starting tomorrow, so please email Jianhong Ou and cc Bioconductor for
subsequent communications. Thanks!

> 
> Have a great day!

You too!

> Ann
> 
> 
> 
> 
> 
> On Thursday, June 20, 2013, Zhu, Lihua (Julie)  wrote:
>> Ann,
>> 
>> Thanks for the feedback!
>> 
>> Your function call is correct. However, there is a difference between maxgap
>> and distancetoFeature (or shortestDistance). Maxgap specifies the maximum
>> gap between two ranges, not the distance between their ends. For
>> example, when two ranges overlap, the gap between them is 0 (no gap),
>> although the distancetoFeature, which is calculated as the start of the
>> peak - the start of the feature, might be greater than 0.
>> 
>> Here is a toy example
>> peak: chr1:1000-1600
>> feature: chr1:300-2000
>> distancetoFeature = 1000 - 300 = 700
>> shortestDistance = min(abs(1000-300), abs(1000-2000), abs(1600-300),
>> abs(1600-2000)) = 400 where abs = absolute value
>> Gap  = 0 because these two ranges overlap
>> 
>> Please let me know if this makes sense.
>> 
>> Please CC bioconductor in the subsequent communications for others to
>> input/benefit. Thanks!
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 6/20/13 3:00 AM, "Ann Mongan" <amongan at quanticel.com> wrote:
>> 
>>> Dear Julie,
>>> Thank you for developing ChIPpeakAnno, I find it very useful.
>>> Anyway, I'm using ChIPpeakAnno_2.2.0.  I found some peculiarity with how my
>>> peaks are assigned to features that are outside of maxgap (example below).
>>> Could you help me understand why I get these results?  I suppose some
>>> arguments must not be set correctly.
>>> Thanks for your help.
>>> Ann
>>> 
>>> t1 = findOverlappingPeaks(ASR, refseqRanges, maxgap=5000, multiple=TRUE,
>>> select='all',NameOfPeaks1='KDM5B',NameOfPeaks2='RefSeq')
>>> 
>>>> head(t1$OverlappingPeaks[t1$OverlappingPeaks$shortestDistance >5000,])
>>>     KDM5B chr RefSeq RefSeq_start RefSeq_end strand KDM5B_start KDM5B_end
>>> strand1 overlapFeature shortestDistance
>>> 62  00033   1  02323       860260     879955      +      870589    871263
>>> +         inside             8692
>>> 63  00034   1  02323       860260     879955      +      871383    871883
>>> +         inside             8072
>>> 64  00035   1  02323       860260     879955      +      873522    874033
>>> +         inside             5922
>>> 120 00062   1  02363       955503     991496      +      964918    966100
>>> +         inside             9415
>>> 121 00063   1  02363       955503     991496      +      975841    976296
>>> +         inside            15200
>>> 138 00081   1  02398      1109264    1133315      +     1120693   1121410
>>> +         inside            11429
>>> 
>>> 
>>> 
>>> p = annotatePeakInBatch(head(ASR,100), AnnotationData=refseqRanges,
>>> output="both", maxgap=5000,
>>>        PeakLocForDistance="middle",
>>> FeatureLocForDistance="TSS",select="all")
>>> 
>>>> head(as.data.frame(p)[p$distancetoFeature>5000,])
>>>    space  start    end width                    names peak strand
>>> feature start_position end_position insideFeature distancetoFeature
>>> shortestDistance
>>> 7   chr1 870589 871263   675 33 1244.NM_152486.SAMD11   33      +
>>> 1244.NM_152486.SAMD11         861120       879961        inside
>>> 9806             8698
>>> 8   chr1 871383 871883   501 34 1244.NM_152486.SAMD11   34      +
>>> 1244.NM_152486.SAMD11         861120       879961        inside
>>> 10513             8078
>>> 9   chr1 873522 874033   512 35 1244.NM_152486.SAMD11   35      +
>>> 1244.NM_152486.SAMD11         861120       879961        inside
>>> 12658             5928
>>> 10  chr1 874123 875130  1008 36 1244.NM_152486.SAMD11   36      +
>>> 1244.NM_152486.SAMD11         861120       879961        inside
>>> 13506             4831
>>> 11  chr1 875328 875693   366 37 1244.NM_152486.SAMD11   37      +
>>> 1244.NM_152486.SAMD11         861120       879961        inside
>>> 14390             4268
>>> 12  chr1 875720 879253  3534 38 1244.NM_152486.SAMD11   38      +
>>> 1244.NM_152486.SAMD11         861120       879961        inside
>>> 16366              708
>>>    fromOverlappingOrNearest
> 


