[BioC] ChIPpeakAnno

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Thu Jun 20 14:37:50 CEST 2013


Ann,

Thanks for the feedback!

Your function call is correct. However, maxgap is not the same thing as
distancetoFeature (or shortestDistance). maxgap specifies the maximum gap
allowed between two ranges, not the distance between their ends. For
example, when two ranges overlap, the gap between them is 0 (no gap), even
though distancetoFeature, which is calculated as the start of the peak
minus the start of the feature, can be greater than 0.

Here is a toy example:
peak: chr1:1000-1600
feature: chr1:300-2000
distancetoFeature = 1000 - 300 = 700
shortestDistance = min(abs(1000-300), abs(1000-2000), abs(1600-300),
abs(1600-2000)) = 400, where abs = absolute value
gap = 0 because these two ranges overlap
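
The arithmetic in the toy example can be reproduced in base R with a small
sketch. range_metrics() is a hypothetical helper written only for
illustration; it is not a ChIPpeakAnno function, and its gap formula (bases
separating two inclusive ranges, 0 when they overlap or abut) is an
assumption about how the gap is counted. The second call uses the
coordinates from row 62 of the quoted findOverlappingPeaks output.

```r
# Hypothetical helper (not part of ChIPpeakAnno): plain arithmetic on
# inclusive start/end coordinates of one peak and one feature.
range_metrics <- function(peak_start, peak_end, feat_start, feat_end) {
  # Signed distance from the peak start to the feature start,
  # as in the toy example.
  distance_to_feature <- peak_start - feat_start

  # Smallest absolute difference among the four end-to-end pairs.
  shortest_distance <- min(abs(c(peak_start - feat_start,
                                 peak_start - feat_end,
                                 peak_end - feat_start,
                                 peak_end - feat_end)))

  # Assumed gap definition: number of bases separating the two ranges,
  # which is 0 when the ranges overlap or are directly adjacent.
  gap <- max(0, max(peak_start, feat_start) - min(peak_end, feat_end) - 1)

  c(distancetoFeature = distance_to_feature,
    shortestDistance = shortest_distance,
    gap = gap)
}

# Toy example: peak chr1:1000-1600, feature chr1:300-2000
toy <- range_metrics(1000, 1600, 300, 2000)
print(toy)

# Row 62 of the quoted output: peak 870589-871263, RefSeq 860260-879955
row62 <- range_metrics(870589, 871263, 860260, 879955)
print(row62)
```

For row 62 the sketch gives a shortestDistance of 8692, the value in the
quoted table, with a gap of 0. That is why a peak lying inside a feature
passes maxgap=5000 even though its shortestDistance is larger than 5000.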

Please let me know if this makes sense.

Please CC the Bioconductor list on subsequent messages so that others can
contribute and benefit. Thanks!

Best regards,

Julie


On 6/20/13 3:00 AM, "Ann Mongan" <amongan at quanticel.com> wrote:

> Dear Julie,
> Thank you for developing ChIPpeakAnno, I find it very useful.
> Anyway, I'm using ChIPpeakAnno_2.2.0.  I found some peculiarity with how my
> peaks are assigned to features that are outside of maxgap (example below).
> Could you help me understand why I get these results?  I suppose some
> arguments must not be set correctly.
> Thanks for your help.
> Ann
>  
> t1 = findOverlappingPeaks(ASR, refseqRanges, maxgap=5000, multiple=TRUE,
> select='all',NameOfPeaks1='KDM5B',NameOfPeaks2='RefSeq')
>  
>> head(t1$OverlappingPeaks[t1$OverlappingPeaks$shortestDistance >5000,])
>     KDM5B chr RefSeq RefSeq_start RefSeq_end strand KDM5B_start KDM5B_end
> strand1 overlapFeature shortestDistance
> 62  00033   1  02323       860260     879955      +      870589    871263
> +         inside             8692
> 63  00034   1  02323       860260     879955      +      871383    871883
> +         inside             8072
> 64  00035   1  02323       860260     879955      +      873522    874033
> +         inside             5922
> 120 00062   1  02363       955503     991496      +      964918    966100
> +         inside             9415
> 121 00063   1  02363       955503     991496      +      975841    976296
> +         inside            15200
> 138 00081   1  02398      1109264    1133315      +     1120693   1121410
> +         inside            11429
>  
>  
>  
> p = annotatePeakInBatch(head(ASR,100), AnnotationData=refseqRanges,
> output="both", maxgap=5000,
>        PeakLocForDistance="middle", FeatureLocForDistance="TSS",select="all")
>  
>> head(as.data.frame(p)[p$distancetoFeature>5000,])
>    space  start    end width                    names peak strand
> feature start_position end_position insideFeature distancetoFeature
> shortestDistance
> 7   chr1 870589 871263   675 33 1244.NM_152486.SAMD11   33      +
> 1244.NM_152486.SAMD11         861120       879961        inside
> 9806             8698
> 8   chr1 871383 871883   501 34 1244.NM_152486.SAMD11   34      +
> 1244.NM_152486.SAMD11         861120       879961        inside
> 10513             8078
> 9   chr1 873522 874033   512 35 1244.NM_152486.SAMD11   35      +
> 1244.NM_152486.SAMD11         861120       879961        inside
> 12658             5928
> 10  chr1 874123 875130  1008 36 1244.NM_152486.SAMD11   36      +
> 1244.NM_152486.SAMD11         861120       879961        inside
> 13506             4831
> 11  chr1 875328 875693   366 37 1244.NM_152486.SAMD11   37      +
> 1244.NM_152486.SAMD11         861120       879961        inside
> 14390             4268
> 12  chr1 875720 879253  3534 38 1244.NM_152486.SAMD11   38      +
> 1244.NM_152486.SAMD11         861120       879961        inside
> 16366              708
>    fromOverlappingOrNearest
> 7              NearestStart
> 8              NearestStart
> 9              NearestStart
> 10             NearestStart
> 11             NearestStart
> 12             NearestStart
>  
>  
> 


