[BioC] Distance to Feature

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Tue Mar 1 20:23:49 CET 2011


Sarah,

The function uses the strand information in the annotation record only
when FeatureLocForDistance is set to "TSS". For the other settings, the strand
information is not used.

You are interested in annotating the peaks with the gene end, which is
different from "end" in FeatureLocForDistance. For your data set, I would
recommend that you prepare two annotation files, one containing genes on the
+ strand (plusAnn) and the other containing genes on the - strand (minusAnn).
Then set FeatureLocForDistance = "end" with plusAnn and
FeatureLocForDistance = "start" with minusAnn.
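
To make the arithmetic concrete, here is a minimal Python sketch of the
split-by-strand idea, using the coordinates from the example quoted below. It
is not ChIPpeakAnno code: the helper names (split_by_strand, gene_end,
peak_end), the record layout, and the "peak minus feature" sign convention are
assumptions made for illustration only.

```python
# Illustrative sketch only; these helpers are hypothetical and are not part
# of ChIPpeakAnno. Records are plain dicts with start/end/strand keys.

def split_by_strand(annotation):
    """Split annotation records into plus-strand and minus-strand lists."""
    plus_ann = [rec for rec in annotation if rec["strand"] == "+"]
    minus_ann = [rec for rec in annotation if rec["strand"] == "-"]
    return plus_ann, minus_ann

def gene_end(rec):
    """Biological end of a gene: rightmost on +, leftmost on -."""
    return rec["end"] if rec["strand"] == "+" else rec["start"]

def peak_end(peak, strand):
    """End of the peak relative to the strand of the feature."""
    return peak["end"] if strand == "+" else peak["start"]

def distance_to_gene_end(peak, rec):
    """Strand-aware distance (assumed convention: peak minus feature)."""
    return peak_end(peak, rec["strand"]) - gene_end(rec)

def strand_unaware_end_distance(peak, rec):
    """Distance between rightmost coordinates, ignoring strand."""
    return peak["end"] - rec["end"]

# Coordinates taken from the example in this thread.
peak = {"start": 39614606, "end": 39614683}
annotation = [
    {"name": "ENSDART00000023531", "start": 39614524,
     "end": 39622047, "strand": "-"},
]

plus_ann, minus_ann = split_by_strand(annotation)
aware = distance_to_gene_end(peak, minus_ann[0])
unaware = strand_unaware_end_distance(peak, minus_ann[0])
print(aware)    # 82
print(unaware)  # -7364
```

With the annotation split this way, using "end" for the plus-strand set and
"start" for the minus-strand set selects the same coordinate that gene_end
picks in the sketch.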

Best regards,

Julie


On 3/1/11 2:10 PM, "Sarah Sheppard" <SarahSheppard at gmail.com> wrote:

> It would seem that this function does not account for strand, then. For
> example, for this peak the "distance to feature" is given as -7364.
> 
> The output line is this:
> 
> "38163" "20" 39614606 39614683 78 "56822 ENSDART00000023531" "56822" "-"
> "ENSDART00000023531" 39614524 39622047 "inside" -7364 82 "Overlapping"
> 
> The ensembl annotation in my bed file is this:
> 
> 20 39614524 39622047 ENSDART00000023531 1 -1
> 
> Note that these are both on the minus strand, so the end coordinate is the
> leftmost coordinate, and the distance from the end of the peak to the end of
> the feature would be 39614606 - 39614524 = 82.
> Instead, AnnotatePeakInBatch calculates this from the rightmost coordinates,
> i.e. rightmost peak coordinate 39614683 - rightmost transcript coordinate
> (really the start here) 39622047 = -7364.
> 
> Am I doing something wrong? Or do I need to change the source code somehow?
> 
> Thanks,
> Sarah
> On Mar 1, 2011, at 1:52 PM, Zhu, Lihua (Julie) wrote:
> 
>> Sarah,
>> 
>> Yes, your assumption is correct.
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 3/1/11 1:50 PM, "Sarah Sheppard" <SarahSheppard at gmail.com> wrote:
>> 
>>> Hi Julie,
>>> 
>>> 
>>> 
>>> I thought that when specifying "FeatureLocForDistance" as "end" and
>>> "PeakLocForDistance" as "end", I would get the distance from the end of
>>> the peak (rightmost coordinate on the + strand, leftmost coordinate on the
>>> - strand) to the end of the feature (e.g. transcript, since these are the
>>> coordinates I used in the annotation data) in "distance to feature". In the
>>> "distance to feature" output description, it says "distance to nearest
>>> feature, such as tss".
>>> 
>>> Am I incorrect in my assumption of what value is expressed in "distance to
>>> feature"?
>>> 
>>> Thanks,
>>> Sarah
>> 
>> 
> 


