[BioC] Determining an overlapping annotation data subset (overlap/overlaps)

Herve Pages hpages at fhcrc.org
Tue Aug 7 02:51:45 CEST 2007


Hi Stephen,

> A <- data.frame(start=(1:5)*10L, end=(4:8)*10L)
> A
  start end
1    10  40
2    20  50
3    30  60
4    40  70
5    50  80

> B <- data.frame(start=c(31L, 39L, 80L), end=c(60L, 40L, 84L))
> B
  start end
1    31  60
2    39  40
3    80  84

You can create a logical vector with one element per row of A: for each
A-row it says whether at least one B-row lies entirely inside it:

  contains_a_Brow <- mapply(function(Astart, Aend) any(Astart <= B$start & B$end <= Aend),
                            A$start, A$end)

Then use this logical vector to subset A:

  A[contains_a_Brow, ]
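
For reference, here is the same idea as a self-contained sketch that can be
pasted into a fresh session. It also shows a matrix-based variant using
outer(), which gives the per-row counts mentioned in the original question.
The names inside, n_contained and contains_a_Brow2 are only illustrative, and
the outer() approach builds an nrow(A) x nrow(B) matrix, so it is assumed
here that both tables are small enough for that to fit in memory.

```r
A <- data.frame(start = (1:5) * 10L, end = (4:8) * 10L)
B <- data.frame(start = c(31L, 39L, 80L), end = c(60L, 40L, 84L))

# One TRUE/FALSE per row of A: does any B interval lie within [Astart, Aend]?
contains_a_Brow <- mapply(
  function(Astart, Aend) any(Astart <= B$start & B$end <= Aend),
  A$start, A$end
)

# Variant: inside[i, j] is TRUE when B-row j lies within A-row i.
inside <- outer(A$start, B$start, "<=") & outer(A$end, B$end, ">=")

# Number of B-rows contained in each A-row, and the equivalent logical vector.
n_contained <- rowSums(inside)
contains_a_Brow2 <- n_contained > 0

A[contains_a_Brow, ]   # rows 1, 2 and 3 of A for this example data
```

Note that both versions test containment (the B interval must fit inside the
A interval). If plain overlap is wanted instead, the condition becomes
B$start <= Aend & B$end >= Astart.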

Cheers,
H.

Stephen Montgomery wrote:
> Hello Bioconductor -
> 
> Apologies as this is a fairly rookie bioinformatics based R question, but I
> am trying to determine if there is a R one-liner to extract a subset of
> a data frame which possesses annotation contained within it that has
> been stored in another data frame?  (For example extracting genomic
> intervals which contain certain features/annotation)
> 
> Such that:
> If I have dataframe "A" possessing an "id", "start", and "end"; And
> dataframe "B" also possessing an "id", "start", and "end"; The output is
> all the rows of A which contain an entry of B (B$start, B$end) within
> A$start and A$end.
> 
> I have tried my own fairly uninformed variants like this to no avail
> A[length(B[B$start <= A$end & B$end >= A$start]) > 0,]
> I fear the solution will be trivial but as yet it has eluded me. :/
> 
> Thanks for any help!  (Theoretically, I can also see doing this in its
> own function by creating a vector of counts for each member of "A" and
> then reporting those that are non-zero but I was wondering if there was
> a more succinct and likely efficient way)
> 
> Thanks again,
> Stephen
> 
> 
> 
> Stephen Montgomery, B.A.Sc., Ph.D.
> Postdoctoral Researcher, Team 16
> Wellcome Trust Sanger Institute
> Hinxton, Cambridge CB10 1SA
> Phone: 44-1223-834244 (ext 7297)
> Skype: stephen.b.montgomery
>  
> 
> 
>


