[BioC] Determining an overlapping annotation data subset (overlap/overlaps)

Stephen Montgomery sm8 at sanger.ac.uk
Mon Aug 6 14:52:27 CEST 2007


Hello Bioconductor -

Apologies, as this is a fairly rookie bioinformatics-based R question,
but I am trying to determine whether there is an R one-liner to extract
the subset of a data frame whose intervals contain annotation stored in
another data frame. (For example, extracting genomic intervals which
contain certain features/annotation.)

Such that:
If I have data frame "A" possessing an "id", "start", and "end", and
data frame "B" also possessing an "id", "start", and "end", the output
is all the rows of A which contain an entry of B (B$start, B$end) within
A$start and A$end.

I have tried my own fairly uninformed variants like this, to no avail:
A[length(B[B$start <= A$end & B$end >= A$start]) > 0,]
I fear the solution will be trivial but as yet it has eluded me. :/
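
For reference, a minimal sketch of the row-by-row test described above.
It assumes "contain" means a B interval lies entirely inside an A
interval, and it ignores the "id" columns; the toy data and the names
contains_b and hits are made up for illustration. Note that in the
attempt above, length(...) > 0 yields a single logical value, so it
cannot pick out individual rows of A.

```r
# Toy data (hypothetical): three intervals in A, two features in B.
A <- data.frame(id = c("a1", "a2", "a3"),
                start = c(1, 100, 200),
                end = c(50, 150, 250))
B <- data.frame(id = c("b1", "b2"),
                start = c(10, 140),
                end = c(20, 160))

# For each row i of A, ask whether any B interval falls fully inside
# [A$start[i], A$end[i]]. vapply() returns one logical per row of A.
contains_b <- vapply(seq_len(nrow(A)), function(i) {
  any(B$start >= A$start[i] & B$end <= A$end[i])
}, logical(1))

# Keep only the rows of A that contain at least one B interval.
hits <- A[contains_b, ]
print(hits)  # only "a1": b1 (10-20) lies inside 1-50; b2 overruns a2
```

If any overlap (rather than full containment) is wanted, the condition
inside any() could be swapped for
B$start <= A$end[i] & B$end >= A$start[i].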

Thanks for any help! (Theoretically, I can also see doing this in its
own function by creating a vector of counts for each member of "A" and
then reporting those that are non-zero, but I was wondering if there was
a more succinct and likely more efficient way.)
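
The count-vector idea could be sketched roughly as follows. This is only
an illustration under the same assumptions as the problem statement
(full containment, "id" ignored); the function name count_contained and
the toy data are hypothetical. It uses outer() to compare every row of A
against every row of B at once.

```r
# Count, for each row of A, how many B intervals lie fully inside it.
count_contained <- function(A, B) {
  # inside[i, j] is TRUE when A$start[i] <= B$start[j] and
  # A$end[i] >= B$end[j], i.e. B row j is contained in A row i.
  inside <- outer(A$start, B$start, "<=") & outer(A$end, B$end, ">=")
  rowSums(inside)
}

# Toy data (hypothetical).
A <- data.frame(id = c("a1", "a2", "a3"),
                start = c(1, 100, 200),
                end = c(50, 150, 250))
B <- data.frame(id = c("b1", "b2", "b3", "b4"),
                start = c(10, 30, 140, 210),
                end = c(20, 40, 160, 220))

counts <- count_contained(A, B)
print(counts)  # 2 0 1

# Report the members of A with a non-zero count.
nonzero <- A[counts > 0, ]
print(nonzero)  # rows "a1" and "a3"
```

One caveat: outer() builds an nrow(A) by nrow(B) matrix, so this is
compact but may use a lot of memory when both data frames are large.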

Thanks again,
Stephen



Stephen Montgomery, B.A.Sc., Ph.D.
Postdoctoral Researcher, Team 16
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA
Phone: 44-1223-834244 (ext 7297)
Skype: stephen.b.montgomery
 



-- 
The Wellcome Trust Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE.


