[BioC] New to Bioconductor is there a better way?

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Thu Mar 15 15:05:42 CET 2012


Yes, building GRanges objects and calling findOverlaps() is the way to do it.

There is also a convenience function, subsetByOverlaps(), which returns
the ranges of its first argument that overlap any range in the second.
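
A minimal sketch of both routes, using the example data from the quoted
message below. Attaching DAT as a metadata column and the names res and
kept are choices made for this illustration, not part of the original
post; treat it as one possible arrangement rather than the definitive one.

```r
library(GenomicRanges)

# Example data copied from the quoted message
snps <- data.frame(CHR = c("1", "2", "2", "3", "X"),
                   POS = c(295L, 640L, 670L, 100L, 1100L),
                   DAT = 1:5,
                   stringsAsFactors = FALSE)
region <- data.frame(CHR   = c("1", "1", "2", "2", "2", "X"),
                     START = c(10L, 210L, 430L, 650L, 810L, 1090L),
                     STOP  = c(100L, 350L, 630L, 675L, 850L, 1111L),
                     stringsAsFactors = FALSE)

# Each SNP becomes a width-1 range; DAT rides along as a metadata column
# (an assumption made here so the extra data survives subsetting)
snplist   <- with(snps, GRanges(CHR, IRanges(POS, POS), DAT = DAT))
locations <- with(region, GRanges(CHR, IRanges(START, STOP)))

# Route 1: findOverlaps(), then subset the original data frame by query index
olaps <- findOverlaps(snplist, locations)
res   <- snps[unique(queryHits(olaps)), ]

# Route 2: subsetByOverlaps() returns the overlapping SNP ranges directly
kept <- subsetByOverlaps(snplist, locations)
```

Route 1 keeps the result as a data frame; route 2 keeps it as a GRanges
object, which is handy if more range operations follow.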

Kasper

On Thu, Mar 15, 2012 at 10:01 AM, Davis, Brian <Brian.Davis at uth.tmc.edu> wrote:
> I'm very new to Bioconductor (first time to use it) but not to R.  I have a solution to my problem but being new to Bioconductor I'm wondering if there isn't a more appropriate/better way to solve my problem.
>
>
> I have a data frame of chromosome/position pairs (along with other data for each location).  For each pair I need to determine whether it falls within any range in a given data frame of ranges.  I need to keep only the pairs that fall within at least one range for further processing.
>
>
>
> Example:
>
> snps<-NULL
> snps$CHR<-c("1","2","2","3","X")
> snps$POS<-as.integer(c(295,640,670,100,1100))
> snps$DAT<-seq_along(snps$CHR)
> snps<-as.data.frame(snps, stringsAsFactors=FALSE)
>
>
>
> snps
>
>   CHR  POS DAT
> 1   1  295   1
> 2   2  640   2
> 3   2  670   3
> 4   3  100   4
> 5   X 1100   5
>
>
>
> region<-NULL
> region$CHR<-c("1","1","2","2","2","X")
> region$START<-as.integer(c(10,210,430,650,810,1090))
> region$STOP<-as.integer(c(100,350,630,675,850,1111))
> region<-as.data.frame(region, stringsAsFactors=FALSE)
>
>
>
> region
>
>   CHR START STOP
> 1   1    10  100
> 2   1   210  350
> 3   2   430  630
> 4   2   650  675
> 5   2   810  850
> 6   X  1090 1111
>
>
>
>
>
> The result I need would look like
>
> Res
>
> CHR  POS DAT
>   1  295   1
>   2  670   3
>   X 1100   5
>
>
>
>
>
> My current data set has ~100K SNP entries, and my regions table has ~200K entries. I have ~1500 files to go through.
>
>
>
> My current solution is:
>
> library(GenomicRanges)
> snplist<-with(snps, GRanges(CHR, IRanges(POS, POS)))
> locations<-with(region, GRanges(CHR, IRanges(START, STOP)))
> olaps<-findOverlaps(snplist, locations)
>
> Then I can easily use olaps to subset as needed.  I'm just trying to see whether there are other functions or approaches for solving this, in an effort to learn.
>
> Thanks,
>
> Brian Davis
>