[R] Merge by Range in R

Mohammad Tanvir Ahamed mashranga at yahoo.com
Mon Sep 4 14:31:17 CEST 2017


Hi, 
I have two big data sets.

data_1:
> dim(data_1)
[1] 15820 5

> head(data_1)
   Chromosome  Start     End   Feature GroupA_3
1:       chr1 521369  750000 chr1-0001    0.170
2:       chr1 750001  800000 chr1-0002   -0.086
3:       chr1 800001  850000 chr1-0003    0.006
4:       chr1 850001  900000 chr1-0004    0.050
5:       chr1 900001  950000 chr1-0005    0.062
6:       chr1 950001 1000000 chr1-0006   -0.016

data_2:
> dim(data_2)
[1] 470870 5

> head(data_2)
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458


What I want to do:
Based on the "Chromosome", "Start" and "End" columns of the two data sets, I want to find which rows (more precisely, which "Feature" values) of data_2 fall within each range (between "Start" and "End") of data_1. The "Chromosome" values must also match between the two data sets.
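
To make the goal concrete, the desired result can be sketched on toy data. This assumes the data.table package and its foverlaps() function; the tables `ranges` and `probes` and all of their values are made up for illustration (they stand in for data_1 and data_2), and the sketch does not establish whether this approach is fast enough on the full data.

```r
library(data.table)

# Hypothetical toy stand-ins for data_1 (ranges) and data_2 (probes).
ranges <- data.table(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(1L, 101L, 1L),
  End        = c(100L, 200L, 100L),
  Feature    = c("chr1-0001", "chr1-0002", "chr2-0001")
)
probes <- data.table(
  Chromosome = c("chr1", "chr1", "chr2", "chr2"),
  Start      = c(50L, 150L, 60L, 500L),
  End        = c(51L, 151L, 61L, 501L),
  Feature    = c("cgA", "cgB", "cgC", "cgD")
)

# foverlaps() needs the second table keyed, with the two interval
# columns (Start, End) as the last key columns.
setkey(ranges, Chromosome, Start, End)

# type = "within": keep probes whose [Start, End] lies inside a range
# on the same chromosome. nomatch = 0L drops probes that fall in no range.
hits <- foverlaps(probes, ranges, type = "within", nomatch = 0L)

# Columns from `probes` that clash with `ranges` get an "i." prefix.
result <- hits[, .(Chromosome, Range = Feature, Probe = i.Feature)]
print(result)
```

Here each probe is paired with the range that contains it, and the probe `cgD` is dropped because no range on chr2 covers it.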

I have tried the "GenomicRanges" package, as described in the post
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, as the data are very big?
Thanks in advance.


Regards,
Tanvir Ahamed Stockholm, Sweden     |  mashranga at yahoo.com




More information about the R-help mailing list