[R] find common regions between two kinds of tests

Tue May 15 22:32:49 CEST 2018

Dear R community,

For 100 sites on human chromosomes, I ran two tests. The first treats an experimental measurement as a continuous variable and uses multiple regression. The second compares the top 25% of samples with the bottom 25%, ranked by the value of the measured variable, so it is a categorical analysis. A total of 16 sites show significance. In the results below I show only five variables (site, region, test, chr, start). I need to add a sixth variable called "common" that labels each common region (2 regions in this example file), meaning a region with a significant p-value in both tests.

In the second "common" region, chr (chromosome) is the same (chr 1) and the start location is also the same for all six sites (three from the categorical analysis and three from the continuous analysis); only the end locations (not known) differ, so I labeled them as one common region. In the first "common" region, the sites are all on chromosome 1 and their start locations are not identical, but they differ by less than 1000 base pairs, so I treat them as the same chromosome region.
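
To make the 1000 base pair rule concrete, here is the arithmetic for the first "common" region, using the two start positions from the example data below. The variable names are only for illustration.

```r
# Start positions of the neighbouring chromosome 1 sites in the example data.
start_continuous  <- 16553549   # sites 3-5 (continuous test)
start_categorical <- 16554171   # sites 6-7 (categorical test)

# Distance between the two start positions, in base pairs.
gap <- abs(start_categorical - start_continuous)
gap          # 622
gap < 1000   # TRUE, so both groups count as one region under this rule
```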

I started from the SAS first.location idea and then used the R cumsum function, which I learned from Bert. By comparing the region variable with the num.location variable I can find the second common region, although I have not figured out how to label it in R. I have no idea how to find the first "common" region.

Can you help me?

Thank you very much!!

Ding

# Hand-assigned labels: the "common" column I would like R to produce.
common <- c(NA, NA, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, NA, NA, NA)
site <- seq(1, 16)
region <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6)
test <- c("categorical", "categorical", "continuous", "continuous", "continuous", "categorical",
          "categorical", "continuous", "continuous", "continuous", "categorical", "categorical",
          "categorical", "continuous", "continuous", "continuous")
chr <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
start <- c(3229921, 3229921, 16553549, 16553549, 16553549, 16554171, 16554171, 32826843, 32826843,
           32826843, 32826843, 32826843, 32826843, 30669385, 30669385, 30669385)
dat <- data.frame(common, site, region, test, chr, start, stringsAsFactors = FALSE)

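# TRUE on the first row of each distinct start position (the SAS first.location idea).
dat$first.location <- !duplicated(dat$start)
# Running count of distinct start positions; rows sharing a start get the same number.
dat$num.location <- cumsum(!duplicated(dat$start))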
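
One possible direction, as a minimal sketch rather than a definitive solution: sort the sites by chr and start, begin a new cluster whenever the chromosome changes or the gap to the previous start is 1000 bp or more, and then label only the clusters that contain both kinds of test. The names max_gap, cluster and common.auto are hypothetical and not part of the original data. The sketch also assumes that chaining neighbouring sites is acceptable, that is, a site joins a cluster if it is within 1000 bp of the previous site, not necessarily of every site in the cluster. The data are rebuilt inside the block so that it runs on its own.

```r
# Rebuild the example data (same values as above) so the sketch is self-contained.
dat <- data.frame(
  site   = 1:16,
  region = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6),
  test   = rep(c("categorical", "continuous", "categorical",
                 "continuous", "categorical", "continuous"),
               times = c(2, 3, 2, 3, 3, 3)),
  chr    = rep(c(1, 2), times = c(13, 3)),
  start  = rep(c(3229921, 16553549, 16554171, 32826843, 30669385),
               times = c(2, 3, 2, 6, 3)),
  stringsAsFactors = FALSE
)

max_gap <- 1000  # assumed threshold, taken from the "less than 1000 base pairs" rule

# Sort by chromosome and start so that neighbouring rows are neighbouring sites.
ord <- order(dat$chr, dat$start)
d   <- dat[ord, ]

# A new cluster starts at the first row, at a chromosome change,
# or when the gap to the previous start is max_gap or more.
new_cluster <- c(TRUE, diff(d$chr) != 0 | diff(d$start) >= max_gap)
d$cluster   <- cumsum(new_cluster)

# A cluster is "common" only if both kinds of test occur in it.
n_tests <- tapply(d$test, d$cluster, function(x) length(unique(x)))
both    <- as.vector(n_tests[as.character(d$cluster)]) == 2

# Number the common clusters 1, 2, ... in genomic order; all other rows stay NA.
d$common.auto <- NA_integer_
d$common.auto[both] <- match(d$cluster[both], unique(d$cluster[both]))

# Write the labels back in the original row order.
dat$common.auto <- NA_integer_
dat$common.auto[ord] <- d$common.auto

dat[, c("site", "region", "test", "chr", "start", "common.auto")]
```

On the 16 example rows this gives NA for sites 1-2 and 14-16, 1 for sites 3-7, and 2 for sites 8-13, which matches the hand-assigned common column. Whether a 1000 bp gap between start positions is the right definition when the end locations are unknown is a separate question that the sketch does not settle.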
