[R] merge function in R?

David Winsemius dwinsemius at comcast.net
Sat Aug 14 01:29:58 CEST 2010


Neither you nor your responder has continued the email chain very  
well, so let me put things back together:
On Aug 13, 2010; 03:54pm fishkbob wrote (subj = merge function in R?):

>>> So I have a bunch of c(start,end) points and want to consolidate  
>>> them into as few c(start,end) as possible.
>>>
>>> For example:
>>> sample   start    end
>>> A              5       10
>>> B              7       18
>>> C              1        4
>>> D              16      20
>>>
>>> I'd want the function to return the two distinct sets (1,4) and  
>>> (5,20)
>>>
>>> Is there an R function that already does this?
>>> or should I write my own? (how would I go about that?)

> In an effort to be helpful, but without copying the prior message, on  
> Aug 13, 2010; 06:46pm JesperHybel wrote:

>> I think it would be helpful if you could clarify your question -  
>> do you want distinct sets - maybe use
>>
>> unique()
>>
>> but why (5,20) when it's (5,10) in the row in your example? What  
>> criteria do you want the function to select the "sets" by and what  
>> kind of output do you need?
>>
>> Maybe it's just me who doesn't get the question..sr

On Aug 13, 2010, at 7:01 PM, fishkbob wrote:

>
> I too think I worded it incorrectly...
>
> so the second two columns of the matrix are the start and end of an  
> interval
> however, because some of the intervals overlap, I want to limit the  
> number
> of intervals I have to deal with.
>
> So therefore,
> (5     10)    should merge with    (7     18)   making    (5     18)
> and then (5    18)   should merge with (16    20)   giving   (5    20)
> whereas  (1     4) has no overlap with any other interval and is  
> therefore
> left on its own
>
> Ideal output would just be a collapsing of the matrix
> sample   start     end
> #              5       20
> #              1        4
>
> I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives  
> me a
> c(1:4,5:20)
> However, I have to do this on a very large dataset and the numbers  
> are more
> like
> c(100542:100782,598322:598821,...)
>
> any help would be appreciated
> thanks
> -- 
> View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html
> Sent from the R help mailing list archive at Nabble.com.

Nabble is where I saw all of this, but Nabble is not r-help:

I suggest you sort your rows by the "start" variable and then examine  
where the breaks would remain by looking at the prior values of "end":

 > dd <- read.table(text = "sample   start    end
+ A              5       10
+ B              7       18
+ C              1        4
+ D              16      20", header = TRUE)
 > dd[order(dd$start), ]
   sample start end
3      C     1   4
1      A     5  10
2      B     7  18
4      D    16  20
 > ndd <- dd[order(dd$start), ]
 > ndd$inprior <- c(NA, ndd[1:(nrow(ndd)-1), "end"] >= ndd[2:nrow(ndd), "start"])
 > ndd
   sample start end inprior
3      C     1   4      NA
1      A     5  10   FALSE
2      B     7  18    TRUE
4      D    16  20    TRUE
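
The "inprior" flags only mark where the breaks are; one way to finish
the collapse might look like the sketch below. It is not a base R
function: merge_intervals() is a hypothetical helper name, and it
assumes non-empty numeric vectors with start <= end and no NAs. It
compares each start against the running maximum of the earlier ends
(cummax) rather than only the immediately prior end, so that an interval
nested inside a longer earlier one does not open a new group.

```r
# Sketch only: merge_intervals() is a hypothetical helper, not part of base R.
# Assumes non-empty numeric vectors with start <= end and no NAs.
merge_intervals <- function(start, end) {
  o <- order(start)
  start <- start[o]
  end <- end[o]
  # Furthest end seen among the intervals that start earlier
  prior_end <- c(-Inf, head(cummax(end), -1))
  # A new merged interval begins wherever start lies beyond every earlier end
  grp <- cumsum(start > prior_end)
  data.frame(start = as.vector(tapply(start, grp, min)),
             end   = as.vector(tapply(end, grp, max)))
}

dd <- data.frame(sample = c("A", "B", "C", "D"),
                 start  = c(5, 7, 1, 16),
                 end    = c(10, 18, 4, 20))
merged <- merge_intervals(dd$start, dd$end)
merged
```

On the four example rows this should give the two intervals asked for,
(1,4) and (5,20). Because it works on the endpoints only, it never
expands the ranges the way unique(c(5:10, 7:18, ...)) does, which may
matter for values in the 100000s.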

-- 

David Winsemius, MD
West Hartford, CT


