[BioC] help with reduction operation using IRanges/GRanges

Thu Apr 26 03:44:26 CEST 2012

Given my earlier misunderstanding of this task, I should probably be
careful about speaking up.  But clustering is not going to work on
millions of reads.  I think the distance matrix is going to be too big:
a dense n x n matrix of doubles needs 8 * n^2 bytes, which is roughly
8 TB for a million reads.  Smaller problems may be fine.
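
One way to avoid building a full matrix is to work directly on the list
of overlapping pairs.  Below is a rough sketch in base R only, assuming
the pairs are available as a two-column integer matrix such as
as.matrix(hits) would give.  The function name connected_components and
the hand-written edge list are made up for illustration and are not part
of IRanges.

```r
# Label connected components from a two-column edge list (union-find).
# `edges` is a hypothetical stand-in for as.matrix(hits);
# `n` is the total number of ranges.
connected_components <- function(edges, n) {
  storage.mode(edges) <- "integer"
  parent <- seq_len(n)              # every range starts as its own root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (k in seq_len(nrow(edges))) {
    a <- find_root(edges[k, 1])
    b <- find_root(edges[k, 2])
    # Merge the two groups, keeping the smaller index as the root
    if (a != b) parent[max(a, b)] <- min(a, b)
  }
  # Final label of each range is the root of its group
  vapply(seq_len(n), find_root, integer(1))
}

# Hand-written pairs for six ranges (illustrative, not computed here)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(1, 4), c(2, 4), c(5, 6))
comp <- connected_components(edges, n = 6)
comp                      # 1 1 1 1 5 5
split(seq_len(6), comp)   # indices grouped by component
```

Memory here grows with the number of pairs rather than with n^2.  The
loop is plain R and will be slow for millions of pairs, so treat it as
an outline of the idea rather than something tuned for real data.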

Kasper

On Wed, Apr 25, 2012 at 7:14 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Making an adjacency matrix from a Hits object would be something like:
>
> am <- matrix(0, queryLength(hits), subjectLength(hits))
> am[as.matrix(hits)] <- 1
>
> Michael
>
> On Wed, Apr 25, 2012 at 3:40 PM, Abhishek Pratap <apratap at lbl.gov> wrote:
>
>> Thanks Steve. Legit solution, and it works for me too. Given my partial
>> understanding of the methods, I have no idea how this will scale to the
>> million points I have in the actual data (maybe it will), but I will let
>> you know.
>>
>> @Michael: I updated my installation and I am now able to run the intersect
>> step on the findOverlaps() output from start and end.
>> I guess I now need to convert the common hits to a graph object and call
>> connComp on it. Is there any way to convert the hits matrix to an adjacency
>> matrix to create a graph, or maybe there is another slick way to find the
>> connected points?
>>
>> ir <- IRanges(c(10,10,11,9,10,11), width=c(190,190,190,190,180,180))
>> start <- flank(ir, 1, both=TRUE)
>> end <- flank(ir, 1, start=FALSE, both=TRUE)
>> start_overlaps <- findOverlaps(start)
>> end_overlaps <- findOverlaps(end)
>> good_hits <- intersect(start_overlaps, end_overlaps)
>>
>>
>> Thanks!
>> -Abhi
>>
>>
>> On Wed, Apr 25, 2012 at 3:08 PM, Steve Lianoglou <
>> mailinglist.honeypot at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> On Wed, Apr 25, 2012 at 5:21 PM, Abhishek Pratap <apratap at lbl.gov> wrote:
>>> > Hi Michael
>>> >
>>> > SessionInfo copied below. My versions could be one release behind the
>>> > current one. I am still wondering how I can get this information in a
>>> > format that can be digested by connectedComp or something similar. I
>>> > think we are close to a solution.
>>>
>>> Step 1: Upgrade R ;-)
>>>
>>> It's not necessary for the approach I'm going to suggest, but it'll
>>> probably make it easier for Michael to help you w/ his solution, which
>>> is probably going to be more robust than the
>>> duct-tape-and-elmer's-glue snippet I'm going to try:
>>>
>>> R> library(GenomicRanges)
>>> R> ir <- IRanges(c(10,10,11,9,10,11), width=c(190,190,190,190,180,180))
>>> R> starts <- reduce(resize(ir, width=1, fix='start'), min.gapwidth=2)
>>> R> ends <- reduce(resize(ir, width=1, fix='end'), min.gapwidth=2)
>>> R> sc <- countOverlaps(ir, starts)
>>> R> ec <- countOverlaps(ir, ends)
>>>
>>> ... and ... good morning:
>>>
>>> R> split(ir, (paste(sc,ec,sep=":")))
>>> CompressedIRangesList of length 2
>>> $`1:1`
>>> IRanges of length 2
>>>    start end width
>>> [1]    10 189   180
>>> [2]    11 190   180
>>>
>>> $`1:2`
>>> IRanges of length 4
>>>    start end width
>>> [1]    10 199   190
>>> [2]    10 199   190
>>> [3]    11 200   190
>>> [4]     9 198   190
>>>
>>> HTH,
>>> -steve
>>> --
>>> Steve Lianoglou
>>> Graduate Student: Computational Systems Biology
>>>  | Memorial Sloan-Kettering Cancer Center
>>>  | Weill Medical College of Cornell University
>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>
>>
>>
>