[BioC] Getting first or last hit from a Hits object

Harris A. Jaffee hj at jhu.edu
Mon Oct 21 00:17:33 CEST 2013


I needed essentially your "last_hits" for bumphunter:::fuzzy.match2
In your notation, it comes out:

	last_hits_rows <- cumsum(rle(queryHits(all_hits))$lengths)
	last_hits <- all_hits[last_hits_rows,]

Having expended that effort, you could do

	first_hits_rows <- c(1, 1 + last_hits_rows[-length(query)])

Maybe there's a better way.  I have not compared these to anything of
Hector's.

On Oct 20, 2013, at 4:32 PM, Peter Hickey <hickey at wehi.EDU.AU> wrote:

> Hi all,
> 
> Suppose I've created a Hits object by 
> all_hits <- findOverlaps(query, subject, select = 'all', type = 'any') 
> and now I want to get the first (or last) hit for each query. Of course, I could use 
> first_hits <- findOverlaps(query, subject, select = 'first', type = 'any') 
> last_hits <- findOverlaps(query, subject, select = 'last', type = 'any') 
> but I this isn't necessary as all the required information is in the all_hits Hits object.
> 
> I was unable to find a function to do what I wanted, however, based at the source for findOverlaps-methods.R (actually Hector's branch at https://github.com/hcorrada/GenomicRanges/blob/master/R/findOverlaps-methods.R) I wrote the following simple functions
> getFirstHit <- function(all_hits){
>  ans <- rep.int(NA_integer_, queryLength(all_hits))
>  oo <- IRanges:::orderIntegerPairs(queryHits(all_hits), subjectHits(all_hits), decreasing = TRUE)
>  ans[queryHits(all_hits)[oo]] <- subjectHits(all_hits)[oo]
>  return(ans)
> }
> getLastHit <- function(all_hits){
>  ans <- rep.int(NA_integer_, queryLength(all_hits))
>  ans[queryHits(all_hits)] <- subjectHits(all_hits)
>  return(ans)
> }
> 
> These do what I want and are faster than re-running findOverlaps for my use case (millions of query intervals with tens of thousands of subject intervals)
> 
> Now to my questions:
> (1) Did I just overlook built-in "getFirstHit" and "getLastHit" functions?
> (2) Would others find "getFirstHit" and "getLastHit" useful enough to justify their inclusion in GenomicRanges (or IRanges, wherever it belongs)? 
> (3) I recall some discussion on this mailing list and Rdevel stating that using ":::" to access non-exported functions isn't a great idea for use in package code. If I need to include my own "getFirstHit" would someone be so kind as to point me to the best advice for avoiding calls to ":::" in my code to access IRanges:::orderIntegerPairs(queryHits)?
> 
> Many thanks,
> Pete
> --------------------------------
> Peter Hickey,
> PhD Student/Research Assistant,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Ph: +613 9345 2324
> 
> hickey at wehi.edu.au
> http://www.wehi.edu.au
> 
> 
> ______________________________________________________________________
> The information in this email is confidential and intend...{{dropped:8}}
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list