[BioC] Is a number within a set of ranges?

James W. MacDonald jmacdon at med.umich.edu
Mon Oct 29 19:15:30 CET 2007


Or a more simplistic alternative that will work with the data provided:

 > mat <- matrix(c(1,5,13,3,9,15), ncol=2)
 > gn <- matrix(c(14,4,10,6), ncol=1)
 > a <- apply(gn, 1, function(x) any(x > mat[,1] & x < mat[,2]))
 > gn[a,]
[1] 14  6

Best,

Jim



Sean Davis wrote:
> Daniel Brewer wrote:
>> I have a table with a start and stop column which defines a set of
>> ranges.  I have another table with a list of genes with associated
>> position.  What I would like to do is subset the gene table so it only
>> contains genes whose position is within any of the ranges.  What is the
>> best way to do this?  The only way I can think of is to construct a long
>> list of conditions linked by ORs but I am sure there must be a better way.
>>
>> Simple example:
>>
>> Start	Stop
>> 1	3
>> 5	9
>> 13	15
>>
>> Gene	Position
>> 1	14
>> 2	4
>> 3	10
>> 4	6
>>
>> I would like to get out:
>> Gene	Position
>> 1	14
>> 4	6
>>
>> Any ideas?
> 
> Here is a function that I use for finding overlapping segments.  It
> takes two data.frames, x and y.  Each must have "Chr", "Position", and
> "end" columns (often used in conjunction with snapCGH--hence, the
> Position rather than "start").  The "shift" parameter is a convenience
> function for doing "random shift" random distributions of genomic
> segments.  The function returns the indexes of x and y that overlap.
> So, if the first row of the x data.frame overlaps with the first 3 rows
> of y, the output will be:
> 
> Xindex	Yindex
> 1	1
> 1	2
> 1	3
> 
> Note that the data.frames can have more than those three columns, but
> those three columns MUST be present and named as mentioned.
> 
> Hope this helps.
> 
> Sean
> 
> Attached function below
> -----------------------
> 
> findOverlappingSegments <-
>   function(x,y,shift=0) {
>     swap <- nrow(x)<nrow(y) # Want to have larger set first for speed
>     if(swap) {
>       tmpx <- x
>       x <- y
>       y <- tmpx
>     }
>     intersectChrom <- intersect(x$Chr,y$Chr)
>     ret <- list()
>     for(i in intersectChrom) {
>       aindex <- which(y$Chr==i)
>       bindex <- which(x$Chr==i)
>       a <- y[aindex,]
>       b <- x[bindex,]
>       overlapsBrow <- mapply(function(Astart, Aend) {
>         which((Astart <= b$end & Astart>=b$Position) |
>               (Aend <= b$end & Aend>=b$Position) |
>               (Astart <= b$Position & Aend>=b$end) |
>               (Astart >= b$Position & Aend<=b$end))
>       },a$Position+shift,a$end+shift)
>       tmp1 <- unlist(overlapsBrow)
>       xindex <- bindex[tmp1]
>       yindex <-
> aindex[rep(1:nrow(a),sapply(overlapsBrow,length,simplify=TRUE))]
>       if(swap) {
>         ret[[i]]<- cbind(yindex,xindex)
>       } else {
>         ret[[i]] <- cbind(xindex,yindex)
>       }
>       colnames(ret[[i]]) <- c('Xindex','Yindex')
>     }
>     return(do.call(rbind,ret))
>   }
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623



More information about the Bioconductor mailing list