[BioC] Is a number within a set of ranges?

James W. MacDonald jmacdon at med.umich.edu
Mon Oct 29 21:44:55 CET 2007


In this case you don't gain much, if anything, by using apply(), which is 
just a nice wrapper around a for() loop (and the bad rap that for loops have 
in R isn't really warranted these days).
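
As a minimal sketch of that point, using the example data from this thread: the column names follow Artur's script, and in_any_range() is a hypothetical helper introduced only for illustration. Both versions do the same per-row work and return the same answer.

```r
# Example data from the thread
place    <- data.frame(start = c(1, 5, 13), stop = c(3, 9, 15))
position <- data.frame(gene = 1:4, position = c(14, 4, 10, 6))

# Hypothetical helper: is a single position inside any of the ranges?
in_any_range <- function(p) any(p >= place$start & p <= place$stop)

# apply() version: one function call per row (x[2] is the position column)
via_apply <- apply(position, 1, function(x) in_any_range(x[2]))

# Explicit for() loop writing into a preallocated result vector
via_loop <- logical(nrow(position))
for (i in seq_len(nrow(position))) {
  via_loop[i] <- in_any_range(position$position[i])
}

# Both approaches agree
identical(unname(via_apply), via_loop)
```

Note that the loop preallocates its result rather than growing it with c(), which is usually where slow for() loops in R lose their time.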

The real gain to be had is from vectorizing the comparison.
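
One way to vectorize it, sketched here with outer() on the example data from this thread (other approaches exist, and this is only an illustration, not the only or necessarily the fastest option):

```r
# Example data from the thread
place    <- data.frame(start = c(1, 5, 13), stop = c(3, 9, 15))
position <- data.frame(gene = 1:4, position = c(14, 4, 10, 6))

# Compare every position against every range in one step.
# Each outer() call returns a logical matrix:
# rows = positions, columns = ranges.
inside <- outer(position$position, place$start, ">=") &
          outer(position$position, place$stop,  "<=")

# Keep a gene if its position falls within at least one range
keep <- rowSums(inside) > 0
position[keep, ]
```

This builds a matrix with one cell per (position, range) pair, so it trades memory for speed; with very many genes and ranges that matrix could become large.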

Best,

Jim



Oleg Sklyar wrote:
> You would like to avoid loops here, especially nested loops: this is
> what apply, sapply etc are for. Using your syntax:
> 
> final.presence = apply(position, 1, function(x) any(x[2]>=place$start &
> x[2]<=place$stop))
> 
> -- 
> Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466
> 
> 
> On Mon, 2007-10-29 at 12:42 -0500, Artur Veloso wrote:
>> Hi Daniel,
>>
>> I'm very new to R and I'm far from a good programmer, but I think that this
>> small script should solve your problem. Well, at least for the example you
>> provided it worked. I hope it helps.
>>
>> Cheers,
>>
>> Artur
>>
>>> start <- c(1,5,13)
>>> stop <- c(3,9,15)
>>> place <- data.frame(start,stop)
>>>
>>> gene <- c(1,2,3,4)
>>> position <- c(14,4,10,6)
>>> position <- data.frame(gene,position)
>>>
>>> range <- list()
>>> for(a in 1:dim(place)[1])
>> + range[[a]] <- seq(place$start[a],place$stop[a])
>>> presence <- NULL
>>> final.presence <- NULL
>>> for(b in position$position)
>> +     {
>> +      for(c in 1:length(range))
>> +             {
>> +             presence <- c(presence,b%in%range[[c]])
>> +             }
>> +      final.presence <- c(final.presence,as.logical(sum(presence)))
>> +      presence <- NULL
>> +      }
>>> position[final.presence,]
>>   gene position
>> 1    1       14
>> 4    4        6
>>
>>
>> On 10/29/07, Daniel Brewer <daniel.brewer at icr.ac.uk> wrote:
>>> I have a table with a start and stop column which defines a set of
>>> ranges.  I have another table with a list of genes with associated
>>> position.  What I would like to do is subset the gene table so it only
>>> contains genes whose position is within any of the ranges.  What is the
>>> best way to do this?  The only way I can think of is to construct a long
>>> list of conditions linked by ORs but I am sure there must be a better way.
>>>
>>> Simple example:
>>>
>>> Start   Stop
>>> 1       3
>>> 5       9
>>> 13      15
>>>
>>> Gene    Position
>>> 1       14
>>> 2       4
>>> 3       10
>>> 4       6
>>>
>>> I would like to get out:
>>> Gene    Position
>>> 1       14
>>> 4       6
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
>>> Dan
>>>
>>> --
>>> **************************************************************
>>> Daniel Brewer, Ph.D.
>>> Institute of Cancer Research
>>> Email: daniel.brewer at icr.ac.uk
>>> **************************************************************
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


