[BioC] GRanges nearest problem

Nishant Gopalakrishnan ngopalak at fhcrc.org
Fri Apr 15 03:09:58 CEST 2011


Hi Arne,

Thank you for pointing out the error. I have checked in some changes to 
fix this issue.

Nishant


On 04/14/2011 06:21 AM, Valerie Obenchain wrote:
> Hi Arne,
>
> Thanks for pointing out these bugs. I'll post again here when they 
> have been fixed.
>
> Valerie
>
>
> On 04/13/11 05:29, Mueller, Arne wrote:
>> Hello,
>>
>> I've come across a problem with nearest() on GRanges: if the subject of the 
>> nearest call contains strand information (+/-) and the query does not 
>> (*), the method takes a long time to run and raises warnings:
>>
>> mm9.pro.gr and mm9.2ktiles.gr are both GRanges objects.
>>
>>> strand(mm9.pro.gr) = "-"
>>> strand(mm9.2ktiles.gr) = "*"
>>> system.time(nn<- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>     user  system elapsed
>>   27.150   0.002  27.416
>> There were 50 or more warnings (use warnings() to see the first 50)
>>> warnings()
>> Warning messages:
>> 1: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>>    longer object length is not a multiple of shorter object length
>> 2: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>>    longer object length is not a multiple of shorter object length
>> 3: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>>    longer object length is not a multiple of shorter object length
>> 4: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>>    longer object length is not a multiple of shorter object length
>>>>
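The warnings above are R's standard vector-recycling warning: when two vectors whose lengths do not divide evenly are subtracted, R recycles the shorter one and warns. A minimal base-R reproduction, independent of GenomicRanges (starts and ends are made-up illustration data, not taken from mm9.pro.gr or mm9.2ktiles.gr), which suggests that the internal start(...) - end(...) call was comparing ranges of mismatched lengths:

```r
# Made-up coordinates; lengths 5 and 3 do not divide evenly.
starts <- c(100L, 200L, 300L, 400L, 500L)
ends   <- c(150L, 250L, 350L)

# Capture the warning text instead of letting it print at the top level.
msg <- NULL
d <- withCallingHandlers(
  starts - ends,
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(d)    # ends is recycled to 150, 250, 350, 150, 250
print(msg)  # the recycling warning (wording depends on the R locale)
```

The result is silently wrong for the last two elements, which is why such warnings in a distance calculation are worth treating as bugs rather than noise.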
>> I think that if a range in either the query or the subject is 
>> non-stranded (*), the method should look for the nearest neighbor 
>> ignoring the strand (at least that's my suggestion ;-).
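The suggestion above can be illustrated with a strand-agnostic nearest-neighbor search. This is a minimal base-R sketch, not the GenomicRanges implementation: nearest_ignore_strand is a hypothetical helper that works on plain start/end vectors for ranges on a single chromosome, and breaking ties by taking the first subject index is an assumption.

```r
# Hypothetical helper (not a GenomicRanges function): for each query range,
# return the index of the closest subject range, ignoring strand entirely.
# Assumes all ranges lie on the same chromosome and are closed [start, end].
# The gap is 0 when two ranges share at least one position.
nearest_ignore_strand <- function(q_start, q_end, s_start, s_end) {
  if (length(s_start) == 0L) {
    return(rep(NA_integer_, length(q_start)))
  }
  vapply(seq_along(q_start), function(i) {
    gap <- pmax(s_start - q_end[i], q_start[i] - s_end, 0L)
    which.min(gap)  # first index among ties
  }, integer(1))
}

q_start <- c(100L, 1000L)
q_end   <- c(200L, 1100L)
s_start <- c(250L, 900L, 5000L)
s_end   <- c(300L, 950L, 5100L)

nn <- nearest_ignore_strand(q_start, q_end, s_start, s_end)
print(nn)  # 1 2
```

Because strand never enters the calculation, "+", "-" and "*" ranges are all compared on coordinates alone.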
>>
>> If I set the strand info of the subject to "*" the method runs fine:
>>
>>> strand(mm9.pro.gr) = "*"
>>> system.time(nn<- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>     user  system elapsed
>>    0.264   0.000   0.264
>>
>> If the query is "stranded" (+/-) and the subject isn't, the method 
>> runs fine, too (though it takes longer than when both query and subject 
>> are non-stranded, but I guess this can be expected):
>>
>>> system.time(nn<- nearest(mm9.pro.gr[1:5000], mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>     user  system elapsed
>>    3.084   0.000   3.125
>>
>> Another odd behavior: if the query contains sequence names that are not 
>> present in the subject, an error is raised; the other way around 
>> works fine. Wouldn't it make sense to set the vector elements of 
>> sequences found only in the query to NA?
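The NA proposal can be sketched in the same way: match ranges only within the same sequence name, and leave queries whose sequence name never appears in the subject as NA instead of raising an error. Again a minimal base-R sketch under stated assumptions: nearest_by_seqname and its vector arguments are hypothetical, strand is ignored, and ties go to the first matching subject range.

```r
# Hypothetical helper (not a GenomicRanges function). Ranges are matched only
# within the same sequence name; strand is ignored. Queries on a sequence
# that is absent from the subject keep NA instead of triggering an error.
nearest_by_seqname <- function(q_seq, q_start, q_end, s_seq, s_start, s_end) {
  out <- rep(NA_integer_, length(q_seq))
  for (chr in intersect(q_seq, s_seq)) {
    qi <- which(q_seq == chr)  # query rows on this sequence
    si <- which(s_seq == chr)  # subject rows on this sequence
    for (i in qi) {
      gap <- pmax(s_start[si] - q_end[i], q_start[i] - s_end[si], 0L)
      out[i] <- si[which.min(gap)]  # index into the full subject
    }
  }
  out
}

q_seq   <- c("chr1", "chr2", "chrX")
q_start <- c(100L, 500L, 10L)
q_end   <- c(200L, 600L, 20L)
s_seq   <- c("chr1", "chr1", "chr2")
s_start <- c(1000L, 250L, 700L)
s_end   <- c(1100L, 300L, 800L)

nn <- nearest_by_seqname(q_seq, q_start, q_end, s_seq, s_start, s_end)
print(nn)  # 2 3 NA: chrX has no subject ranges
```

Returning NA keeps the result aligned one-to-one with the query, so callers can filter with is.na() rather than having to pre-subset the query by sequence name.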
>>
>>      Kind regards,
>>
>>      Arne
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
