[BioC] GenomicRanges:::.similarSeqnameConvention regular expressions needs some tweaking?

Hervé Pagès hpages at fhcrc.org
Sat Aug 28 20:44:00 CEST 2010


Hi Steve,

On 08/27/2010 09:16 PM, Steve Lianoglou wrote:
> Hi Herve, thanks!
>
> Honestly, I feel like there is only so much you can do, with respect
> to a "perfect heuristic".
>
> As long as there is *some* overlap between the seqnames from the query
> and subject sequences, there's got to be *some* consistency, right?
>
> A warning seems more than appropriate, and I guess I'd expect an error
> only when there is *no* overlap between subject and query seqnames ...

Yes, I was thinking of something along those lines too. I just went for
the quickest fix for now. I'll try to improve this when I'm back from
vacation (after Labor Day). Thanks for the feedback!
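
For illustration, here is a minimal sketch of that policy in base R.
checkSeqnameOverlap() is a hypothetical helper, not the GenomicRanges
implementation, and it assumes the check is done on exact seqname
matches only: an error when 'query' and 'subject' share no seqnames,
a warning when they share only some of them.

```r
## Hypothetical helper (not part of GenomicRanges): compare two sets of
## seqnames by exact match and decide between error, warning, or silence.
checkSeqnameOverlap <- function(query_seqs, subject_seqs) {
    shared <- intersect(query_seqs, subject_seqs)
    ## No seqname in common: the two objects cannot overlap at all.
    if (length(shared) == 0L)
        stop("'query' and 'subject' have no seqnames in common")
    ## Some seqnames appear on one side only: worth a warning, not an error.
    unshared <- setdiff(union(query_seqs, subject_seqs), shared)
    if (length(unshared) > 0L)
        warning(length(unshared),
                " seqname(s) are not shared by 'query' and 'subject'")
    invisible(shared)
}

## Reduced version of the reported case: many seqnames vs. "chrY" only.
seqs1 <- c("chr1", "chr1_random", "chrM", "chrX", "chrY")
seqs2 <- "chrY"
shared <- suppressWarnings(checkSeqnameOverlap(seqs1, seqs2))

## Different conventions ("chr1" vs. "1") share nothing, hence an error.
msg <- tryCatch(checkSeqnameOverlap("chr1", "1"),
                error = function(e) conditionMessage(e))
```

With this reading, the case Steve reported would only trigger a warning,
since "chrY" is present on both sides.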

Cheers,
H.

>
> Anyway, thanks again folks!
>
> -steve
>
> 2010/8/27 Hervé Pagès<hpages at fhcrc.org>:
>> Hi Steve,
>>
>> Until we come up with the perfect heuristic for this, I've modified
>> findOverlaps() and its related methods so that they issue a warning
>> instead of an error when 'query' and 'subject' don't appear to use
>> a similar seqname convention.
>>
>> This is in GenomicRanges version 1.0.9 (release) and 1.1.23 (devel).
>>
>> Cheers,
>> H.
>>
>>
>> On 08/24/2010 09:35 AM, Steve Lianoglou wrote:
>>>
>>> Hi Martin,
>>>
>>> On Tue, Aug 24, 2010 at 12:14 PM, Martin Morgan<mtmorgan at fhcrc.org>
>>>   wrote:
>>>>
>>>> On 08/24/2010 08:16 AM, Steve Lianoglou wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry to be a pest about this, but could we get some traction on this?
>>>>>
>>>>> I've temporarily commented out the isArabic regex test as a
>>>>> workaround, but I want to keep my own analysis code in line w/ the
>>>>> real GenomicRanges package.
>>>>
>>>> We've discussed this locally and will make changes this week. Martin
>>>
>>> Sweet.
>>>
>>> Thanks Martin (+ co),
>>>
>>> -steve
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> -steve
>>>>>
>>>>>
>>>>> On Fri, Aug 20, 2010 at 12:53 PM, Steve Lianoglou
>>>>> <mailinglist.honeypot at gmail.com>    wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> The GenomicRanges:::.similarSeqnameConvention function is returning
>>>>>> FALSE where, IMHO, it shouldn't be.
>>>>>>
>>>>>> I've landed in a situation where this function is called with the
>>>>>> following values for seqs1/2:
>>>>>>
>>>>>> seqs1:
>>>>>>  [1] "chr1"          "chr1_random"   "chr10"         "chr10_random"
>>>>>>  [5] "chr11"         "chr11_random"  "chr12"         "chr13"
>>>>>>  [9] "chr13_random"  "chr14"         "chr15"         "chr15_random"
>>>>>> [13] "chr16"         "chr16_random"  "chr17"         "chr17_random"
>>>>>> [17] "chr18"         "chr18_random"  "chr19"         "chr19_random"
>>>>>> [21] "chr2"          "chr2_random"   "chr20"         "chr21"
>>>>>> [25] "chr21_random"  "chr22"         "chr22_random"  "chr22_h2_hap1"
>>>>>> [29] "chr3"          "chr3_random"   "chr4"          "chr4_random"
>>>>>> [33] "chr5"          "chr5_random"   "chr5_h2_hap1"  "chr6"
>>>>>> [37] "chr6_random"   "chr6_cox_hap1" "chr6_qbl_hap2" "chr7"
>>>>>> [41] "chr7_random"   "chr8"          "chr8_random"   "chr9"
>>>>>> [45] "chr9_random"   "chrM"          "chrX"          "chrX_random"
>>>>>> [49] "chrY"
>>>>>>
>>>>>> seqs2:
>>>>>>   [1] "chrY"
>>>>>>
>>>>>> and it looks like the "isArabic" function in funList is the culprit
>>>>>> here. Perhaps this regex test isn't so necessary, given all the other
>>>>>> tests that are being run?
>>>>>>
>>>>>> I guess it's not so easy to come up w/ a perfect heuristic for this
>>>>>> function to check "comparable seqnames", but IMHO, it seems as if my
>>>>>> scenario should pass as "good" (i.e. the conventions are similar).
>>>>>>
>>>>>> Another option would be to just have this function return TRUE when
>>>>>> the intersection between seqs1 and seqs2 is non-empty. I guess that
>>>>>> must be too simple though ...
>>>>>>
>>>>>> --
>>>>>> Steve Lianoglou
>>>>>> Graduate Student: Computational Systems Biology
>>>>>>   | Memorial Sloan-Kettering Cancer Center
>>>>>>   | Weill Medical College of Cornell University
>>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Martin Morgan
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>>>>
>>>
>>>
>>>
>>
>>
>
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


