[BioC] GenomicRanges:::.similarSeqnameConvention regular expressions need some tweaking?

Steve Lianoglou mailinglist.honeypot at gmail.com
Sat Aug 28 06:16:13 CEST 2010


Hi Herve, thanks!

Honestly, I feel like there's only so much you can do when it comes to
a "perfect heuristic".

As long as there is *some* overlap between the seqnames from the query
and subject sequences, there's got to be *some* consistency, right?

A warning seems more than appropriate, and I guess I'd expect an error
only when there is *no* overlap between subject and query seqnames ...
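
To make the policy above concrete, here is a minimal Python sketch of an overlap-based check. It is not the GenomicRanges code, which is R. It raises an error only when query and subject share no seqnames at all. The function name `check_seqname_overlap` is hypothetical. The rule that warns on a partial overlap on both sides is also an assumption added for illustration; GenomicRanges is not known to apply it.

```python
import warnings


def check_seqname_overlap(query_seqnames, subject_seqnames):
    """Hypothetical overlap-based check (not the GenomicRanges implementation).

    Raises an error only when the two sets of seqnames share nothing,
    warns when each side has names the other lacks, and returns the
    shared names.
    """
    query = set(query_seqnames)
    subject = set(subject_seqnames)
    shared = query & subject
    if not shared:
        # No common seqname at all: no overlap can ever be found, so fail loudly.
        raise ValueError("query and subject share no seqnames")
    if shared != query and shared != subject:
        # Assumed rule: partial overlap on both sides is usable but worth flagging.
        warnings.warn("query and subject seqnames overlap only partially")
    return shared
```

With Steve's data, `check_seqname_overlap(seqs1, ["chrY"])` returns `{"chrY"}` without a warning, because every subject name is also a query name.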

Anyway, thanks again folks!

-steve

2010/8/27 Hervé Pagès <hpages at fhcrc.org>:
> Hi Steve,
>
> Until we come up with the perfect heuristic for this, I've modified
> findOverlaps() and its related methods so that they issue a warning
> instead of an error when 'query' and 'subject' don't appear to use
> a similar seqname convention.
>
> This is in GenomicRanges version 1.0.9 (release) and 1.1.23 (devel).
>
> Cheers,
> H.
>
>
> On 08/24/2010 09:35 AM, Steve Lianoglou wrote:
>>
>> Hi Martin,
>>
>> On Tue, Aug 24, 2010 at 12:14 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>>>
>>> On 08/24/2010 08:16 AM, Steve Lianoglou wrote:
>>>>
>>>> Hi,
>>>>
>>>> Sorry to be a pest about this, but could we get some traction on this?
>>>>
>>>> I've temporarily commented out the isArabic regex test as a
>>>> workaround, but I want to keep my own analysis code in line w/ the
>>>> real GenomicRanges package.
>>>
>>> We've discussed this locally and will make changes this week. Martin
>>
>> Sweet.
>>
>> Thanks Martin (+ co),
>>
>> -steve
>>
>>>
>>>>
>>>> Thanks,
>>>> -steve
>>>>
>>>>
>>>> On Fri, Aug 20, 2010 at 12:53 PM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> The GenomicRanges:::.similarSeqnameConvention function is returning
>>>>> FALSE where, IMHO, it shouldn't be.
>>>>>
>>>>> I've landed in a situation where this function is called with the
>>>>> following values for seqs1/2:
>>>>>
>>>>> seqs1:
>>>>>  [1] "chr1"          "chr1_random"   "chr10"         "chr10_random"  "chr11"         "chr11_random"
>>>>>  [7] "chr12"         "chr13"         "chr13_random"  "chr14"         "chr15"         "chr15_random"
>>>>> [13] "chr16"         "chr16_random"  "chr17"         "chr17_random"  "chr18"         "chr18_random"
>>>>> [19] "chr19"         "chr19_random"  "chr2"          "chr2_random"   "chr20"         "chr21"
>>>>> [25] "chr21_random"  "chr22"         "chr22_random"  "chr22_h2_hap1" "chr3"          "chr3_random"
>>>>> [31] "chr4"          "chr4_random"   "chr5"          "chr5_random"   "chr5_h2_hap1"  "chr6"
>>>>> [37] "chr6_random"   "chr6_cox_hap1" "chr6_qbl_hap2" "chr7"          "chr7_random"   "chr8"
>>>>> [43] "chr8_random"   "chr9"          "chr9_random"   "chrM"          "chrX"          "chrX_random"
>>>>> [49] "chrY"
>>>>>
>>>>> seqs2:
>>>>>  [1] "chrY"
>>>>>
>>>>> and it looks like the "isArabic" function in funList is the culprit
>>>>> here. Perhaps this regex test isn't necessary, given all the other
>>>>> tests that are being run?
>>>>>
>>>>> I guess it's not so easy to come up w/ a perfect heuristic for this
>>>>> function to check "comparable seqnames", but IMHO, it seems as if my
>>>>> scenario should pass as "good" (i.e. the conventions are similar).
>>>>>
>>>>> Another option would be to just have this function return TRUE when
>>>>> the intersection between seqs1 and seqs2 is non-empty (length > 0). I
>>>>> guess that must be too simple though ...
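
The contrast between a per-set regex test and the intersection idea can be sketched as follows. This is a hypothetical Python illustration, not GenomicRanges' actual `funList`. For the sake of the example, it assumes that an "isArabic"-style test asks whether any name in a set contains a digit, and it treats a mismatch between the two sets as "different conventions". The intersection variant reads the proposal as "similar when at least one seqname is shared", which is what the chrY example requires. All function names here are made up for illustration.

```python
import re

# Hypothetical stand-in for an "isArabic"-style pattern: any Arabic digit.
ARABIC = re.compile(r"[0-9]")


def has_arabic(seqnames):
    # True if at least one name in the set contains a digit.
    return any(ARABIC.search(name) for name in seqnames)


def similar_by_regex(seqs1, seqs2):
    # Per-set feature comparison: a mismatch is read as "different conventions".
    return has_arabic(seqs1) == has_arabic(seqs2)


def similar_by_intersection(seqs1, seqs2):
    # Proposed alternative: similar if the two sets share at least one name.
    return len(set(seqs1) & set(seqs2)) > 0
```

With `seqs1` containing names such as "chr1" and `seqs2 = ["chrY"]`, the regex-style check returns `False` because only one set contains digits. The intersection check returns `True` because "chrY" appears in both sets.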
>>>>>
>>>>> --
>>>>> Steve Lianoglou
>>>>> Graduate Student: Computational Systems Biology
>>>>>  | Memorial Sloan-Kettering Cancer Center
>>>>>  | Weill Medical College of Cornell University
>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Martin Morgan
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>>
>>
>>
>>
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M2-B876
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>





