[BioC] GenomicRanges:::.similarSeqnameConvention regular expressions needs some tweaking?

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Aug 20 18:53:51 CEST 2010


Hi all,

The GenomicRanges:::.similarSeqnameConvention function is returning
FALSE where, IMHO, it shouldn't be.

I've landed in a situation where this function is called with the
following values for seqs1/2:

seqs1:
 [1] "chr1"          "chr1_random"   "chr10"         "chr10_random"  "chr11"         "chr11_random"
 [7] "chr12"         "chr13"         "chr13_random"  "chr14"         "chr15"         "chr15_random"
[13] "chr16"         "chr16_random"  "chr17"         "chr17_random"  "chr18"         "chr18_random"
[19] "chr19"         "chr19_random"  "chr2"          "chr2_random"   "chr20"         "chr21"
[25] "chr21_random"  "chr22"         "chr22_random"  "chr22_h2_hap1" "chr3"          "chr3_random"
[31] "chr4"          "chr4_random"   "chr5"          "chr5_random"   "chr5_h2_hap1"  "chr6"
[37] "chr6_random"   "chr6_cox_hap1" "chr6_qbl_hap2" "chr7"          "chr7_random"   "chr8"
[43] "chr8_random"   "chr9"          "chr9_random"   "chrM"          "chrX"          "chrX_random"
[49] "chrY"

seqs2:
 [1] "chrY"

and it looks like the "isArabic" function in funList is the culprit
here. Perhaps this regex test isn't so necessary, given all the other
tests that are being run?

I guess it's not so easy to come up with a perfect heuristic for this
function to check for "comparable seqnames", but IMHO, my scenario
should pass as "good" (i.e. the conventions are similar).

Another scenario would be to just have this function return TRUE when
the intersection between seqs1 and seqs2 is length 0. I guess that
must be too simple though ...
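
For illustration, a minimal base-R sketch of an intersection-based check
for the scenario above. The helper name sharesSeqnames is hypothetical and
not part of GenomicRanges, and it treats a non-empty intersection as
evidence of a shared convention, which is an assumption about the intended
rule rather than anything the package does:

```r
# Hypothetical helper (not part of GenomicRanges): treat two sets of
# seqnames as comparable when they share at least one name.
sharesSeqnames <- function(seqs1, seqs2) {
  length(intersect(seqs1, seqs2)) > 0
}

# A subset of the seqs1 values listed above, and the same seqs2.
seqs1 <- c("chr1", "chr1_random", "chr6_cox_hap1", "chrM", "chrX", "chrY")
seqs2 <- "chrY"

intersect(seqs1, seqs2)        # the names common to both sets
sharesSeqnames(seqs1, seqs2)   # TRUE for this scenario
sharesSeqnames(seqs1, "Y")     # FALSE: no name in common
```

A check like this only looks at exact name matches, so it would still say
nothing about two sets that follow the same convention but happen to share
no names.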

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
