[R] Mixed sorting/ordering of strings acknowledging roman numerals?

David Winsemius dwinsemius at comcast.net
Wed Aug 27 03:46:49 CEST 2014


On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote:

> Hi,
> 
> does anyone know of an implementation/function that sorts strings that
> *contain* roman numerals (I, II, III, IV, V, ...) which are treated as
> numbers.  In 'gtools' there is mixedsort() which does this for strings
> that contains (decimal) numbers.  I'm looking for a "mixedsortroman()"
> function that does the same but with roman numbers, e.g.

It's pretty easy to sort something you know to be congruent with the existing roman class:

romanC <- as.character( as.roman(1:3899) )
match(c("I", "II", "III","X","V"), romanC)
#[1]  1  2  3 10  5

But I guess you already know that, so you want a regex approach to parsing. Looking at the path taken by Warnes, it would involve doing something like his regex based insertion of a delimiter for "Roman numeral" but simpler because he needed to deal with decimal points and signs and exponent notation, none of which you appear to need. If you only need to consider character and Roman, then this hack of Warnes tools succeeds:

 mixedorderRoman <- function (x) 
{
    if (length(x) < 1) 
        return(NULL)
    else if (length(x) == 1) 
        return(1)
    if (is.numeric(x)) 
        return(order(x))
    delim = "\\$\\@\\$"
    roman <- function(x) {
        suppressWarnings(match(x, romanC))
    }
    nonnumeric <- function(x) {
        suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), 
            NA))
    }
    x <- as.character(x)
    which.nas <- which(is.na(x))
    which.blanks <- which(x == "")
    if (length(which.blanks) > 0) 
        x[which.blanks] <- -Inf
    if (length(which.nas) > 0) 
        x[which.nas] <- Inf
    delimited <- gsub("([IVXCL]+)", 
        paste(delim, "\\1", delim, sep = ""), x)
    step1 <- strsplit(delimited, delim)
    step1 <- lapply(step1, function(x) x[x > ""])
    step1.roman <- lapply(step1, roman)
    step1.character <- lapply(step1, nonnumeric)
    maxelem <- max(sapply(step1, length))
    step1.roman.t <- lapply(1:maxelem, function(i) sapply(step1.roman, 
        function(x) x[i]))
    step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, 
        function(x) x[i]))
    rank.roman <- sapply(step1.roman.t, rank)
    rank.character <- sapply(step1.character.t, function(x) as.numeric(factor(x)))
    rank.roman[!is.na(rank.character)] <- 0
    rank.character <- t(t(rank.character) + apply(matrix(rank.roman), 
        2, max, na.rm = TRUE))
    rank.overall <- ifelse(is.na(rank.character), rank.numeric, 
        rank.character)
    order.frame <- as.data.frame(rank.overall)
    if (length(which.nas) > 0) 
        order.frame[which.nas, ] <- Inf
    retval <- do.call("order", order.frame)
    return(retval)
}

y[mixedorderRoman(y)]
 [1] "chr I"    "chr II"   "chr III"  "chr IV"   "chr IX"  
 [6] "chr V"    "chr VI"   "chr VII"  "chr VIII" "chr X"   
[11] "chr XI"   "chr XII" 


-- 
David.
> 
> ## DECIMAL NUMBERS
>> x <- sprintf("chr %d", 12:1)
>> x
> [1] "chr 12" "chr 11" "chr 10" "chr 9"  "chr 8"
> [6] "chr 7"  "chr 6"  "chr 5"  "chr 4"  "chr 3"
> [11] "chr 2"  "chr 1"
> 
>> sort(x)
> [1] "chr 1"  "chr 10" "chr 11" "chr 12" "chr 2"
> [6] "chr 3"  "chr 4"  "chr 5"  "chr 6"  "chr 7"
> [11] "chr 8"  "chr 9"
> 
>> gtools::mixedsort(x)
> [1] "chr 1"  "chr 2"  "chr 3"  "chr 4"  "chr 5"
> [6] "chr 6"  "chr 7"  "chr 8"  "chr 9"  "chr 10"
> [11] "chr 11" "chr 12"
> 
> 
> ## ROMAN NUMBERS
>> y <- sprintf("chr %s", as.roman(12:1))
>> y
> [1] "chr XII"  "chr XI"   "chr X"    "chr IX"
> [5] "chr VIII" "chr VII"  "chr VI"   "chr V"
> [9] "chr IV"   "chr III"  "chr II"   "chr I"
> 
>> sort(y)
> [1] "chr I"    "chr II"   "chr III"  "chr IV"
> [5] "chr IX"   "chr V"    "chr VI"   "chr VII"
> [9] "chr VIII" "chr X"    "chr XI"   "chr XII"
> 
>> mixedsortroman(y)
> [1] "chr I"    "chr II"   "chr III"  "chr IV"
> [5] "chr V"    "chr VI"   "chr VII"  "chr VIII"
> [9] "chr IX"   "chr X"    "chr XI"   "chr XII"
> 
> The latter is what I'm looking for.
> 
> Before hacking together something myself (e.g. identify roman numerals
> substrings, translate them to decimal numbers, use gtools::mixedsort()
> to sort them and then translate them back to roman numbers), I'd like
> to hear if someone already has this implemented/know of a package that
> does this.
> 
> Thanks,
> 
> Henrik
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list