[R] Sorting vector based on pairs of comparisons

William Dunlap wdun|@p @end|ng |rom t|bco@com
Thu Mar 14 16:07:11 CET 2019


This is called topological sorting in some circles.  The function below
will give you one ordering that is consistent with the contraints but not
all possible orderings.  I couldn't find such a function in core R so I
wrote one a while back based on Kahn's algorithm, as described in Wikipedia.

> Smaller <- c("ASD", "DFE", "ASD", "SDR", "EDF", "ASD")
> Larger <- c("SDR", "EDF", "KLM", "KLM", "SDR", "EDF")
> matComp <- cbind(Smaller, Larger)
> sortTopologically(matComp, unique(as.vector(matComp)))
[1] "ASD" "DFE" "EDF" "SDR" "KLM"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

sortTopologically <- function(edgeMatrix, V)
{
    # edgeMatrix is 2-column matrix which describes a partial
    #   ordering of a set of vertices. The first column is the 'from'
vertex,
    #   the second the 'to' vertex.
    # V is the vector of all the vertices in the graph.
    #
    # Return a vector, L, consisting of the vertices in
    #   V in an order consistent with the partial ordering
    #   described by edgeMatrix.
    # Throw an error if such an ordering is not possible.
    #
    # Use Kahn's algorithm (
https://en.wikipedia.org/wiki/Topological_sorting).
    #
    # Note that disconnected vertices will not be mentioned in edgeMatrix,
    #   but will be in V.
    stopifnot(is.matrix(edgeMatrix),
              ncol(edgeMatrix)==2,
              !any(is.na(edgeMatrix)),
              !any(is.na(V)),
              all(as.vector(edgeMatrix) %in% V))
    L <- V[0] # match the type of the input
    S <- setdiff(V, edgeMatrix[, 2])
    V <- setdiff(V, S)
    while(length(S) > 0) {
        n <- S[1]
        # cat("Adding", n, "to L", "\n")
        L <- c(L, n)
        S <- S[-1]
        mRow <- edgeMatrix[,1] == n
        edgeMatrix <- edgeMatrix[ !mRow, , drop=FALSE ]
        S <- c(S, setdiff(V, edgeMatrix[,2]))
        V <- setdiff(V, S)
    }
    if (nrow(edgeMatrix) > 0) {
        stop("There are cycles in the dependency graph")
    }
    L
}


On Thu, Mar 14, 2019 at 4:30 AM Pedro Conte de Barros <pbarros using ualg.pt>
wrote:

> Dear All,
>
> This should be a quite established algorithm, but I have been searching
> for a couple days already without finding any satisfactory solution.
>
> I have a matrix defining pairs of Smaller-Larger arbitrary character
> values, like below
>
> Smaller <- c("ASD", "DFE", "ASD", "SDR", "EDF", "ASD")
>
> Larger <- c("SDR", "EDF", "KLM", "KLM", "SDR", "EDF"
>
> matComp <- cbind(Smaller, Larger)
>
> so that matComp looks like this
>
>       Smaller Larger
> [1,] "ASD"   "SDR"
> [2,] "DFE"   "EDF"
> [3,] "ASD"   "KLM"
> [4,] "SDR"   "KLM"
> [5,] "EDF"   "SDR"
> [6,] "ASD"   "EDF"
>
> This matrix establishes six pairs of "larger than" relationships that
> can be used to sort the unique values in the matrix,
>
>  > unique(as.vector(matComp))
> [1] "ASD" "DFE" "SDR" "EDF" "KLM"
>
> Specifically, I would like to get this:
>
> sorted <- c("ASD", "DFE", "EDF", "SDR", "KLM")
>
> or, equally valid (my matrix does not have the full information):
>
> sorted <- c("DFE", "ASD", "EDF", "SDR", "KLM")
>
> Preferably, I would get the different combinations of the unique values
> that satisfy the "larger than" conditions in the matrix...
>
>
> I am sure this is a trivial problem, but I could not find any algorithm
> to solve it.
>
> Any help would be highly appreciated
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list