[R] Measure the frequencies of pairs in a matrix

Hermann Norpois hnorpois at gmail.com
Thu Oct 8 13:44:31 CEST 2015


Thanks a lot. This was very helpful. I want to apologise for being
imprecise. My favourite solution was William's.
Thanks again.

2015-10-07 18:39 GMT+02:00 William Dunlap <wdunlap at tibco.com>:

> You could also call table() on the columns of the input matrix, first
> converting them to factors with levels 1:max. Then add together the upper
> and lower triangles of the table if order is not important. E.g.,
> f2 <- function (mat)
> {
>     maxMat <- max(mat)
>     stopifnot(is.matrix(mat), all(mat %in% seq_len(maxMat)))
>     L <- split(factor(mat, levels = seq_len(maxMat)), col(mat))
>     Table <- do.call(table, unname(L))
>     ignoreOrder <- function(M) {
>         stopifnot(length(dim(M)) == 2)
>         lower <- lower.tri(M, diag = FALSE)
>         upper <- upper.tri(M, diag = FALSE)
>         M[lower] <- M[lower] + t(M)[lower]
>         M[upper] <- t(M)[upper]
>         M
>     }
>     ignoreOrder(Table)
> }
>
> > mat <- structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
> 4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
> 6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))
> > f2(mat)
>
>      1  2  3  4  5  6  7
>   1  0  0  0  0  0  0  0
>   2  0  0  0  0  0  0  0
>   3  0  0  0  2  0  0  2
>   4  0  0  2  0  4  0  0
>   5  0  0  0  4  2 10  4
>   6  0  0  0  0 10  0  2
>   7  0  0  2  0  4  2  0
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
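A sketch of a shorter alternative to ignoreOrder(). It assumes the table is square with the same levels on both margins, which f2() guarantees. Adding the table to its transpose folds the (i, j) and (j, i) counts together. The diagonal (pairs such as 5-5) would then be doubled, so it is restored afterwards. The names M, S and lev are illustrative and are not part of Bill's code.

```r
# Example data from this thread: 26 pairs in a 26 x 2 matrix
mat <- structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
  4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
  6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))

# Ordered-pair counts over levels 1:max(mat), as a plain integer matrix
lev <- seq_len(max(mat))
M <- unclass(table(factor(mat[, 1], levels = lev),
                   factor(mat[, 2], levels = lev)))

# Fold (i, j) and (j, i) together; the diagonal would be doubled, so restore it
S <- M + t(M)
diag(S) <- diag(M)
S
```

S holds the same counts as f2(mat), as a plain matrix rather than a table.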
>
>
> On Wed, Oct 7, 2015 at 6:09 AM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> > Still not sure I understand. But here is what I think you might mean:
> >
> > # Your data
> > mat <- structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
> > 4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
> > 6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))
> >
> > # Create a square matrix with one element for each possible pair. Since
> > # order is not important, only the upper triangle is used. If the matrix
> > # is large and sparse, a different approach might be needed.
> > freq <- matrix(0, nrow = max(mat), ncol = max(mat))
> >
> > # Loop over your input
> > for (i in seq_len(nrow(mat))) {
> >     # Sort the elements of a row by size.
> >     x <- sort(mat[i,])
> >     # Increment the corresponding element of the frequency matrix
> >     freq[x[1], x[2]] <- freq[x[1], x[2]] + 1
> > }
> >
> > freq
> >
> >
> > Cheers,
> > B.
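The loop can also be vectorised. This is a sketch, assuming all entries are positive whole numbers. pmin()/pmax() put each pair in ascending order, and tabulate() counts the column-major position of cell [lo, hi] in an n x n matrix. tabulate() accumulates repeated pairs, which a plain indexed assignment would not, so the result should be the same upper-triangular freq as the loop gives. The names n, lo and hi are illustrative.

```r
# Example data from this thread: 26 pairs in a 26 x 2 matrix
mat <- structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
  4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
  6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))

n  <- max(mat)
lo <- pmin(mat[, 1], mat[, 2])  # smaller member of each pair
hi <- pmax(mat[, 1], mat[, 2])  # larger member of each pair

# Column-major linear index of cell [lo, hi]; tabulate() counts each index,
# so repeated pairs accumulate (freq[cbind(lo, hi)] <- ... + 1 would not)
freq <- matrix(tabulate((hi - 1) * n + lo, nbins = n * n), nrow = n, ncol = n)
freq
```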
> >
> >
> >
> >
> >
> > On Oct 7, 2015, at 1:17 AM, Hermann Norpois <hnorpois at gmail.com> wrote:
> >
> >> OK, this was misleading, and it was not that important. My result
> >> matrix should look like this:
> >>
> >>   1    2   3   4   5   6   7 ...
> >> 1 p1 p2
> >> 2 p
> >> 3
> >> 4
> >>
> >> p1, p2, etc. are the frequencies of the combinations.
> >>
> >> The values 1 and 2, for instance, do not appear in my example, so their
> >> entries would be zero. Actually, this part is not too important. I would
> >> be happy enough to solve the challenge with the frequencies of the pairs.
> >> Thanks Hermann
> >>
> >> 2015-10-07 2:40 GMT+02:00 Boris Steipe <boris.steipe at utoronto.ca>:
> >> Since order is not important to you, you can order your pairs (e.g.
> >> decreasing) before compiling the frequencies.
> >> But I don't understand the second part about values "that do not appear
> >> in the matrix". Do you mean you want to assess all combinations? If that's
> >> the case, I would think about a hash table or other indexed data structure,
> >> rather than iterating through a matrix.
> >>
> >>
> >> B.
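If most of the max(mat)^2 cells would stay empty, one indexed alternative is to count only the pairs that actually occur. This is a minimal sketch that uses a character key per sorted pair, with table() standing in for a hash table. The "lo-hi" key format and the ascending sort are assumptions for illustration; a decreasing order works equally well.

```r
# Example data from this thread: 26 pairs in a 26 x 2 matrix
mat <- structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
  4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
  6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))

# Sort each pair (ascending here) so that 6-5 and 5-6 map to the same key
key <- paste(pmin(mat[, 1], mat[, 2]), pmax(mat[, 1], mat[, 2]), sep = "-")

# One count per distinct pair that occurs; storage grows with the number
# of distinct pairs rather than with max(mat)^2
pairCounts <- table(key)
pairCounts
```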
> >>
> >>
> >>
> >> On Oct 6, 2015, at 4:59 PM, Hermann Norpois <hnorpois at gmail.com> wrote:
> >>
> >> > Hello,
> >> >
> >> > I have a matrix mat (see dput(mat))
> >> >
> >> >> mat
> >> >      [,1] [,2]
> >> > [1,]    5    6
> >> > [2,]    6    5
> >> > [3,]    5    4
> >> > [4,]    5    5
> >> > ....
> >> >
> >> > I want the frequencies of the pairs in a new matrix, where the
> >> > combination 5 and 6 is the same as 6 and 5 (see the first two rows of
> >> > mat). In other words: what is the probability of each combination (each
> >> > row), ignoring the order within the combination? As a result I would
> >> > like a matrix whose rows and columns run over 0, 1, 2, ..., max(mat),
> >> > including values that do not appear in my matrix.
> >> >
> >> > dput (mat)
> >> > structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
> >> > 4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
> >> > 6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))
> >> >
> >> > Thanks
> >> > Hermann
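The request above asks for unordered pairs, probabilities rather than counts, and rows and columns from 0 to max(mat) including unseen values. One way to sketch this is to fix the factor levels to 0:max(mat) and divide the counts by the number of rows. This assumes "probability" means the share of rows showing each unordered pair. The result is upper-triangular, and the names lev, lo, hi and prob are illustrative.

```r
# Example data from this thread: 26 pairs in a 26 x 2 matrix
mat <- structure(c(5, 6, 5, 5, 4, 3, 6, 7, 4, 7, 5, 5, 5, 5, 6, 5, 5,
  4, 3, 6, 7, 4, 7, 5, 5, 5, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7,
  6, 6, 5, 4, 5, 5, 7, 5, 6, 3, 5, 6, 7, 6), .Dim = c(26L, 2L))

lev <- 0:max(mat)  # 0 .. max, including values that never occur
lo  <- factor(pmin(mat[, 1], mat[, 2]), levels = lev)  # smaller member
hi  <- factor(pmax(mat[, 1], mat[, 2]), levels = lev)  # larger member

# Counts of each unordered pair (upper triangle), divided by the number of rows
prob <- table(lo, hi) / nrow(mat)
round(prob, 3)
```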
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
