[R] Computing the rank of a matrix.

Martin Maechler maechler at stat.math.ethz.ch
Tue Jul 24 19:43:14 CEST 2007


>>>>> "RV" == RAVI VARADHAN <rvaradhan at jhmi.edu>
>>>>>     on Sat, 07 Apr 2007 18:39:36 -0400 writes:
	     ^^^^^^^^^^^^^^^^

this is a bit a late reply...  better late than never

    RV> Hi Martin,

Hi Ravi, thanks for your research.


    RV> I played a bit with rankMat.  I will first state the
    RV> conclusions of my numerical experiments and then present
    RV> the results to support them:

    RV> 1.  I don't believe that rankMat, or equivalently
    RV> Matlab's rank computation, is necessarily better than
    RV> qr(.)$rank or (qr., LAPACK=TRUE)$rank.  In fact, for the
    RV> notorious Hilbert matrix, rankMat can give poor
    RV> estimates of rank.

    RV> 2.  There exists no universally optimal (i.e. optimal
    RV> for all A) tol in qr(A, tol)$rank that would be
    RV> optimally close to rankMat.  The discrepancy in the
    RV> ranks computed by qr(A)$rank and rankMat(A) obviously
    RV> depends on the matrix A. This is evident from the tol
    RV> used in rankMat:
    RV>      tol <- max(d) * .Machine$double.eps * abs(singValA[1])
    RV> So, this value of tol in qr will also minimize the discrepancy. 

    RV> 3.  The tol parameter is irrelevant in qr(A, tol,
    RV> LAPACK=TRUE)$rank, i.e. LAPACK doesn't seem to utilize
    RV> the tol parameter when computing the rank.  However,
    RV> qr(A, tol, LAPACK=FALSE)$rank does depend on tol.

Yes, indeed!  The help page of qr() actually says so 
{at least now, don't know about 3 months ago}.

    RV> Now, here are the results:
    RV> 1.
    >> hilbert.rank <- matrix(NA,20,3)
    >> hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
    >> for (i in 1:20) {
    RV> + himat <- hilbert(i)
    RV> + hilbert.rank[i,1] <- rankMat(himat)
    RV> + hilbert.rank[i,2] <- qr(himat,tol=1.0e-14)$rank
    RV> + hilbert.rank[i,3] <- qr(himat, tol = .Machine$double.eps, LAPACK = TRUE)$rank
    RV> + }
    >> hilbert.rank
    RV> [,1] [,2] [,3]
    RV> [1,]    1    1    1
    RV> [2,]    2    2    2
    RV> [3,]    3    3    3
    RV> [4,]    4    4    4
    RV> [5,]    5    5    5
    RV> [6,]    6    6    6
    RV> [7,]    7    7    7
    RV> [8,]    8    8    8
    RV> [9,]    9    9    9
    RV> [10,]   10   10   10
    RV> [11,]   10   11   11
    RV> [12,]   11   12   12
    RV> [13,]   11   12   13
    RV> [14,]   11   13   14
    RV> [15,]   12   13   15
    RV> [16,]   12   15   16
    RV> [17,]   12   16   17
    RV> [18,]   12   16   18
    RV> [19,]   13   17   19
    RV> [20,]   13   17   20

    RV> We see that rankMat underestimates the rank for n > 10, but qr(himat, tol = .Machine$double.eps, LAPACK=TRUE)$rank gets it right.

Yes, indeed;  and that's  seems a bit  against the  ``common
knowledge'' that svd() is more reliable than qr()

Hmm.... I'm still baffled a bit..

Note that with the Hilbert matrices, 
one might argue that  hilbert(20) might really not have a
correct "estimated rank" of 20,
but at least for hilbert(13) or so, the rank should be == n.

BTW, there's a nice plot, slightly related to this problem,
using rcond() from the Matrix package :

  library(Matrix)

  hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
  rcHilb <- sapply(1:20, function(n) rcond(Matrix(hilbert(n))))

  plot(rcHilb, xlab = "n", log = "y", col = 2, type = "b",
       main = "reciprocal condition numbers of Hilbert(n)")

where one sees that the LAPACK code that underlies
Matrix::rcond() also gets into "problem" when estimating the
condition number for hilbert(n) when n >~ 16 .

I think if we wanted to make real progress here, we'd have to
consult with numerical analyist specialists.
But for me the topic is too remote to be followed up further at
the moment.

One conclusion for me is that to estimate the rank of a matrix
in current versions of R, one should use

  rankMat <- function(x)  qr(x, LAPACK = TRUE)$rank

(as was suggested as one possibility in the original thread)

Regards, and thank you again, Ravi.

Martin Maechler, ETH Zurich


    RV> 2.  
    RV> Here I first, created a random rectangular matrix, and then added a number of rows to it, where these new rows are the same as some of the original rows except for a tiny amount of noise, which I call eps.  So, the degree of rank deficiency is controlled by eps.  I let eps take on 3 values: 1.e-07, 1.e-10, and 1.e-14, and show that the optimal tol (in terms of being close to rankMat) depends on the level of precision (eps) in the matrix.
    >> set.seed(123)
    >> nrow <- 15
    >> ncol <- 20
    >> nsim <- 5000
    >> ndefic <- 4  # number of "nearly" dependent rows
    >> eps <- 1.e-07
    >> rnk <- matrix(NA, nsim, 5)
    >> for (j in 1:nsim) {
    RV> + A <- matrix(rnorm(nrow*ncol),nrow, ncol)
    RV> + newrows <- matrix(NA, ndefic, ncol)
    RV> + for (i in 1:ndefic) {
    RV> + newrows[i,] <- A[nrow-i,] + rnorm(ncol, sd=min(abs(A[nrow-i+1,]))* eps)
    RV> + }
    RV> + 
    RV> + B <- rbind(A,newrows)
    RV> + rnk[j,1] <- rankMat(B)
    RV> + rnk[j,2] <- qr(B, tol = 1.e-07)$rank
    RV> + rnk[j,3] <- qr(B, tol = 1.e-11)$rank
    RV> + rnk[j,4] <- qr(B, tol = 1.0e-14)$rank
    RV> + rnk[j,5] <- qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank
    RV> + }
    >> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
    RV> [1] 19351    53     0     0

    RV> We observe that both qr(B, tol=1.e-14)$rank and qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank give exactly the same rank as rankMat.

    RV> Now, we change eps <- 1.e-10 and the results are:
    >> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
    RV> [1] 19778 14263   166   220
    RV> This means that a tolerance of 1.e-14 works best.

    RV> Now we lower eps further: eps <- 1.e-14
    >> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
    RV> [1]     0     3   665 20000
    RV> Clearly, the smaller tolerances yield rank estimates that are higher than that given by rankMat.  That is, rankMat underestimates the rank as in the case of Hilbert matrix.

    RV> 3.
    RV> Now we show that qr(., tol, LAPACK=TRUE)$rank is independent of tol:
    >> exp <- -(7:16)
    >> tol <- 10^exp
    >> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x, LAPACK=FALSE)$rank)
    RV> [1] 10 12 14 14 15 16 16 17 18 19
    >> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x, LAPACK=TRUE)$rank)
    RV> [1] 20 20 20 20 20 20 20 20 20 20

    RV> Looking forward to comments.

    RV> Best,
    RV> Ravi. 

    RV> ----- Original Message -----
    RV> From: Martin Maechler <maechler at stat.math.ethz.ch>
    RV> Date: Saturday, April 7, 2007 10:57 am
    RV> Subject: Re: [R] Computing the rank of a matrix.
    RV> To: Ravi Varadhan <rvaradhan at jhmi.edu>
    RV> Cc: apjaworski at mmm.com, "'José Luis Aznarte M.'" <jlaznarte at decsai.ugr.es>, r-help at stat.math.ethz.ch


    >> >>>>> "Ravi" == Ravi Varadhan <rvaradhan at jhmi.edu>
    >> >>>>>     on Fri, 6 Apr 2007 12:44:33 -0400 writes:
    >> 
    Ravi> Hi, qr(A)$rank will work, but just be wary of the
    Ravi> tolerance parameter (default is 1.e-07), since the
    Ravi> rank computation could be sensitive to the tolerance
    Ravi> chosen.
    >> 
    >> Yes, indeed.
    >> 
    >> The point is that   rank(<Matrix>) 
    >> is well defined in pure math (linear algebra), as well as 
    >> a "singular matrix" is.
    >> 
    >> The same typically no longer makes sense as soon as you enter
    >> the real world: A matrix "close to singular" may have to be
    >> treated "as if singular" depending on its "singularity
    >> closeness" {{ learn about the condition number of a matrix }}
    >> and the same issues arise with rank(<matrix>).
    >> 
    >> Of course, the matlab programmers know all this (and much more),
    >> and indeed, matlab's  rank(A) really is  
    >> rank(A, tol = tol.default(A))
    >> 
    >> and is based on the SVD instead of QR decomposition since the
    >> former is said to be more reliable (even though slightly slower).
    >> 
    >> R's equivalent (with quite a bit of fool-proofing) would be the
    >> following function (assuming correct online documentation of matlab):
    >> 
    >> 
    >> rankMat <- function(A, tol = NULL, singValA = svd(A, 0,0)$d)
    >> {
    >> ## Purpose: rank of a matrix ``as Matlab''
    >> ## ----------------------------------------------------------------------
    >> ## Arguments: A: a numerical matrix, maybe non-square
    >> ##        	tol: numerical tolerance (compared to singular values)
    >> ##           singValA: vector of non-increasing singular values 
    >> of A
    >> ##                     (pass as argument if already known)
    >> ## ----------------------------------------------------------------------
    >> ## Author: Martin Maechler, Date:  7 Apr 2007, 16:16
    >> d <- dim(A)
    >> stopifnot(length(d) == 2, length(singValA) == min(d),
    >> diff(singValA) < 0)       # must be sorted decreasingly
    >> if(is.null(tol))
    >> tol <- max(d) * .Machine$double.eps * abs(singValA[1])
    >> else stopifnot(is.numeric(tol), tol >= 0)
    >> sum(singValA >= tol)
    >> }
    >> 
    >> 
    >> A small scale simulation with random matrices,
    >> i.e., things like
    >> 
    >> ## ranks of random matrices; here will have 5 all the time:
    >> table(replicate(1000, rankMat(matrix(rnorm(5*12),5, 12) )))# < 1 sec.
    >> 
    >> indicates that qr(.)$rank  gives the same typically,
    >> where I assume one should really use
    >> 
    >> qr(., tol = .Machine$double.eps, LAPACK = TRUE)$rank
    >> 
    >> to be closer to Matlab's default tolerance.
    >> 
    >> Ok, who has time to investigate further? 
    >> Research exercise:
    >> 
    1> Is there a fixed number, say  t0 <- 1e-15 
    1> for which  qr(A, tol = t0, LAPACK=TRUE)$rank is 
    1> ``optimally close'' to rankMat(A) ?
    >> 
    2> how easily do you get cases showing svd(.) to more reliable
    2> than qr(., LAPACK=TRUE)?
    >> 
    >> To solve this in an interesting way, you should probably
    >> investigate classes of "almost rank-deficient" matrices,
    >> and I'd also be interested if you "randomly" ever get matrices A
    >> with  rank(A) <  min(dim(A)) - 1
    >> (unless you construct some columns/rows exactly from earlier
    >> ones or use all-0 ones)
    >> 
    >> Martin Maechler, ETH Zurich
    >> 
    >> 
    >> 
    >> 
    Ravi> Ravi.
    >> 
    Ravi> -----------------------------------------------------------------
    Ravi> Ravi Varadhan, Ph.D.
    Ravi> Assistant Professor, The Center on Aging and Health
    >> .......
    Ravi> 
    >> 
    >> 
    Ravi> --------------------------------------------------------------------
    >> 
    >> 
    >> 
    >> >> How about
    >> 
    >> >>       qr(A)$rank
    >> 
    >> >> or perhaps
    >> 
    >> >>       qr(A, LAPACK=TRUE)$rank
    >> 
    >> >> Cheers,
    >> 
    >> >> Andy
    >> 
    >> 
    >> >>     Hi! Maybe this is a silly question, but I need the
    >> >> column rank (
    >> >> of a matrix and R function 'rank()' only gives me the
    >> >> ordering of the elements of my matrix.  How can I
    >> >> compute the column rank of a matrix? Is there not an R
    >> >> equivalent to Matlab's 'rank()'?  I've been browsing
    >> >> for a time now and I can't find anything, so any help
    >> >> will be greatly appreciated. Best regards!
    >> 
    >> 
    >> >> -- -- Jose Luis Aznarte M.
    >> >>  Department of Computer
    >> >> Science and Artificial Intelligence Universidad de
    >> >> Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax:
    >> >> +34 - 958 - 24 00 79



More information about the R-help mailing list