# [R] Computing the rank of a matrix.

Tue Jul 24 23:35:25 CEST 2007

```Hi Martin,

I just realized (courtesy: ?qr) that LAPACK=TRUE always gives full rank, no
matter what the matrix and tolerance are. So, clearly my results using
LAPACK=TRUE should be ignored.  So, the real comparison is only between
rankMat and qr(., LAPACK=FALSE)\$rank.  I can’t help but fell that there can
be no "correct" solution to an ill-posed problem.  Furthermore, I haven't
come across a real application where the numerical estimate of a rank is
directly useful.

Best,
Ravi.

----------------------------------------------------------------------------
-------

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

----------------------------------------------------------------------------
--------

-----Original Message-----
From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
Sent: Tuesday, July 24, 2007 1:43 PM
Cc: Martin Maechler; r-help at stat.math.ethz.ch
Subject: Re: [R] Computing the rank of a matrix.

>>>>>     on Sat, 07 Apr 2007 18:39:36 -0400 writes:
^^^^^^^^^^^^^^^^

this is a bit a late reply...  better late than never

RV> Hi Martin,

Hi Ravi, thanks for your research.

RV> I played a bit with rankMat.  I will first state the
RV> conclusions of my numerical experiments and then present
RV> the results to support them:

RV> 1.  I don't believe that rankMat, or equivalently
RV> Matlab's rank computation, is necessarily better than
RV> qr(.)\$rank or (qr., LAPACK=TRUE)\$rank.  In fact, for the
RV> notorious Hilbert matrix, rankMat can give poor
RV> estimates of rank.

RV> 2.  There exists no universally optimal (i.e. optimal
RV> for all A) tol in qr(A, tol)\$rank that would be
RV> optimally close to rankMat.  The discrepancy in the
RV> ranks computed by qr(A)\$rank and rankMat(A) obviously
RV> depends on the matrix A. This is evident from the tol
RV> used in rankMat:
RV>      tol <- max(d) * .Machine\$double.eps * abs(singValA[1])
RV> So, this value of tol in qr will also minimize the discrepancy.

RV> 3.  The tol parameter is irrelevant in qr(A, tol,
RV> LAPACK=TRUE)\$rank, i.e. LAPACK doesn't seem to utilize
RV> the tol parameter when computing the rank.  However,
RV> qr(A, tol, LAPACK=FALSE)\$rank does depend on tol.

Yes, indeed!  The help page of qr() actually says so
{at least now, don't know about 3 months ago}.

RV> Now, here are the results:
RV> 1.
>> hilbert.rank <- matrix(NA,20,3)
>> hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
>> for (i in 1:20) {
RV> + himat <- hilbert(i)
RV> + hilbert.rank[i,1] <- rankMat(himat)
RV> + hilbert.rank[i,2] <- qr(himat,tol=1.0e-14)\$rank
RV> + hilbert.rank[i,3] <- qr(himat, tol = .Machine\$double.eps, LAPACK =
TRUE)\$rank
RV> + }
>> hilbert.rank
RV> [,1] [,2] [,3]
RV> [1,]    1    1    1
RV> [2,]    2    2    2
RV> [3,]    3    3    3
RV> [4,]    4    4    4
RV> [5,]    5    5    5
RV> [6,]    6    6    6
RV> [7,]    7    7    7
RV> [8,]    8    8    8
RV> [9,]    9    9    9
RV> [10,]   10   10   10
RV> [11,]   10   11   11
RV> [12,]   11   12   12
RV> [13,]   11   12   13
RV> [14,]   11   13   14
RV> [15,]   12   13   15
RV> [16,]   12   15   16
RV> [17,]   12   16   17
RV> [18,]   12   16   18
RV> [19,]   13   17   19
RV> [20,]   13   17   20

RV> We see that rankMat underestimates the rank for n > 10, but
qr(himat, tol = .Machine\$double.eps, LAPACK=TRUE)\$rank gets it right.

Yes, indeed;  and that's  seems a bit  against the  ``common
knowledge'' that svd() is more reliable than qr()

Hmm.... I'm still baffled a bit..

Note that with the Hilbert matrices,
one might argue that  hilbert(20) might really not have a
correct "estimated rank" of 20,
but at least for hilbert(13) or so, the rank should be == n.

BTW, there's a nice plot, slightly related to this problem,
using rcond() from the Matrix package :

library(Matrix)

hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
rcHilb <- sapply(1:20, function(n) rcond(Matrix(hilbert(n))))

plot(rcHilb, xlab = "n", log = "y", col = 2, type = "b",
main = "reciprocal condition numbers of Hilbert(n)")

where one sees that the LAPACK code that underlies
Matrix::rcond() also gets into "problem" when estimating the
condition number for hilbert(n) when n >~ 16 .

I think if we wanted to make real progress here, we'd have to
consult with numerical analyist specialists.
But for me the topic is too remote to be followed up further at
the moment.

One conclusion for me is that to estimate the rank of a matrix
in current versions of R, one should use

rankMat <- function(x)  qr(x, LAPACK = TRUE)\$rank

(as was suggested as one possibility in the original thread)

Regards, and thank you again, Ravi.

Martin Maechler, ETH Zurich

RV> 2.
RV> Here I first, created a random rectangular matrix, and then added a
number of rows to it, where these new rows are the same as some of the
original rows except for a tiny amount of noise, which I call eps.  So, the
degree of rank deficiency is controlled by eps.  I let eps take on 3 values:
1.e-07, 1.e-10, and 1.e-14, and show that the optimal tol (in terms of being
close to rankMat) depends on the level of precision (eps) in the matrix.
>> set.seed(123)
>> nrow <- 15
>> ncol <- 20
>> nsim <- 5000
>> ndefic <- 4  # number of "nearly" dependent rows
>> eps <- 1.e-07
>> rnk <- matrix(NA, nsim, 5)
>> for (j in 1:nsim) {
RV> + A <- matrix(rnorm(nrow*ncol),nrow, ncol)
RV> + newrows <- matrix(NA, ndefic, ncol)
RV> + for (i in 1:ndefic) {
RV> + newrows[i,] <- A[nrow-i,] + rnorm(ncol, sd=min(abs(A[nrow-i+1,]))*
eps)
RV> + }
RV> +
RV> + B <- rbind(A,newrows)
RV> + rnk[j,1] <- rankMat(B)
RV> + rnk[j,2] <- qr(B, tol = 1.e-07)\$rank
RV> + rnk[j,3] <- qr(B, tol = 1.e-11)\$rank
RV> + rnk[j,4] <- qr(B, tol = 1.0e-14)\$rank
RV> + rnk[j,5] <- qr(B, tol = .Machine\$double.eps, LAPACK = TRUE)\$rank
RV> + }
>> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
RV> [1] 19351    53     0     0

RV> We observe that both qr(B, tol=1.e-14)\$rank and qr(B, tol =
.Machine\$double.eps, LAPACK = TRUE)\$rank give exactly the same rank as
rankMat.

RV> Now, we change eps <- 1.e-10 and the results are:
>> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
RV> [1] 19778 14263   166   220
RV> This means that a tolerance of 1.e-14 works best.

RV> Now we lower eps further: eps <- 1.e-14
>> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
RV> [1]     0     3   665 20000
RV> Clearly, the smaller tolerances yield rank estimates that are higher
than that given by rankMat.  That is, rankMat underestimates the rank as in
the case of Hilbert matrix.

RV> 3.
RV> Now we show that qr(., tol, LAPACK=TRUE)\$rank is independent of tol:
>> exp <- -(7:16)
>> tol <- 10^exp
>> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x,
LAPACK=FALSE)\$rank)
RV> [1] 10 12 14 14 15 16 16 17 18 19
>> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x,
LAPACK=TRUE)\$rank)
RV> [1] 20 20 20 20 20 20 20 20 20 20

RV> Best,
RV> Ravi.

RV> ----- Original Message -----
RV> From: Martin Maechler <maechler at stat.math.ethz.ch>
RV> Date: Saturday, April 7, 2007 10:57 am
RV> Subject: Re: [R] Computing the rank of a matrix.
RV> Cc: apjaworski at mmm.com, "'José Luis Aznarte M.'"
<jlaznarte at decsai.ugr.es>, r-help at stat.math.ethz.ch

>> >>>>>     on Fri, 6 Apr 2007 12:44:33 -0400 writes:
>>
Ravi> Hi, qr(A)\$rank will work, but just be wary of the
Ravi> tolerance parameter (default is 1.e-07), since the
Ravi> rank computation could be sensitive to the tolerance
Ravi> chosen.
>>
>> Yes, indeed.
>>
>> The point is that   rank(<Matrix>)
>> is well defined in pure math (linear algebra), as well as
>> a "singular matrix" is.
>>
>> The same typically no longer makes sense as soon as you enter
>> the real world: A matrix "close to singular" may have to be
>> treated "as if singular" depending on its "singularity
>> closeness" {{ learn about the condition number of a matrix }}
>> and the same issues arise with rank(<matrix>).
>>
>> Of course, the matlab programmers know all this (and much more),
>> and indeed, matlab's  rank(A) really is
>> rank(A, tol = tol.default(A))
>>
>> and is based on the SVD instead of QR decomposition since the
>> former is said to be more reliable (even though slightly slower).
>>
>> R's equivalent (with quite a bit of fool-proofing) would be the
>> following function (assuming correct online documentation of matlab):
>>
>>
>> rankMat <- function(A, tol = NULL, singValA = svd(A, 0,0)\$d)
>> {
>> ## Purpose: rank of a matrix ``as Matlab''
>> ##
----------------------------------------------------------------------
>> ## Arguments: A: a numerical matrix, maybe non-square
>> ##        	tol: numerical tolerance (compared to singular
values)
>> ##           singValA: vector of non-increasing singular values
>> of A
>> ##                     (pass as argument if already known)
>> ##
----------------------------------------------------------------------
>> ## Author: Martin Maechler, Date:  7 Apr 2007, 16:16
>> d <- dim(A)
>> stopifnot(length(d) == 2, length(singValA) == min(d),
>> diff(singValA) < 0)       # must be sorted decreasingly
>> if(is.null(tol))
>> tol <- max(d) * .Machine\$double.eps * abs(singValA[1])
>> else stopifnot(is.numeric(tol), tol >= 0)
>> sum(singValA >= tol)
>> }
>>
>>
>> A small scale simulation with random matrices,
>> i.e., things like
>>
>> ## ranks of random matrices; here will have 5 all the time:
>> table(replicate(1000, rankMat(matrix(rnorm(5*12),5, 12) )))# < 1 sec.
>>
>> indicates that qr(.)\$rank  gives the same typically,
>> where I assume one should really use
>>
>> qr(., tol = .Machine\$double.eps, LAPACK = TRUE)\$rank
>>
>> to be closer to Matlab's default tolerance.
>>
>> Ok, who has time to investigate further?
>> Research exercise:
>>
1> Is there a fixed number, say  t0 <- 1e-15
1> for which  qr(A, tol = t0, LAPACK=TRUE)\$rank is
1> ``optimally close'' to rankMat(A) ?
>>
2> how easily do you get cases showing svd(.) to more reliable
2> than qr(., LAPACK=TRUE)?
>>
>> To solve this in an interesting way, you should probably
>> investigate classes of "almost rank-deficient" matrices,
>> and I'd also be interested if you "randomly" ever get matrices A
>> with  rank(A) <  min(dim(A)) - 1
>> (unless you construct some columns/rows exactly from earlier
>> ones or use all-0 ones)
>>
>> Martin Maechler, ETH Zurich
>>
>>
>>
>>
Ravi> Ravi.
>>
Ravi> -----------------------------------------------------------------
Ravi> Assistant Professor, The Center on Aging and Health
>> .......
Ravi>
>>
>>
Ravi>
--------------------------------------------------------------------
>>
>>
>>
>>
>> >>       qr(A)\$rank
>>
>> >> or perhaps
>>
>> >>       qr(A, LAPACK=TRUE)\$rank
>>
>> >> Cheers,
>>
>> >> Andy
>>
>>
>> >>     Hi! Maybe this is a silly question, but I need the
>> >> column rank (
>> >> of a matrix and R function 'rank()' only gives me the
>> >> ordering of the elements of my matrix.  How can I
>> >> compute the column rank of a matrix? Is there not an R
>> >> equivalent to Matlab's 'rank()'?  I've been browsing
>> >> for a time now and I can't find anything, so any help
>> >> will be greatly appreciated. Best regards!
>>
>>
>> >> -- -- Jose Luis Aznarte M.
>> >>  Department of Computer
>> >> Science and Artificial Intelligence Universidad de
>> >> Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax:
>> >> +34 - 958 - 24 00 79

```