[Rd] Inconsistent rank in qr()

Martin Maechler maechler at stat.math.ethz.ch
Tue Jan 23 08:47:01 CET 2018


>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>>     on Mon, 22 Jan 2018 17:57:47 +0100 writes:

    > On 22/01/2018 at 17:40, Keith O'Hara wrote:
    >> This behavior is noted in the qr documentation, no?
    >> 
    >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
    > For me, a "full rank matrix" is a matrix whose rank is indeed min(nrow(A), ncol(A)),
    > but here the meaning of "always full rank" is somewhat confusing. Does it mean
    > that only full rank matrices may be submitted to qr() when LAPACK=TRUE?
    > Maybe there is a jargon in which "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix,
    > but the fix needed to stick with the commonly accepted definition of rank (i.e. the number of linearly independent
    > columns in A) is so easy. Why exclude the LAPACK case from it (even if properly documented)?

Because 99.5% of callers of qr() never look at '$rank',
so why should we compute it every time qr() is called?

==> Matrix::rankMatrix() does use "qr" as one of its several methods.

--------------

As wiser people than me have said (I'm paraphrasing, as I can't find a nice citation):

  While the rank of a matrix is a very well defined concept in
  mathematics (theory), its practical computation on a finite
  precision computer is much more challenging.
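
A minimal base R sketch may make this concrete. The helper name numeric_rank and the two tolerances are assumptions chosen for illustration only; they do not come from this thread. The same matrix is reported as rank 1 or rank 2 depending on the threshold applied to its singular values:

```r
# Hypothetical helper: count the singular values above a relative tolerance.
numeric_rank <- function(A, tol) {
  sv <- svd(A, nu = 0, nv = 0)$d   # singular values, in decreasing order
  if (length(sv) == 0L || sv[1] == 0) return(0L)   # empty or all-zero matrix
  sum(sv > tol * sv[1])
}

# Mathematically of rank 2, but the two columns are almost parallel.
A <- cbind(c(1, 1), c(1, 1 + 1e-10))

numeric_rank(A, tol = 1e-7)    # 1: the tiny singular value is treated as zero
numeric_rank(A, tol = 1e-15)   # 2: the tiny singular value is kept
```

Neither answer is wrong as such; the choice of tolerance is part of the definition of the computed rank.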

The ?rankMatrix  help page (package Matrix, part of your R)
   https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
starts with the following 'Description' 

__ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambiguous in practice. We provide several methods, the default corresponding to Matlab's definition.

__ (*) The rank of an n x m matrix A, rk(A), is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).
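
As a small usage sketch (assuming the recommended package Matrix is installed, hence the requireNamespace() guard), rankMatrix() can be applied to the 2 x 2 matrix from the report quoted further down, once with its default method and once with method = "qr":

```r
# Sketch only: runs when the recommended package 'Matrix' is available.
if (requireNamespace("Matrix", quietly = TRUE)) {
  a <- diag(2)
  a[2, 2] <- 0                                   # matrix from the quoted report
  r_default <- Matrix::rankMatrix(a)                  # default method
  r_qr      <- Matrix::rankMatrix(a, method = "qr")   # QR-based method
  # The result carries attributes; as.integer() keeps the rank only.
  print(as.integer(r_default))
  print(as.integer(r_qr))
}
```

Both calls should report the rank as 1 for this matrix, i.e. the value that qr(a)$rank gives with LAPACK=FALSE.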


    >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <sokol at insa-toulouse.fr> wrote:
    >>> 
    >>> Hi,
    >>> 
    >>> I have noticed that qr() computes different rank values depending on the
    >>> LAPACK parameter. When it is FALSE (the default), the true rank is estimated and returned.
    >>> Unfortunately, when LAPACK is set to TRUE, min(nrow(A), ncol(A)) is returned,
    >>> which is only occasionally the true rank.
    >>> 
    >>> Would it not be more consistent to replace the rank in the latter case by something
    >>> based on the following pseudocode?
    >>> 
    >>> q=qr(A, LAPACK=TRUE)
    >>> d=abs(diag(q$qr))
    >>> rank=sum(d >= d[1]*tol)
    >>> 
    >>> Here, we rely on the fact that column pivoting is activated in the called LAPACK routine (dgeqp3)
    >>> and that the diagonal terms of the qr matrix are put in decreasing order (according to their absolute values).
    >>> 
    >>> Serguei.
    >>> 
    >>> How to reproduce:
    >>> 
    >>> a=diag(2)
    >>> a[2,2]=0
    >>> qaf=qr(a, LAPACK=FALSE)
    >>> qaf$rank # shows 1. OK it's the true rank value
    >>> qat=qr(a, LAPACK=TRUE)
    >>> qat$rank # shows 2. Bad, it's not the expected value.
    >>> 
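
The proposal and the reproduction steps quoted above can be combined into one runnable sketch. The helper name rank_from_lapack_qr and the default tol = 1e-7 (chosen to match the default tol of qr()) are assumptions made for illustration; this is not how base R computes $rank:

```r
# Hypothetical helper: estimate the rank from the diagonal of R
# in the pivoted LAPACK QR decomposition.
rank_from_lapack_qr <- function(A, tol = 1e-7) {
  q <- qr(A, LAPACK = TRUE)   # q$rank is always min(dim(A)) in this case
  d <- abs(diag(q$qr))        # |diagonal of R|, decreasing thanks to pivoting
  if (length(d) == 0L || d[1] == 0) return(0L)   # empty or all-zero matrix
  sum(d >= d[1] * tol)
}

a <- diag(2)
a[2, 2] <- 0
qr(a, LAPACK = FALSE)$rank   # 1
qr(a, LAPACK = TRUE)$rank    # 2
rank_from_lapack_qr(a)       # 1
```

The explicit check for d[1] == 0 is a design choice of this sketch: without it, an all-zero matrix would be reported as full rank, because every diagonal entry satisfies d >= 0.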

    > -- 
    > Serguei Sokol
    > Research Engineer, INRA

    > Cellule mathématique
    > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
    > 135 Avenue de Rangueil
    > 31077 Toulouse Cedex 04

    > tel: +33 5 6155 9849
    > email: sokol at insa-toulouse.fr
    > http://www.lisbp.fr


