[Rd] Inconsistent rank in qr()
Serguei Sokol
sokol at insa-toulouse.fr
Tue Jan 23 11:36:41 CET 2018
Le 23/01/2018 à 08:47, Martin Maechler a écrit :
>>>>>> Serguei Sokol <sokol at insa-toulouse.fr>
>>>>>> on Mon, 22 Jan 2018 17:57:47 +0100 writes:
> > Le 22/01/2018 à 17:40, Keith O'Hara a écrit :
> >> This behavior is noted in the qr documentation, no?
> >>
> >> rank - the rank of x as computed by the decomposition(*): always full rank in the LAPACK case.
> > For a me a "full rank matrix" is a matrix the rank of which is indeed min(nrow(A), ncol(A))
> > but here the meaning of "always is full rank" is somewhat confusing. Does it mean
> > that only full rank matrices must be submitted to qr() when LAPACK=TRUE?
> > May be there is a jargon where "full rank" is a synonym of min(nrow(A), ncol(A)) for any matrix
> > but the fix to stick with commonly admitted rank definition (i.e. the number of linearly independent
> > columns in A) is so easy. Why to discard lapack case from it (even properly documented)?
>
> Because 99.5% of caller to qr() never look at '$rank',
> so why should we compute it every time qr() is called?
1. Because R already does it for linpack so it would be consistent to do so for lapack too.
2. Because R pretends that it is a part of a returned qr class.
3. Because its calculation is a negligible fraction of QR itself.
>
> ==> Matrix :: rankMatrix() does use "qr" as one of its several methods.
>
> --------------
>
> As wiser people than me have said (I'm paraphrasing, don't find a nice citation):
>
> While the rank of a matrix is a very well defined concept in
> mathematics (theory), its practical computation on a finite
> precision computer is much more challenging.
True. It is indeed depending of round-off errors during QR calculations and tolerance
setting but putting it just as min(nrow(A), ncol(A)) and still calling it rank of "full rank"
is by far the most misleading choice to my mind.
Once again, if we are already calculating it for linpack let do it in most consistent
way for lapack too. I can propose a patch if you will.
Serguei.
>
> The ?rankMatrix help page (package Matrix, part of your R)
> https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rankMatrix.html
> starts with the following 'Description'
>
> __ Compute ‘the’ matrix rank, a well-defined functional in theory(*), somewhat ambigous in practice. We provide several methods, the default corresponding to Matlab's definition.
>
> __ (*) The rank of a n x m matrix A, rk(A) is the maximal number of linearly independent columns (or rows); hence rk(A) <= min(n,m).
>
>
> >>> On Jan 22, 2018, at 11:21 AM, Serguei Sokol <sokol at insa-toulouse.fr> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have noticed different rank values calculated by qr() depending on
> >>> LAPACK parameter. When it is FALSE (default) a true rank is estimated and returned.
> >>> Unfortunately, when LAPACK is set to TRUE, the min(nrow(A), ncol(A)) is returned
> >>> which is only occasionally a true rank.
> >>>
> >>> Would not it be more consistent to replace the rank in the latter case by something
> >>> based on the following pseudo code ?
> >>>
> >>> d=abs(diag(qr))
> >>> rank=sum(d >= d[1]*tol)
> >>>
> >>> Here, we rely on the fact column pivoting is activated in the called lapack routine (dgeqp3)
> >>> and diagonal term in qr matrix are put in decreasing order (according to their absolute values).
> >>>
> >>> Serguei.
> >>>
> >>> How to reproduce:
> >>>
> >>> a=diag(2)
> >>> a[2,2]=0
> >>> qaf=qr(a, LAPACK=FALSE)
> >>> qaf$rank # shows 1. OK it's the true rank value
> >>> qat=qr(a, LAPACK=TRUE)
> >>> qat$rank #shows 2. Bad, it's not the expected value.
> >>>
>
> > --
> > Serguei Sokol
> > Ingenieur de recherche INRA
>
> > Cellule mathématique
> > LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
> > 135 Avenue de Rangueil
> > 31077 Toulouse Cedex 04
>
> > tel: +33 5 6155 9849
> > email: sokol at insa-toulouse.fr
> > http://www.lisbp.fr
>
More information about the R-devel
mailing list