[Rd] surprised matrix (1:256, 8, 8) doesn't cause error/warning

Tue Feb 9 15:52:06 CET 2021

Hi Martin

Thank you! I very much understand your reservations and know it was a bit cheeky to poke.

I agree that in those cases where my (naive) patch results in two warnings, keeping only the new one would better.
No strong opinion about the case where either ncol or nrow is 0. Maybe a compromise would be to live with the inconsistency of only warning if length(data)>1, in order to allow legacy code such as  matrix(NA, 0, 3). Assuming that the main objective is not consistency but more robust users’ R code.

Btw if one were to refactor this properly (which I do not propose!), shouldn’t `matrix` just be a wrapper to `array`, whose added value is the inference of missing `nrow` and `ncol` values and dealing with `byrow`?

	Kind regards, and thanks again	
		Wolfgang

> Il giorno 9feb2021, alle ore 11:58, Martin Maechler <maechler using stat.math.ethz.ch> ha scritto:
> 
>>>>>> Wolfgang Huber 
>>>>>>    on Sat, 6 Feb 2021 19:49:11 +0100 writes:
> 
>> FWIW, I paste below a possible change to the warnings generating part of the do_matrix function in R/src/main/array.c that adds the kind of warning that Abby is asking for, and that IMHO would more often help users find bugs in their code than interfere with intended behaviour.
> 
> Thank you, Wolfgang.
> Honestly,  I had originally not wanted to get into this.
> 
> Functions that have been in use for longer than R exists (namely
> in S / S+) without the need for changes  is not something we'd
> typically easily consider for a change.
> {well, in the very old times,  0-extent matrices did not exist,
> so there were *some* changes during matrix()'s history ..}
> 
> But I think Abby and you have a point here.... so I have been
> looking at your patch .. and from code reading had wondered
> about another  behavior which *was* not quite consistent,
> and you eliminated completely with your patch:
> 
> op <- options(warn=1)
> for(n in 0:2) { cat(n,":\n") ; print(matrix(seq_len(n), 0, 0)) }
> options(op)
> 
> shows (in released R)
> 
> 0 :
> <0 x 0 matrix>
> 1 :
> <0 x 0 matrix>
> 2 :
> Warning in matrix(seq_len(n), 0, 0) :
>  data length exceeds size of matrix
> <0 x 0 matrix>
>> 
> 
> and it is really seems not logical that in matrix(x, 0,0),
> 'x' of  size 1 and  size (>=) 2  are treated differently.
> For consistency,  size 1  "should"  also warn (but read on!)
> 
> After your patch, theres' no warning in any case .. which is
> consistent, within the  matrix(x, 0,0) situation  but not consistent
> with your proposal to warn more often when  matrix(x, n,k)  is
> "obviously" using inconsistent dimensions.
> 
> When trying to let   matrix(1, 0,0)  also warn,
> I quickly found that this produces many new warnings in our (R
> base packages) examples and tests, notably from construction
> such as
> 
> 	 matrix(NA, 0, 3)
> or
> 	 matrix(NA_character_, 0, 4)
> 
> which would all have to be re-written as
> 
>      matrix(logical()  , 0, 3)
> or    matrix(character(), 0, 4)
> 
> The latter *is* more strictly self-consistent, but do we really
> want to impose such strictness ?
> 
> My conclusion:  Not at this moment
> 
> OTOH, I'd re-add the warning for  length(x) > 1  which has
> been there in the current code (but not your patch).
> {{No need to send another patch, I've changed too many small
>  things already, for me to be useful}}
> 
> This still needs adaption of one of the regression tests of R
> itself, and needs (at least) one of the tests of the Matrix package
> (warning turned into error, from options(warn = 2)).
> 
> I'm willing to go that route, but I'm sure this will entail some
> work by other package authors, too (and hence CRAN maintainers etc).
> 
> Opinions?
> 
> 
>>> matrix (1:6, nrow = 2, ncol = 3)
> 
>>> matrix (1:12, nrow = 2, ncol = 3)
>> [,1] [,2] [,3]
>> [1,]    1    3    5
>> [2,]    2    4    6
>> Warning message:
>> In matrix(1:12, nrow = 2, ncol = 3) :
>> data length incompatible with size of matrix
> 
>>> matrix (1:7, nrow = 2, ncol = 3)
>> Warning messages:
>> 1: In matrix(1:7, nrow = 2, ncol = 3) :
>> data length [7] is not a sub-multiple or multiple of the number of rows [2]
>> 2: In matrix(1:7, nrow = 2, ncol = 3) :
>> data length incompatible with size of matrix
> 
>>> matrix (1:8, nrow = 2, ncol = 3)
>> Warning messages:
>> 1: In matrix(1:8, nrow = 2, ncol = 3) :
>> data length [8] is not a sub-multiple or multiple of the number of columns [3]
>> 2: In matrix(1:8, nrow = 2, ncol = 3) :
>> data length incompatible with size of matrix
> 
>>> matrix (1:6, nrow = 0, ncol = 0)
>> <0 x 0 matrix>
>>> matrix (numeric(0), nrow = 2, ncol = 3)
>> [,1] [,2] [,3]
>> [1,]   NA   NA   NA
>> [2,]   NA   NA   NA
> 
>>> matrix(1:2, ncol = 8)
>> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
>> [1,]    1    2    1    2    1    2    1    2
> 
>> It would be nice to combine the new warning with that about “...not a sub-multiple or multiple…” into a single warning, if appropriate (as in two of the examples above), but that would require bigger surgery way above my payscale.
> 
> I agree that in those cases it should only show one warning, and
> to keep things simple I'd say it should just be the 2nd of those
> above  or more precisely (I have to check if that's correct)
> 
>  "data length larger than size of matrix"
> 
> Martin
> 
>> Kind regards
>> Wolfgang Huber
> 
> 
>> Index: array.c
>> ===================================================================
>> --- array.c	(revision 79951)
>> +++ array.c	(working copy)
>> @@ -133,18 +133,19 @@
>> nc = (int) ceil((double) lendat / (double) nr);
>> }
> 
>> -    if(lendat > 0) {
>> +    if (lendat > 1) {
>> R_xlen_t nrc = (R_xlen_t) nr * nc;
>> -	if (lendat > 1 && nrc % lendat != 0) {
>> +	if ((nrc % lendat) != 0) {
>> if (((lendat > nr) && (lendat / nr) * nr != lendat) ||
>> ((lendat < nr) && (nr / lendat) * lendat != nr))
>> warning(_("data length [%d] is not a sub-multiple or multiple of the number of rows [%d]"), lendat, nr);
>> else if (((lendat > nc) && (lendat / nc) * nc != lendat) ||
>> ((lendat < nc) && (nc / lendat) * lendat != nc))
>> -		warning(_("data length [%d] is not a sub-multiple or multiple of the number of columns [%d]"), lendat, nc);
>> -	}
>> -	else if ((lendat > 1) && (nrc == 0)){
>> +		    warning(_("data length [%d] is not a sub-multiple or multiple of the number of columns [%d]"), lendat, nc);
>> +	if (nrc == 0)
>> warning(_("data length exceeds size of matrix"));
>> +        if (nrc != lendat)
>> +            warning(_("data length incompatible with size of matrix"));
>> }
>> }
> 
> 
>> ------
>> // And here, for easy checking that part of the code in the new form:
>> if (lendat > 1) {
>> R_xlen_t nrc = (R_xlen_t) nr * nc;
>> if ((nrc % lendat) != 0) {
>> if (((lendat > nr) && (lendat / nr) * nr != lendat) ||
>> ((lendat < nr) && (nr / lendat) * lendat != nr))
>> warning(_("data length [%d] is not a sub-multiple or multiple of the number of rows [%d]"), lendat, nr);
>> else if (((lendat > nc) && (lendat / nc) * nc != lendat) ||
>> ((lendat < nc) && (nc / lendat) * lendat != nc))
>> warning(_("data length [%d] is not a sub-multiple or multiple of the number of columns [%d]"), lendat, nc);
>> if (nrc == 0)
>> warning(_("data length exceeds size of matrix"));
>> if (nrc != lendat)  
>> warning(_("data length incompatible with size of matrix"));
>> }
>> }
> 
>>> Il giorno 2feb2021, alle ore 00:27, Abby Spurdle (/əˈbi/) <spurdle.a using gmail.com> ha scritto:
>>> 
>>> So, does that mean that a clean result is contingent on the length of
>>> the data being a multiple of both the number of rows and columns?
>>> 
>>> However, this rule is not straightforward.
>>> 
>>>> #EXAMPLE 1
>>>> #what I would expect
>>>> matrix (1:12, 0, 0)
>>> <0 x 0 matrix>
>>> Warning message:
>>> In matrix(1:12, 0, 0) : data length exceeds size of matrix
>>> 
>>>> #EXAMPLE 2
>>>> #don't like this
>>>> matrix (numeric (), 2, 3)
>>> [,1] [,2] [,3]
>>> [1,]   NA   NA   NA
>>> [2,]   NA   NA   NA
>>> 
>>> The first example is what I would expect, but is inconsistent with the
>>> previous examples.
>>> (Because zero is a valid multiple of twelve).
>>> 
>>> I dislike the second example with recycling of a zero-length vector.
>>> This *is* covered in the help file, but also seems inconsistent with
>>> the previous examples.
>>> (Because two and three are not valid multiples of zero).
>>> 
>>> Also, I can't think of any reason why someone would want to construct
>>> a matrix with extra data, and then discard part of it.
>>> And even if there was, then why not allow an arbitrarily longer length?
>>> 
>>> 
>>> On Mon, Feb 1, 2021 at 10:08 PM Martin Maechler
>>> <maechler using stat.math.ethz.ch> wrote:
>>>> 
>>>>>>>>> Abby Spurdle (/əˈbi/)
>>>>>>>>> on Mon, 1 Feb 2021 19:50:32 +1300 writes:
>>>> 
>>>>> I'm a little surprised that the following doesn't trigger an error or a warning.
>>>>> matrix (1:256, 8, 8)
>>>> 
>>>>> The help file says that the main argument is recycled, if it's too short.
>>>>> But doesn't say what happens if it's too long.
>>>> 
>>>> It's somewhat subtler than one may assume :
>>>> 
>>>>> matrix(1:9, 2,3)
>>>> [,1] [,2] [,3]
>>>> [1,]    1    3    5
>>>> [2,]    2    4    6
>>>> Warning message:
>>>> In matrix(1:9, 2, 3) :
>>>> data length [9] is not a sub-multiple or multiple of the number of rows [2]
>>>> 
>>>>> matrix(1:8, 2,3)
>>>> [,1] [,2] [,3]
>>>> [1,]    1    3    5
>>>> [2,]    2    4    6
>>>> Warning message:
>>>> In matrix(1:8, 2, 3) :
>>>> data length [8] is not a sub-multiple or multiple of the number of columns [3]
>>>> 
>>>>> matrix(1:12, 2,3)
>>>> [,1] [,2] [,3]
>>>> [1,]    1    3    5
>>>> [2,]    2    4    6
>>>>> 
>>>> 
>>>> So it looks to me the current behavior is quite on purpose.
>>>> Are you sure it's not documented at all when reading the docs
>>>> carefully?  (I did *not*, just now).
>>> 
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel