[Rd] nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Fri May 17 10:06:11 CEST 2019

Hi Martin,

Thanks for chiming in. Responses inline.

On Fri, May 17, 2019 at 12:32 AM Martin Maechler <maechler using stat.math.ethz.ch> wrote:

> >>>>> Gabriel Becker
> >>>>>     on Thu, 16 May 2019 15:47:57 -0700 writes:
>
>     > Hi Hadley,
>     > Thanks for the counterpoint. Response below.
>
>     > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <h.wickham using gmail.com> wrote:
>
>     >> The existing behaviour seems intuitive to me. I would consider these
>     >> invariants for n vector x_i's each with size m:
>     >>
>     >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>     >>
>
>     > Personally, no I wouldn't. I would consider m==0 a degenerate case, where
>     > there is no data, but I personally find matrices (or data.frames) with rows
>     > but no columns a very strange concept. The converse is not true, I
>     > understand the utility of columns but no rows, particularly in the
>     > data.frame case, but rows with no columns are observations we didn't
>     > observe anything about. Strange, imho.
>
> Gabe, here I have to very strongly disagree.
>
> Matrices (and higher order Arrays) are always definitely to
> behave "symmetrically" / "uniformly" with respect to all of their
> dimensions.
>
> We (and the S developers before us) have always taken a lot of
> care trying to ensure that this is true.
>
> So for the matrix case, if rows and columns behaved differently
> that would be a bug "by definition".
>

I realize now I could have been clearer and more explicit about this, but I
wasn't arguing that the behavior should be different between columns and
rows, just that the behavior in the rows case didn't necessarily make a ton
of sense to me. I was arguing that a change to both rbind and cbind be
considered when all length-zero vectors are passed, not that rbind change
without cbind also changing. I will admit even here to feeling much more
strongly about the data.frame case.

That said, I do see that the cbind/columns argument seems harder (though
not impossible) for me to make. And maybe that's a good enough reason not
to consider such a change, because, as I say, I agree the symmetry is
important, and I would (also) want cbind to change the same way rbind did if
such a change happened, and that might bother many more people than the
rbind case would. Maybe not, though, based on the other responses in the
thread.

Honestly, the most intuitive thing for me if you rbind or cbind a bunch of
length-zero vectors together would be a 0x0 matrix, at the very least in
the non-named arguments case. It's a matrix with 0 elements in it, after
all. It seems perhaps that my intuition is just somewhat non-standard
though.
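
To make the behavior under discussion concrete, here is a minimal sketch in
base R. The first part shows what rbind and cbind currently return for
length-zero inputs. The wrapper rbind_or_empty is purely hypothetical (it is
not part of R and nobody in this thread proposed this exact code); it only
illustrates the 0x0 intuition described above, assuming unnamed atomic
vector arguments.

```r
# Current behavior: one row (or column) per argument, even with no data.
r <- rbind(character(), character())
dim(r)      # 2 0
k <- cbind(character(), character())
dim(k)      # 0 2
length(r)   # 0: the matrix holds no elements either way

# Hypothetical wrapper (an assumption for illustration, not an R function):
# return a 0x0 matrix when every argument has length zero, else defer to rbind.
rbind_or_empty <- function(...) {
  args <- list(...)
  if (length(args) > 0 && all(lengths(args) == 0)) {
    # Assumes atomic vectors; keeps the type of the first argument.
    return(matrix(vector(typeof(args[[1]]), 0), nrow = 0, ncol = 0))
  }
  rbind(...)
}

dim(rbind_or_empty(character(), character()))   # 0 0
dim(rbind_or_empty(1:3, 4:6))                   # 2 3
```

Note that the hypothetical wrapper gives up Hadley's invariant that nrow
equals the number of arguments, which is exactly the trade-off being debated.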

> Of course there's one thing where this uniformity / symmetry
> must be violated: in the coercion from and to atomic vectors:
> There, 'by column' (generalized for arrays to "earlier dimensions vary faster
> than later one") has been chosen, not the least because this had
> been adapted for Fortran (first, AFAIK) and all related ABIs
> dealing with Matrix vector arithmetic for very good (numerical,
> performance, known convention) reasons that enabled to know how
> fast numerical linear algebra should be implemented.
>

I do understand here, and would never suggest anything that could damage
numerical linear algebra capabilities, in R or more broadly. That said, can
numerical linear algebra routines operate meaningfully in the degenerate
case where one, both, or all dimensions are 0 anyway? Maybe they do; I'd be
somewhat surprised, but it's not my area of expertise.
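
A quick way to probe that question is to try a few zero-extent products in
base R. This is only a sketch of what the matrix product operator does with
such inputs, not a claim about how any underlying BLAS or LAPACK routine
treats them.

```r
# Zero-extent matrices: a is 2x0, b is 0x3.
a <- matrix(numeric(), nrow = 2, ncol = 0)
b <- matrix(numeric(), nrow = 0, ncol = 3)

# (2x0) %*% (0x3): the inner dimension is empty, so every entry is an
# empty sum, and the result is a 2x3 matrix of zeros.
p <- a %*% b
dim(p)        # 2 3
all(p == 0)   # TRUE

# (0x3) %*% (3x0): both outer dimensions are empty, giving a 0x0 matrix.
q <- b %*% matrix(numeric(), nrow = 3, ncol = 0)
dim(q)        # 0 0

# crossprod(a) is t(a) %*% a, i.e. (0x2) %*% (2x0), also 0x0.
dim(crossprod(a))   # 0 0
```

So the usual dimension rule (n x k times k x m gives n x m) keeps holding
when any of n, k, m is 0, which is one place where keeping distinct 2x0 and
0x2 shapes matters.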

 Best,
~G

>
> Martin
>
