[R] Why does `[<-.matrix` not exist in base R

Ivan Krylov
Sun Nov 24 15:06:14 CET 2019


Hello David,

On Sat, 23 Nov 2019 11:58:42 -0500
David Disabato <ddisab01 using gmail.com> wrote:

> For example, if I want to add a new column to a data.frame, I can do
> something like `myDataFrame[, "newColumn"] <- NA`.

<Opinion>

Arguably, iterative growth of data structures is not the "R style",
since it may lead to costly reallocations, resulting in the worst case
scenario of quadratic behaviour for linear operations.

If iterative processing is unavoidable, it might help to store partial
results in a list, then build the final matrix with a single call to
do.call(cbind, results).
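
A minimal sketch of that pattern, for illustration only. The helper
`make_column` and the sizes are made up here and simply stand in for
whatever each iteration really computes:

```r
# Hypothetical per-iteration computation (not from the original question).
make_column <- function(i, n_rows = 4) i * seq_len(n_rows)

n_cols <- 3
results <- vector("list", n_cols)   # preallocate the list of partial results
for (i in seq_len(n_cols)) {
  results[[i]] <- make_column(i)    # store each column as a list element
}

# A single cbind() call builds the final matrix in one go.
m <- do.call(cbind, results)
dim(m)
```

The list only holds references to the individual columns, so filling it
does not copy the columns already stored.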

</Opinion>

> However, with a matrix, this syntax does not work and I have to use a
> call to `cbind` and create a new object. For example, `mymatrix2 <-
> cbind(mymatrix, "newColumn" = NA)`.

> Is there a programming reason that base R does not have a matrix
> method for `[<-` or is it something that arguably should be added?

A data frame is a list of columns, so adding a new column is relatively
cheap: allocate enough memory for one column and append (roughly
speaking) a pointer to the list of pointers-to-column-data. This
results in reallocation of the *latter* list, but, since that list is
small in comparison to the whole data frame, it's okay. Note that this
operation does not affect any of the other columns belonging to the
same data frame.
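
The list nature of a data frame can be checked from R itself. This is
just a small illustration with a made-up data frame `df`, not a look at
the actual internals:

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

is.list(df)      # a data frame is a list underneath
length(df)       # number of list elements, i.e. the number of columns

# Subassignment to a non-existent column appends one more list element.
df[, "newColumn"] <- NA
names(df)
```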

A matrix, on the other hand, is a single vector holding all of its
elements, with the array dimensions stored as an attribute. Since R matrices are
stored by column [*], adding a new column to the matrix means resizing
the buffer to hold length(matrix) + nrow(matrix) elements, then
appending the new column to the end of the buffer. If the allocator
cannot enlarge the buffer in place (because the buffer is followed in
memory by another buffer), it has to allocate the new buffer elsewhere,
copy the memory, then free the old buffer.
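
The layout described above can be seen from R code. The following is
only a sketch that mimics appending a column by hand (the names `buf`
and `new_col` are made up); it does not show what cbind() does
internally:

```r
m <- matrix(1:6, nrow = 2)

attributes(m)    # only "dim": the data itself is a plain vector
as.vector(m)     # elements in storage order, one column after another

# Appending a column by hand: a longer vector plus an updated dim attribute.
new_col <- c(7L, 8L)
buf <- c(as.vector(m), new_col)          # length(m) + nrow(m) elements
dim(buf) <- c(nrow(m), ncol(m) + 1L)

# Same result as cbind(), once the column name cbind() adds is dropped.
identical(buf, unname(cbind(m, new_col)))
```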

To build a matrix of n columns by appending one column at a time, one
needs to perform this O(n) copy O(n) times, resulting in O(n^2)
performance overall. Adding rows is even worse because memory has to be
copied in parts, not as a whole.
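
To make the arithmetic concrete, here is a small sketch. The counter
`copied` is only a model of the worst case (every append copies the
whole old buffer), not a measurement of what R actually does, and all
names and sizes are made up:

```r
n_rows <- 4
n_cols <- 5

# Iterative growth: each cbind() produces a new, larger matrix.
grown <- matrix(numeric(0), nrow = n_rows, ncol = 0)
copied <- 0                          # modelled number of elements copied
for (j in seq_len(n_cols)) {
  copied <- copied + length(grown)   # old contents moved to the new buffer
  grown <- cbind(grown, rep(as.numeric(j), n_rows))
}

# Preallocation: one allocation, columns filled in place.
filled <- matrix(NA_real_, nrow = n_rows, ncol = n_cols)
for (j in seq_len(n_cols)) {
  filled[, j] <- rep(as.numeric(j), n_rows)
}

copied                               # n_rows * n_cols * (n_cols - 1) / 2
all(grown == filled)                 # both approaches give the same matrix
```

In this model the copied total grows with the square of the number of
columns, while the preallocated version writes each element once.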

Disclaimer: this is one reason I can think of why R doesn't offer
subassignment to non-existent matrix columns by default. The actual
reason might be different.

-- 
Best regards,
Ivan

[*]
https://github.com/wch/r-source/blob/bac4cd3013ead1379e20127d056ee036278b47ff/src/main/duplicate.c#L443


