[Rd] Shallow copies

Simon Urbanek simon.urbanek at r-project.org
Wed Oct 1 16:27:49 CEST 2014


On Sep 30, 2014, at 5:20 PM, Matthieu Gomez <gomez.matthieu at gmail.com> wrote:
> 
> I have a question about shallow copies in R. Since R 3.1.0, subsetting a data frame by its columns no longer results in deep copies. This is an amazing change in my opinion. However, subsetting a data.frame by rows (or subsetting a matrix by columns or rows) still makes deep copies. In particular, it is my understanding that running a command on a very large subset of rows (say "sum" or "biglm" on non-outlier observations) first makes a deep copy of these rows, which can require up to twice the memory of the original data.frame/matrix. If this is correct, I would be very interested to know whether this behavior could change in future versions of R.
> 

No. Subsetting a vector always requires a copy by definition*. Each column in a data frame, and each matrix, is a vector, so any subset thereof always requires a copy no matter what you do.
Subsetting columns of a data frame only requires a copy of the data frame's own list vector (which holds references to the column vectors), and that is small by comparison (at least for the kinds of datasets that use data frames).
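A toy model may make the difference concrete. The Python sketch below is not how R is implemented; it only assumes, as described above, that a data frame is a list whose elements are column vectors. The helpers `subset_columns` and `subset_rows` are hypothetical names. Selecting columns builds a new outer list that refers to the same column objects, whereas selecting rows has to build a new vector for every column.

```python
# Toy model (assumption): a "data frame" is a list of column vectors,
# mirroring R's data.frame as a list of columns. This is NOT R's C code.

def subset_columns(df, cols):
    """Shallow copy: a new outer list that shares the original column objects."""
    return [df[j] for j in cols]

def subset_rows(df, rows):
    """Row subset: every column is rebuilt holding copies of the selected elements."""
    return [[col[i] for i in rows] for col in df]

df = [
    [1.0, 2.0, 3.0, 4.0],  # column "x"
    [10, 20, 30, 40],      # column "y"
]

by_col = subset_columns(df, [1])
by_row = subset_rows(df, [0, 1, 2])

# The column subset reuses the existing vector object: no element data copied.
print(by_col[0] is df[1])   # True
# The row subset allocated a fresh vector per column.
print(by_row[0] is df[0])   # False
print(by_row)               # [[1.0, 2.0, 3.0], [10, 20, 30]]
```

In this model, keeping most of the rows means the new vectors are nearly as large as the originals, so while both exist the memory used approaches twice the original size, which matches the concern raised in the question.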

Cheers,
Simon

* - You could try tricks where you fake a copy with things like copy-on-write (COW) mmaps, but you still need to have a copy conceptually. There are other tricks like deferred execution (you don't actually compute the result but only store the recipe for creating it), but those are more specialized and not generally available.
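The deferred-execution idea from the footnote can be sketched in the same kind of toy Python model. The `LazyFilter` class is hypothetical and is not an R facility: it stores only a reference to the source column and a predicate (the "recipe"), and a reduction such as a sum walks the original data through that predicate instead of first materializing the subset.

```python
# Hypothetical sketch of deferred execution: instead of materializing a
# row subset, store the "recipe" (source column + row predicate) and only
# evaluate it when a result is actually requested.

class LazyFilter:
    def __init__(self, column, keep):
        self.column = column  # reference to the original data, not a copy
        self.keep = keep      # predicate deciding which elements are selected

    def __iter__(self):
        # Stream over the original column; selected elements are produced
        # one at a time and never collected into a new vector.
        return (v for v in self.column if self.keep(v))

    def total(self):
        # A streaming reduction: only a running sum is kept in memory.
        return sum(self)

x = [1.0, 2.0, 3.0, 100.0, 4.0]
non_outliers = LazyFilter(x, lambda v: v < 50)

print(non_outliers.column is x)  # True: the data was not copied
print(non_outliers.total())      # 10.0
```

This only helps for operations that can consume the data as a stream; anything that needs the subset as an actual vector still forces the copy, which is one reason such tricks remain specialized.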


More information about the R-devel mailing list