[R] Data frames, passing by value, and performance (Matt Shotwell)

biostatmatt biostatmatt at gmail.com
Mon May 24 16:54:25 CEST 2010


R duplicates objects only when it has to. Arguments passed to a function
are, in effect, copy-on-write: no copy is made unless the function
modifies the argument. Also, I think (someone more knowledgeable please
correct me if I'm wrong) it may be better to stick with the data frame.
A data frame is just a list internally, so if you modify only one
column, only that column is duplicated, not the entire data frame. With
a matrix, modifying any element would require duplicating the entire
matrix.
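
For what it's worth, here is a minimal sketch of how one could check the
copy-on-write behaviour. The function names read_only and modify are
made up for illustration. tracemem() only works if R was built with
memory profiling, so the sketch guards on capabilities("profmem"). It
illustrates the semantics for function arguments only; it does not
verify my guess about per-column duplication.

```r
# Small data frame with known values so the checks below are deterministic.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical helper: only reads its argument, so no copy should be needed.
read_only <- function(d) sum(d$a)

# Hypothetical helper: modifies its argument, which changes a local copy only.
modify <- function(d) {
  d$a[1] <- 0
  d
}

# tracemem() reports duplications, but only if R has memory profiling.
traced <- capabilities("profmem")
if (traced) tracemem(df)

total <- read_only(df)  # reading: the caller's df is not duplicated
df2 <- modify(df)       # writing: the change lands in a copy

if (traced) untracemem(df)

# The caller's data frame is unchanged; only the returned copy differs.
stopifnot(df$a[1] == 1, df2$a[1] == 0, total == 6)
```

If tracemem() is available, it should print a message for the call to
modify() and stay silent for read_only().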

-Matt

On Mon, 2010-05-24 at 09:29 -0500, gschultz at scriptpro.com wrote:
> I understand that everything passed to an R function is passed "by
> value".  This would seem to include data frames, which my current
> application uses heavily, both for storing program inputs, and holding
> intermediate and final results.  In trying to get greater performance
> out of my R code, I am wondering if there is any clean way to access
> data frames without having them copied all the time.  Or is my only
> option to make them global, and write to them using <<-  ?
> 
> I have considered using matrices, but I like the self-documenting aspect
> of data frame column names.  Input/output to disk is not the issue here,
> as that does not take long in my case.  It's just the internal parameter
> passing that I'm concerned about.
> 
> (I've checked R-FAQ, R-lang and searched the R-help archives, but didn't
> see any specific mentions of this.)
> 
> Thanks.
> 
> Grant Schultz


