[R] Base within reverses column order

David Winsemius dwinsemius at comcast.net
Mon Apr 7 18:09:41 CEST 2014


On Apr 4, 2014, at 10:32 AM, Dan Murphy wrote:

> I just noticed this annoyance, but I'm not the first one, apparently
> -- see http://lists.r-forge.r-project.org/pipermail/datatable-help/2012-May/001176.html
> 
> The thread never answered the OP's question "Is this a bug?" so I
> assume the answer, unfortunately, is No.
> 

Well, using `within` on data.table objects is not really the "true" data.table way, is it?

> library(data.table)
> DT <- data.table(a=1:10, b=2:11)
> DT[ , ( c("c","d") ) := .SD[, list(a+b,a*b)] ]
> DT
     a  b  c   d
 1:  1  2  3   2
 2:  2  3  5   6
 3:  3  4  7  12
 4:  4  5  9  20
 5:  5  6 11  30
 6:  6  7 13  42
 7:  7  8 15  56
 8:  8  9 17  72
 9:  9 10 19  90
10: 10 11 21 110

(This answer was from Simon O'Hanlon on SO: http://stackoverflow.com/questions/16943939/elegantly-assigning-multiple-columns-in-data-table-with-lapply/16944343#16944343 )

Inside `[.data.table`, expressions are captured without initial evaluation, and sometimes `eval` needs to be called directly. Evaluation is deferred in many instances, which is arguably closer to "computing on the language" than ordinary interactive R. The `.SD` object ("Subset of Data") is how data.table refers to its own columns inside `j`, so `a+b` and `a*b` are evaluated in the context of the columns of DT. And `:=` is a special data.table assignment operator that adds columns by reference, which avoids constructing multiple copies of data.table objects during column binding.
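
A minimal sketch of those two points, assuming the data.table package is installed: column names resolve against DT inside `j`, and `:=` changes the object in place without any `DT <- ...` reassignment. The names `DT1`, `DT2`, `same_result` and `col_order` are only illustrative, and the direct `list(...)` form is an assumed equivalent spelling, not something taken from the SO answer.

```r
library(data.table)

DT1 <- data.table(a = 1:10, b = 2:11)
DT2 <- copy(DT1)   # independent copy, so the two forms can be compared

# Form from the SO answer: the expressions are evaluated inside .SD
DT1[, c("c", "d") := .SD[, list(a + b, a * b)]]

# Assumed equivalent: a and b already resolve to DT2's columns inside j
DT2[, c("c", "d") := list(a + b, a * b)]

# Neither call was assigned back to a name, yet both objects gained columns
same_result <- isTRUE(all.equal(DT1, DT2))
col_order   <- names(DT1)   # new columns follow the order on the left of `:=`
```

Note that the new columns come out in the order they were named, which is the ordering the original poster wanted from `within`.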


> If not a bug, do users of within have a workaround to produce a result
> with columns as ordered within 'within'? I can think of a way using
> names and subset-with-select, but that seems unduly kludgy.

data.table users would think your `within` approach was suspicious, if not also kludgy. It would be an end run around the package's syntactic modifications, and it may lose the space efficiencies that data.table offers.
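
If `within` is kept anyway, one possible workaround (a sketch only, roughly the "names" approach mentioned in the question) is to state the desired column order explicitly afterwards. The names `df`, `w`, `wanted`, `fixed` and `DT` below are hypothetical, and `setcolorder` is the data.table function for reordering columns by reference.

```r
library(data.table)

df <- data.frame(a = 1:3, b = 2:4)

# New columns created inside within() may not come back in creation order
w <- within(df, {
  c <- a + b
  d <- a * b
})

# State the desired order explicitly (hypothetical target order)
wanted <- c("a", "b", "c", "d")

# Base R: select the columns in that order
fixed <- w[, wanted]

# data.table: reorder the columns by reference, without copying the data
DT <- as.data.table(w)
setcolorder(DT, wanted)
```

In base R, `transform(df, c = a + b, d = a * b)` appends new columns in the order given, although the second expression there cannot refer to the first.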

-- 

David Winsemius
Alameda, CA, USA



