[Rd] oddity in transform

Ista Zahn istazahn at gmail.com
Tue Jul 24 17:47:47 CEST 2018


On Tue, Jul 24, 2018 at 11:41 AM, Ista Zahn <istazahn using gmail.com> wrote:
> I don't think it has much to do with transform in particular:
>
>> BOD <- data.frame(Time = 1:6, demand = runif(6))
>> BOD[["X"]] <- BOD[1:2] * seq(6); BOD
>   Time    demand X.Time  X.demand
> 1    1 0.8649628      1 0.8649628
> 2    2 0.5895380      4 1.1790761
> 3    3 0.6854635      9 2.0563906
> 4    4 0.4255801     16 1.7023206
> 5    5 0.5738793     25 2.8693967
> 6    6 0.9996713     36 5.9980281
>> BOD <- data.frame(Time = 1:6, demand = runif(6))
>> BOD[["X"]] <- BOD[1] * seq(6); BOD
>   Time     demand Time
> 1    1 0.72990231    1
> 2    2 0.61721422    4
> 3    3 0.02389160    9
> 4    4 0.28341746   16
> 5    5 0.06116124   25
> 6    6 0.67966577   36

Ugh, well, I see now that

BOD[["X"]] <- BOD[1:2] * seq(6); BOD

and

transform(BOD, X = BOD[1:2] * seq(6))

don't produce the same thing, despite printing in ways that look
similar. However,

data.frame(BOD, X = BOD[1:2] * seq(6))

and

data.frame(BOD, X = BOD[1] * seq(6))

do produce the same result as transform, so the point about this being
much more pervasive still holds.
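
For what it's worth, here is a quick check of the difference. It is only a sketch: it uses a hard-coded copy of the BOD data so it runs anywhere, and the object names a, b, d and new are just for illustration. The last few lines show one possible way to get uniform names regardless of how many columns are selected, by setting them by hand; that is an assumption about what one might want, not something transform does for you.

```r
# Hard-coded copy of datasets::BOD so the sketch is self-contained.
BOD <- data.frame(Time = c(1, 2, 3, 4, 5, 7),
                  demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))

# `[[<-` stores the whole data frame as ONE column named "X".
a <- BOD
a[["X"]] <- BOD[1:2] * seq(6)

# transform() and data.frame() splice the columns in and rename them.
b <- transform(BOD, X = BOD[1:2] * seq(6))
d <- data.frame(BOD, X = BOD[1:2] * seq(6))

names(a)                        # "Time" "demand" "X"
is.data.frame(a$X)              # TRUE
names(b)                        # "Time" "demand" "X.Time" "X.demand"
identical(names(b), names(d))   # TRUE

# The one-column case goes through the same data.frame() naming rule.
names(transform(BOD, X = BOD[1] * seq(6)))  # "Time" "demand" "Time.1"

# Possible workaround for uniform names: build the names by hand.
ix <- 1
new <- BOD[ix] * seq(6)
names(new) <- paste0("X.", names(new))
names(cbind(BOD, new))          # "Time" "demand" "X.Time"
```

So the [[<- result has three columns (the third being a data frame), while transform and data.frame give four, even though the printed output looks alike.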

--Ista



>
> --Ista
>
>
> On Tue, Jul 24, 2018 at 7:59 AM, Gabor Grothendieck
> <ggrothendieck using gmail.com> wrote:
>> The idea is that one wants to write the line of code below in a general
>> way that works the same whether ix specifies one column or several.
>> However, the naming changes entirely between those two cases, and
>> BOD[, 1], transform(BOD, X=..., Y=...), or other hard-coded solutions
>> still require writing multiple cases.
>>
>> ix <- 1:2
>> transform(BOD, X = BOD[ix] * seq(6))
>>
>>
>>
>> On Tue, Jul 24, 2018 at 7:14 AM, Emil Bode <emil.bode using dans.knaw.nl> wrote:
>>> I think you meant to call BOD[,1]
>>> From ?transform, the ... arguments are supposed to be vectors, and BOD[1] is still a data.frame (with one column). So I don't think it's surprising transform gets confused by which name to use (X, or Time?), and kind of compromises on the name "Time". It's also in a note in ?transform: "If some of the values are not vectors of the appropriate length, you deserve whatever you get!"
>>> And if you want to do it with multiple extra columns (and are not satisfied with these labels), I think the proper way to go would be " transform(BOD, X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6))"
>>>
>>> If you want to trace it back further, it's not in transform but in data.frame. Column-names are prepended with a higher-level name if the object has more than one column.
>>> And it uses the tag-name if simply supplied with a vector:
>>> data.frame(BOD[1:2], X=BOD[1]*seq(6)) takes the name of the only column of BOD[1], Time. Only because that column name is already present, it's changed to Time.1
>>> data.frame(BOD[1:2], X=BOD[,1]*seq(6)) gives third column-name X (as X is now a vector)
>>> data.frame(BOD[1:2], X=BOD[1:2]*seq(6)) or with BOD[,1:2] gives column names X.Time and X.demand, to show these (multiple) columns are coming from X
>>>
>>> So I don't think there's much to fix here. In this case having X.Time in all cases would have been better, but in general the column-naming of data.frame works; changing it would likely cause a lot of problems.
>>> You can always change the column-names later.
>>>
>>> Best regards,
>>> Emil Bode
>>>
>>> Data-analyst
>>>
>>> +31 6 43 83 89 33
>>> emil.bode using dans.knaw.nl
>>>
>>> DANS: Netherlands Institute for Permanent Access to Digital Research Resources
>>> Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | info using dans.knaw.nl | dans.knaw.nl
>>> DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and funding organisation NWO <http://www.nwo.nl/>.
>>>
>>> On 23/07/2018, 16:52, "R-devel on behalf of Gabor Grothendieck" <r-devel-bounces using r-project.org on behalf of ggrothendieck using gmail.com> wrote:
>>>
>>>     Note the inconsistency in the names in these two examples.  X.Time in
>>>     the first case and Time.1 in the second case.
>>>
>>>       > transform(BOD, X = BOD[1:2] * seq(6))
>>>         Time demand X.Time X.demand
>>>       1    1    8.3      1      8.3
>>>       2    2   10.3      4     20.6
>>>       3    3   19.0      9     57.0
>>>       4    4   16.0     16     64.0
>>>       5    5   15.6     25     78.0
>>>       6    7   19.8     42    118.8
>>>
>>>       > transform(BOD, X = BOD[1] * seq(6))
>>>         Time demand Time.1
>>>       1    1    8.3      1
>>>       2    2   10.3      4
>>>       3    3   19.0      9
>>>       4    4   16.0     16
>>>       5    5   15.6     25
>>>       6    7   19.8     42
>>>
>>>     --
>>>     Statistics & Software Consulting
>>>     GKX Group, GKX Associates Inc.
>>>     tel: 1-877-GKX-GROUP
>>>     email: ggrothendieck at gmail.com
>>>
>>>     ______________________________________________
>>>     R-devel using r-project.org mailing list
>>>     https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>
>>
>>


