[R] Best way to compute the difference between two levels of a factor ?
ehlers at ucalgary.ca
Wed Mar 21 12:01:31 CET 2012
On 2012-03-21 03:37, wphantomfr wrote:
> Thanks, Peter, for your fast answer.
> Your solution is really nice, but if I have, say, 20 variables, I have to
> write 20 statements like "DIF.X = X[TIME=="T2"] - X[TIME=="T1"]".
> Does someone have a trick to avoid this? It may not be easily possible.
Okay, try this:
  result <- with(data,
                 aggregate(data[, -(1:2)], by = list(ID), FUN = diff))
This assumes that the data frame is sorted as in your example (T1 before
T2 within each ID), so that diff() returns T2 - T1. If that's not the
case, use order() to arrange it first:
  data <- with(data, data[order(ID, TIME), ])
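
For reference, here is a minimal self-contained sketch of the two steps above, run on the example data from the original question. It assumes exactly two rows per ID (one T1, one T2) and that ID and TIME are the first two columns; the by = list(ID = data$ID) naming is only a convenience so the grouping column is labelled ID instead of Group.1.

```r
# Example data copied from the original question (long format).
data <- data.frame(
  ID   = rep(c("AA", "BB", "CC"), each = 2),
  TIME = factor(rep(c("T1", "T2"), times = 3)),
  X    = c(9.959540, 12.949522, 9.039486, 10.056392, 8.706590, 10.799296),
  Y    = c(11.140529, 9.896559, 13.469104, 14.632169, 14.939197, 10.747609)
)

# Sort by subject, then time, so that diff() computes T2 - T1 within each ID.
data <- data[order(data$ID, data$TIME), ]

# Columns 1:2 are ID and TIME; every remaining column is differenced per ID.
result <- aggregate(data[, -(1:2)], by = list(ID = data$ID), FUN = diff)
print(result)
```

Adding more numeric columns to the data frame needs no change to the aggregate() call, since it operates on every column after the first two.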
> Le 21/03/12 11:03, Peter Ehlers a écrit :
>> On 2012-03-21 01:48, wphantomfr wrote:
>>> Dear R-help Members,
>>> I am wondering whether anyone can think of the optimal way of
>>> computing, for several numeric variables, the difference between 2
>>> levels of a factor.
>>> To be clear, let's generate a simple data frame with 2 numeric variables
>>> collected for different subjects (ID) and 2 levels of a TIME factor
>>> (time of evaluation):
>>>   ID TIME         X         Y
>>> 1 AA   T1  9.959540 11.140529
>>> 2 AA   T2 12.949522  9.896559
>>> 3 BB   T1  9.039486 13.469104
>>> 4 BB   T2 10.056392 14.632169
>>> 5 CC   T1  8.706590 14.939197
>>> 6 CC   T2 10.799296 10.747609
>>> I want to compute for each subject and each variable (X, Y, ...) the
>>> difference between T2 and T1.
>>> Until now I have done this by reshaping my data frame to the wide
>>> format (the columns are then ID, X.T1, X.T2, Y.T1, Y.T2) and then
>>> computing the difference between successive columns one by one,
>>> but this approach is probably not optimal if the difference has to be
>>> computed for a large number of variables.
>>> How would you handle it?
>> One way is to use the plyr package:
>>   result <- ddply(data, "ID", summarize,
>>                   DIF.X = X[TIME == "T2"] - X[TIME == "T1"],
>>                   DIF.Y = Y[TIME == "T2"] - Y[TIME == "T1"])
>> Peter Ehlers