[R] Add new calculated column to data frame

arun smartpink111 at yahoo.com
Thu Aug 29 21:13:49 CEST 2013



Hi,
You could try this:
dat1<- read.table(text="
id  module    event       time                       time_on_task
1   sys         login         1373502892           80
2   task        add          1373502892           80
3   task        add          1373502972           23
4   sys         login         1373502892           80
5   list         delete       1373502995          901
6   list          view         1373503896          100
7   task        add          1373503996           NA
",sep="",header=TRUE,stringsAsFactors=FALSE)
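# Combine module and event into one key, map each key to a letter,
# then store the result as a character column.
dat1$Categ <- as.character(factor(
  with(dat1, paste(module, event, sep = "_")),
  levels = c("task_add", "sys_login", "list_delete", "list_view"),
  labels = LETTERS[1:4]))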


dat1
#  id module  event       time time_on_task Categ
#1  1    sys  login 1373502892           80     B
#2  2   task    add 1373502892           80     A
#3  3   task    add 1373502972           23     A
#4  4    sys  login 1373502892           80     B
#5  5   list delete 1373502995          901     C
#6  6   list   view 1373503896          100     D
#7  7   task    add 1373503996           NA     A
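
If the list of categories grows, a small lookup table may be easier to maintain than a long levels/labels vector. The following is only a sketch under assumptions: `lookup`, `dat2`, `key_dat` and `key_lkp` are hypothetical names introduced here, and `dat2` is a cut-down stand-in for `dat1` with the same module/event values.

```r
# Hypothetical lookup table: one row per (module, event) pair and its category.
lookup <- data.frame(
  module = c("task", "sys", "list", "list"),
  event  = c("add", "login", "delete", "view"),
  Categ  = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Cut-down stand-in for dat1 (same module/event values, other columns omitted).
dat2 <- data.frame(
  id     = 1:7,
  module = c("sys", "task", "task", "sys", "list", "list", "task"),
  event  = c("login", "add", "add", "login", "delete", "view", "add"),
  stringsAsFactors = FALSE
)

# Build a key per row and look it up; pairs missing from lookup become NA.
key_dat <- paste(dat2$module, dat2$event, sep = "_")
key_lkp <- paste(lookup$module, lookup$event, sep = "_")
dat2$Categ <- lookup$Categ[match(key_dat, key_lkp)]
dat2
```

As with the factor() version, any module/event combination that is not listed ends up as NA, so missing mappings are easy to spot with is.na(dat2$Categ).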
A.K.

________________________________
From: srecko joksimovic <sreckojoksimovic at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: R help <R-help at r-project.org> 
Sent: Thursday, August 29, 2013 2:34 PM
Subject: Re: [R] Add new calculated column to data frame



Hi Arun,

There is one more question... You explained to me how to use split(dat1,cumsum(dat1$action=="login")) in one of my previous questions, and that works great.
Now, if I have something like this:

id  module  event   time        time_on_task
1   sys     login   1373502892  80
2   task    add     1373502892  80
3   task    add     1373502972  23
4   sys     login   1373502892  80
5   list    delete  1373502995  901
6   list    view    1373503896  100
7   task    add     1373503996  NA
I know how to split at each "login" occurrence, and I know how to add a new column with time differences. But how do I add a new column "category" that is calculated from the columns "module" and "event"? For example, if module=task and event=add, then category=A...
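
For reference, the two steps mentioned above (splitting at each "login" and computing time differences) might be combined roughly as follows. This is a sketch under assumptions: `dat3` and the `session` column are hypothetical names, the login marker is taken from the "event" column shown in the table, and the time differences are recomputed from the "time" values rather than copied from the table.

```r
# Stand-in data with the event and time values from the table above.
dat3 <- data.frame(
  id    = 1:7,
  event = c("login", "add", "add", "login", "delete", "view", "add"),
  time  = c(1373502892, 1373502892, 1373502972, 1373502892,
            1373502995, 1373503896, 1373503996),
  stringsAsFactors = FALSE
)

# Each "login" starts a new session; cumsum() turns the markers into a counter.
dat3$session <- cumsum(dat3$event == "login")

# Within each session: seconds until the next event, NA for the last event.
dat3$time_on_task <- ave(dat3$time, dat3$session,
                         FUN = function(x) c(diff(x), NA))

# The same counter can still be used to get one data frame per session.
sessions <- split(dat3, dat3$session)
dat3
```

Using ave() keeps everything in one data frame, so a category column can be added afterwards without re-combining the pieces returned by split().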

Srecko





On Thu, Aug 29, 2013 at 11:22 AM, arun <smartpink111 at yahoo.com> wrote:

>Hi Srecko,
>No problem.
>Regards,
>Arun
>
>
>
>
>
>
>
>________________________________
>From: srecko joksimovic <sreckojoksimovic at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Sent: Thursday, August 29, 2013 2:22 PM
>
>Subject: Re: [R] Add new calculated column to data frame
>
>
>
>Sorry... I should figure it out...
>
>thanks so much!
>Srecko
>
>
>
>On Thu, Aug 29, 2013 at 11:21 AM, arun <smartpink111 at yahoo.com> wrote:
>
>>Hi,
>>The one you showed is:
>>
>>dat1$time_on_task<- c(diff(dat1$time),NA)
>>
>> dat1
>>#  id  event       time time_on_task
>>#1  1    add 1373502892           80
>>#2  2    add 1373502972           23
>>#3  3 delete 1373502995          901
>>#4  4   view 1373503896          100
>>#5  5    add 1373503996           NA
>>
>>
>>
>>
>>________________________________
>>From: srecko joksimovic <sreckojoksimovic at gmail.com>
>>To: arun <smartpink111 at yahoo.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Thursday, August 29, 2013 2:15 PM
>>Subject: Re: [R] Add new calculated column to data frame
>>
>>
>>
>>
>>Thanks Arun,
>>
>>this is great. However, it should be just a little bit different:
>>
>>#  id  event       time time_on_task
>>#1  1    add 1373502892           80
>>#2  2    add 1373502972           23
>>#3  3 delete 1373502995          901
>>#4  4   view 1373503896          100
>>#5  5    add 1373503996           NA
>>
>>
>>When I calculate the difference, I need to know how long each activity lasted, so the value for the first activity is the time of id2 minus the time of id1...
>>
>>
>>
>>On Thu, Aug 29, 2013 at 11:03 AM, arun <smartpink111 at yahoo.com> wrote:
>>
>>
>>>
>>>Hi,
>>>Try:
>>>dat1<- read.table(text="
>>>id    event    time
>>>1    add      1373502892
>>>2    add      1373502972
>>>3    delete  1373502995
>>>4    view      1373503896
>>>5    add      1373503996
>>>",sep="",header=TRUE,stringsAsFactors=FALSE)
>>> dat1$time_on_task<- c(NA,diff(dat1$time))
>>> dat1
>>>#  id  event       time time_on_task
>>>#1  1    add 1373502892           NA
>>>#2  2    add 1373502972           80
>>>#3  3 delete 1373502995           23
>>>#4  4   view 1373503896          901
>>>#5  5    add 1373503996          100
>>>
>>>#Not sure whether this depends on the values of "event" or not..
>>>A.K.
>>>
>>>
>>>
>>>
>>>
>>>
>>>----- Original Message -----
>>>From: srecko joksimovic <sreckojoksimovic at gmail.com>
>>>To: R help <R-help at r-project.org>
>>>Cc:
>>>Sent: Thursday, August 29, 2013 1:52 PM
>>>Subject: [R] Add new calculated column to data frame
>>>
>>>Hi,
>>>
>>>I have the following data set:
>>>id    event    time (in sec)
>>>1     add      1373502892
>>>2     add      1373502972
>>>3     delete   1373502995
>>>4     view      1373503896
>>>5     add       1373503996
>>>...
>>>
>>>I'd like to add a new column "time on task", which is the time elapsed between
>>>two consecutive events (id2 - id1...). What would be the best approach to do that?
>>>
>>>Thanks,
>>>Srecko
>>>
>>>
>>>______________________________________________
>>>R-help at r-project.org mailing list
>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>


