[R] Help me replace a for loop with an "apply" function

jim holtman jholtman at gmail.com
Thu Oct 1 23:05:00 CEST 2009


What I am doing is determining where the dates are not sequential
(the difference is not one day).  Every time this occurs, the
expression 'diff(.days) != 1' is TRUE, and that is where a new
sequence starts.  'diff' returns a vector one shorter than its
input; I am assuming that the first date starts a sequence, which is
why TRUE is the initial entry.  'cumsum' then generates a vector that
has the same value for dates that are consecutive, so each run of
consecutive days gets its own number.  By using 'table' on that
vector, you can determine the maximum number of consecutive days.

HTH

On Thu, Oct 1, 2009 at 2:57 PM, gd047 <gd047 at mineknowledge.com> wrote:
>
> Congratulations!
>
> Could you explain to me the reason you add an initial "TRUE" value in the
> cumulative sum?
>
>
>
> jholtman wrote:
>>
>> Will this work:
>>
>>> x <- read.table(textConnection("   day         user_id
>> + 2008/11/01    2001
>> + 2008/11/01    2002
>> + 2008/11/01    2003
>> + 2008/11/01    2004
>> + 2008/11/01    2005
>> + 2008/11/02    2001
>> + 2008/11/02    2005
>> + 2008/11/03    2001
>> + 2008/11/03    2003
>> + 2008/11/03    2004
>> + 2008/11/03    2005
>> + 2008/11/04    2001
>> + 2008/11/04    2003
>> + 2008/11/04    2004
>> + 2008/11/04    2005"), header=TRUE)
>>> closeAllConnections()
>>> # convert to Date
>>> x$day <- as.Date(x$day, format="%Y/%m/%d")
>>> # split by user and then look for contiguous days
>>> contig <- sapply(split(x$day, x$user_id), function(.days){
>> +     .diff <- cumsum(c(TRUE, diff(.days) != 1))
>> +     max(table(.diff))
>> + })
>>> contig
>> 2001 2002 2003 2004 2005
>>    4    1    2    2    4
>>>
>>>
>>
>>
>> On Thu, Oct 1, 2009 at 11:29 AM, gd047 <gd047 at mineknowledge.com> wrote:
>>>
>>> ...if that is possible
>>>
>>> My task is to find the longest streak of consecutive days on which a
>>> user participated in a game.
>>>
>>> Instead of writing an SQL function, I chose to use R's rle function to
>>> get the longest streaks and then update my db table with the results.
>>>
>>> The (attached) dataframe is something like this:
>>>
>>>    day         user_id
>>> 2008/11/01    2001
>>> 2008/11/01    2002
>>> 2008/11/01    2003
>>> 2008/11/01    2004
>>> 2008/11/01    2005
>>> 2008/11/02    2001
>>> 2008/11/02    2005
>>> 2008/11/03    2001
>>> 2008/11/03    2003
>>> 2008/11/03    2004
>>> 2008/11/03    2005
>>> 2008/11/04    2001
>>> 2008/11/04    2003
>>> 2008/11/04    2004
>>> 2008/11/04    2005
>>>
>>>
>>>
>>> --- R code follows
>>> ------------------------------------------------------
>>>
>>>
>>> # turn it to a contingency table
>>> my_table <- table(user_id, day)
>>>
>>> # get the streaks
>>> rle_table <- apply(my_table,1,rle)
>>>
>>> # verify the longest streak of "1"s for user 2001
>>> # as.vector(tapply(rle_table$'2001'$lengths, rle_table$'2001'$values, max)["1"])
>>>
>>> # loop to get the results
>>> # initiate results matrix
>>> res<-matrix(nrow=dim(my_table)[1], ncol=2)
>>>
>>> for (i in 1:dim(my_table)[1]) {
>>>   string <- paste("as.vector(tapply(rle_table$'", rownames(my_table)[i],
>>>                   "'$lengths, rle_table$'", rownames(my_table)[i],
>>>                   "'$values, max)['1'])", sep="")
>>>   res[i,] <- c(as.integer(rownames(my_table)[i]), eval(parse(text=string)))
>>> }
>>>
>>>
>>> ----------------------------------------------------
>>> --- end of R code
>>>
>>> Unfortunately this for loop takes too long and I'm wondering if there is
>>> a way to produce the res matrix using a function from the "apply" family.
>>>
>>> Thank you in advance
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25696937.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Help-me-replace-a-for-loop-with-an-%22apply%22-function-tp25696937p25704683.html
> Sent from the R help mailing list archive at Nabble.com.
>
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



