[R] Add sequence numbers to lines with the same ID: How can this be accomplished?

Bert Gunter bgunter.4567 at gmail.com
Sun Oct 25 14:42:05 CET 2015


Yay Chuck!  Boo Bert.

-- Bert


Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Sat, Oct 24, 2015 at 9:05 PM, Charles C. Berry <ccberry at ucsd.edu> wrote:
> On Sat, 24 Oct 2015, Bert Gunter wrote:
>
>> Rolf's solution works for the situation where all duplicated values
>> are contiguous, which may be what you need. However, I wondered how it
>> could be done if this were not the case. Below is an answer. It is not
>> as efficient or elegant as Rolf's solution for the contiguous case I
>> think; maybe someone will come up with something better.
>
>
> The often underappreciated `ave' comes to mind. viz.,
>
>         ave(w,w,FUN=seq_along)
> and
>         ave(ID,ID,FUN=seq_along)
>
> agree with the results below.
>
> Of course, ave(...) is just split/unsplit in disguise, further to our
> discussion of a month or two back.
>
> Best,
>
> Chuck
>
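
The remark above that ave() is split/unsplit in disguise can be checked with a small sketch. It uses the vector w from the example quoted in this thread; the intermediate names (via_ave, groups, numbered, via_split) are chosen here for illustration only.

```r
# Vector from the example quoted in this thread
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)

# ave() applies FUN within each group and returns the results
# in the original order of w
via_ave <- ave(w, w, FUN = seq_along)

# The same steps spelled out: split by value, number each group,
# then put the numbers back in their original positions
groups    <- split(w, w)
numbered  <- lapply(groups, seq_along)
via_split <- unsplit(numbered, w)

# Both routes should give the same sequence numbers
stopifnot(all(via_ave == via_split))
via_ave
```

Note that ave() returns a vector of the same type as its first argument (numeric here), while the split/unsplit route returns integers, hence the comparison with == rather than identical().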
>
>> But I think
>> it works. Here's an example with code:
>>
>> > w <- c(1:5,3,1,2,7,8,5,5,5,2,3)
>> > w
>>  [1] 1 2 3 4 5 3 1 2 7 8 5 5 5 2 3
>> > d <- 0+duplicated(w)
>> > for(x in unique(w)){
>> +   i <- w==x
>> +   d[i] <- 1 + cumsum(d[i])
>> + }
>> > d
>>  [1] 1 1 1 1 1 2 2 2 1 1 2 3 4 3 3
>>
>> As always, corrections and/or improvements welcome.
>>
>> Cheers,
>> Bert
>>
>>
>> On Sat, Oct 24, 2015 at 4:02 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>>
>>> On 25/10/15 11:28, John Sorkin wrote:
>>>>
>>>>
>>>> I have a file that has (1) Line numbers, (2) IDs. A given ID number can
>>>> appear in more than one row. For each row with a repeated ID, I want to
>>>> add a number that gives the sequence number of the repeated ID number.
>>>> The R code below demonstrates what I want to have, without any attempt
>>>> to produce the result, as I have no idea how to accomplish my goal.
>>>>
>>>>
>>>> line <- c(1,2,3,4,5,6,7,8,9,10)
>>>> ID<-    c(1,1,2,3,4,5,6,7,8,8)
>>>> cat("Note lines 1 and 2 both contain ID 1; lines 9 and 10 both contain ID 8")
>>>> cbind(line,ID)
>>>> Seq <-  c(1,2,1,1,1,1,1,1,1,2)
>>>> cat("Sequence numbers within ID added to the data")
>>>> cbind(line,ID,Seq)
>>>
>>>
>>>
>>> I *think* that
>>>
>>>   unlist(lapply(rle(ID)$lengths,seq_len))
>>>
>>> gives what you want.  At least it does for the given example.
>>>
>>> cheers,
>>>
>>> Rolf Turner
>>>
>>> --
>>> Technical Editor ANZJS
>>> Department of Statistics
>>> University of Auckland
>>> Phone: +64-9-373-7599 ext. 88276
>>>
>>>
>>
>>
>
> Charles C. Berry                 Dept of Family Medicine & Public Health
> cberry at ucsd edu               UC San Diego / La Jolla, CA 92093-0901
> http://famprevmed.ucsd.edu/faculty/cberry/
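
To illustrate the contiguity caveat raised in the thread, here is a minimal sketch comparing the rle() approach with the ave() approach on both example vectors. The helper names seq_runs and seq_ids are hypothetical and do not appear in the original posts.

```r
# Hypothetical helpers wrapping the two suggestions from the thread
seq_runs <- function(x) unlist(lapply(rle(x)$lengths, seq_len))  # numbers within runs
seq_ids  <- function(x) ave(x, x, FUN = seq_along)               # numbers within values

# Original example: repeated IDs are adjacent, so both approaches agree
ID <- c(1, 1, 2, 3, 4, 5, 6, 7, 8, 8)
stopifnot(all(seq_runs(ID) == seq_ids(ID)))

# Second example: repeats are scattered, so rle() restarts the count
# at every new run while ave() keeps counting per value
w <- c(1:5, 3, 1, 2, 7, 8, 5, 5, 5, 2, 3)
stopifnot(!all(seq_runs(w) == seq_ids(w)))

cbind(w, by_run = seq_runs(w), by_value = seq_ids(w))
```

If the data are known to be sorted by ID, either approach should do; otherwise the ave() form is the safer choice of the two.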


