# [R] Create sequential vector for values in another column

```
At this point 3 functions have been suggested and I'll add a 4th:
f1 <- function(x)unlist(lapply(unname(split(rep.int(1L,length(x)), x)), cumsum))
f2 <- function(x)unlist(sapply(rle(x)$lengths, function(k) 1:k ))
f3 <- function(x)ave(x,x,FUN=seq)
f4 <- function(x)ave(seq_along(x), x, FUN=seq_along)
You can compare their results with ftest (as long as their results have the
same lengths):
ftest <- function(x) {
  data.frame(x, f1=f1(x), f2=f2(x), f3=f3(x), f4=f4(x))
}
They all return the same result for Steven's sample data, which is numeric
and in sorted order:
x0 <- c(123.45, 123.45, 123.45, 123.45, 234.56,
234.56, 234.56, 234.56, 234.56, 234.56, 234.56, 345.67, 345.67,
345.67, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78, 456.78,
456.78, 456.78)
However, f1() gives the wrong answer if x is not sorted:
> ftest(c(30,30,30, 20,20))
   x f1 f2 f3 f4
1 30  1  1  1  1
2 30  2  2  2  2
3 30  1  3  3  3
4 20  2  1  1  1
5 20  3  2  2  2

f1() and f2() give the wrong answer if the groups are split up in the data:
> ftest(c(10,10, 8,8,8, 10,10,10)) # 10's not contiguous
   x f1 f2 f3 f4
1 10  1  1  1  1
2 10  2  2  2  2
3  8  3  1  1  1
4  8  1  2  2  2
5  8  2  3  3  3
6 10  3  1  3  3
7 10  4  2  4  4
8 10  5  3  5  5
(It is not clear what result the OP wants here.)

f3() gives the wrong answer if x is not numeric (the result is character):
> f3(c("a","a","a", "b","b"))
[1] "1" "2" "3" "1" "2"

f3() also gives an ominous warning if there is a singleton in x:
> f3(c(1,1,1, 11))
[1] 1 2 3 1
Warning message:
In `split<-.default`(`*tmp*`, g, value = lapply(split(x, g), FUN)) :
number of items to replace is not a multiple of replacement length
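
# A small check that probably explains the warning: seq() treats a length-1
# numeric argument as a count, not as a vector to be indexed.
seq(c(1, 1, 1))   # length-3 argument: 1 2 3, same as seq_along()
seq(11)           # length-1 argument: 1:11, eleven values for a one-element group
seq_along(11)     # 1, because seq_along() always indexes the vector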

f2() fails to give an answer if x is a factor:
> f2(factor(c("x","y","z")))
Error in rle(x) : 'x' must be an atomic vector

I think f4 gives the correct result for all those cases.
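
# A quick check of that claim on the inputs used above (f4 is redefined here
# only so the snippet runs on its own):
f4 <- function(x) ave(seq_along(x), x, FUN = seq_along)
f4(c(10, 10, 8, 8, 8, 10, 10, 10))   # 1 2 1 2 3 3 4 5, groups split up
f4(c("a", "a", "a", "b", "b"))       # 1 2 3 1 2, integer rather than character
f4(c(1, 1, 1, 11))                   # 1 2 3 1, no warning for the singleton
f4(factor(c("x", "y", "z")))         # 1 1 1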

I think all of the above except f2() call lapply(split()) at some point (ave()
does so internally), and that can use a lot of memory when there are lots of
unique values in x.  You can use a sort-based algorithm to avoid that problem.
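
# One way such a sort-based version might look.  This is only a sketch: the
# name f5 is made up here, and it assumes x contains no NA values.
f5 <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  ord <- order(x, seq_along(x))          # sort by value, ties kept in input order
  xs <- x[ord]
  isStart <- c(TRUE, xs[-1L] != xs[-n])  # TRUE where a new group begins
  idx <- seq_len(n)
  startPos <- cummax(ifelse(isStart, idx, 0L))  # sorted index of the group's first element
  out <- integer(n)
  out[ord] <- idx - startPos + 1L        # position within group, back in input order
  out
}
f5(c(10, 10, 8, 8, 8, 10, 10, 10))       # 1 2 1 2 3 3 4 5, same as f4()
# It works on whole vectors only and never builds a list with one element
# per unique value.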

> Hello all -
>
> I have an example column in a dataFrame
>
> id.name
> 123.45
> 123.45
> 123.45
> 123.45
> 234.56
> 234.56
> 234.56
> 234.56
> 234.56
> 234.56
> 234.56
> 345.67
> 345.67
> 345.67
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> 456.78
> ...
> [truncated]
>
> And I'd like to create a second vector of sequential values (i.e., 1:N) for
> each unique id.name value.  In other words, I need
>
> id.name  x
> 123.45   1
> 123.45   2
> 123.45   3
> 123.45   4
> 234.56   1
> 234.56   2
> 234.56   3
> 234.56   4
> 234.56   5
> 234.56   6
> 234.56   7
> 345.67   1
> 345.67   2
> 345.67   3
> 456.78   1
> 456.78   2
> 456.78   3
> 456.78   4
> 456.78   5
> 456.78   6
> 456.78   7
> 456.78   8
> 456.78   9
>
> The number of unique id.name values is different; for some values, nrow()
> may be 42 and for others it may be 36, etc.
>
> The only way I could think of to do this is with two nested for loops.  I
> tried it but because this data set is so large (nrow = 112,679 with 2,161
> unique values of id.name), it took several hours to run.
>
> Is there an easier way to create this vector?  I'd appreciate your thoughts.
>
> Thanks -
>
> SR
> Steven H. Ranney
>
