[R] Cumulative split of value in data frame column

Bert Gunter
Fri Jun 5 20:28:50 CEST 2020


This is a plain text list. In future, please post in plain text so that
your post does not get mangled.

Anyway,...

I don't know about "efficient, optimized", but here's one simple way to do
it, using ?strsplit to split and then ?paste to recombine:

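# stringsAsFactors = FALSE is only needed on R < 4.0, where FOO would
# otherwise become a factor and strsplit() would reject it.
df <- data.frame(ID = 1:3, FOO = c('A_B', 'A_B_C', 'A_B_C_D_E'),
                 stringsAsFactors = FALSE)

# x holds the pieces of one string; return x[1], x[1]_x[2], and so on,
# each result being the previous one pasted to the next piece.
cumsplit <- function(x, split = "_") {
    w <- x[1]
    for (i in seq_along(x)[-1]) w <- c(w, paste(w[i - 1], x[i], sep = split))
    w
}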

> lapply(strsplit(df$FOO, split = "_"), cumsplit)
[[1]]
[1] "A"   "A_B"

[[2]]
[1] "A"     "A_B"   "A_B_C"

[[3]]
[1] "A"         "A_B"       "A_B_C"     "A_B_C_D"   "A_B_C_D_E"
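
To get from a list like this to the wide layout asked for in the original
post, one option is to pad each element with NA to a common length and bind
the results as rows. This is only a sketch: the FOO_SPLIT column names come
from the question, the names parts, cum, n_max, padded and wide are made up
for the illustration, and the cumulative step is rewritten with
Reduce(accumulate = TRUE) purely so that the block runs on its own.

```r
df <- data.frame(ID = 1:3, FOO = c("A_B", "A_B_C", "A_B_C_D_E"),
                 stringsAsFactors = FALSE)

# Split on the literal delimiter, then accumulate the pieces with paste().
parts <- strsplit(df$FOO, split = "_", fixed = TRUE)
cum <- lapply(parts, function(x)
    Reduce(function(a, b) paste(a, b, sep = "_"), x, accumulate = TRUE))

# Indexing past the end of a vector yields NA, which pads every element
# to the length of the longest one; t() turns the result into rows.
n_max <- max(lengths(cum))
padded <- t(vapply(cum, function(w) w[seq_len(n_max)], character(n_max)))
colnames(padded) <- paste0("FOO_SPLIT", seq_len(n_max))

# One row per ID, with NA where a string has fewer pieces.
wide <- data.frame(df, padded, stringsAsFactors = FALSE)
wide
```

Rows with fewer pieces get NA in the trailing FOO_SPLIT columns rather than
empty strings; swap in "" if that suits the downstream use better.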

I wouldn't be surprised if clever use of regexes would be faster, but as I
said, this is simple.
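
For what it's worth, here is a sketch of one pattern-matching variant: find
where each delimiter sits with ?gregexpr and cut the string just before each
position with ?substring. I have not timed it, so no claim that it is faster.
The names foo, pos and cum_prefixes are made up for the illustration, and the
delimiter is matched literally (fixed = TRUE).

```r
foo <- c("A_B", "A_B_C", "A_B_C_D_E")

# Start positions of every "_" in each string; -1 means no match.
pos <- gregexpr("_", foo, fixed = TRUE)

# For each string, take the prefix ending just before each delimiter,
# then append the full string as the last cumulative piece.
cum_prefixes <- Map(function(s, p) {
    if (p[1] == -1L) return(s)   # no delimiter: the string is its own prefix
    c(substring(s, 1L, as.integer(p) - 1L), s)
}, foo, pos)

# Map() names the result after foo; drop the names to match lapply() output.
cum_prefixes <- unname(cum_prefixes)
cum_prefixes
```

This avoids growing a vector inside a loop, at the cost of being a little
less obvious to read than the strsplit/paste version.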

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Fri, Jun 5, 2020 at 9:33 AM Ravi Jeyaraman <ravi76 using gmail.com> wrote:

> Assuming I have a data frame like this:
>
> df <- data.frame(ID=1:3, FOO=c('A_B','A_B_C','A_B_C_D_E'))
>
> I want to do a 'cumulative split' of the values in column FOO based on the
> delimiter '_'. The end result should look like this:
>
> ID  FOO        FOO_SPLIT1  FOO_SPLIT2  FOO_SPLIT3  FOO_SPLIT4  FOO_SPLIT5
> 1   A_B        A           A_B
> 2   A_B_C      A           A_B         A_B_C
> 3   A_B_C_D_E  A           A_B         A_B_C       A_B_C_D     A_B_C_D_E
>
> Any efficient, optimized way to do this?