[R] Decompose df1 into another df2 based on values in df1

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Thu May 27 02:28:14 CEST 2021

Thank you for the reprex. However, your specification was too vague for me
to know exactly what your data are like, so I tried to assume the most
general possibility, with the consequence that I may be answering the wrong
question. Hopefully you can adjust as needed to get what you want.

I should also warn you that I am nearly certain there are more elegant,
cleverer, faster ways to do this. I just used simple tools, so you may wish
to wait a bit to see whether others can improve on my attempt.

First, I assumed the "a2/a3" in S5 of d1 is a typo and should be
"a2|a3". If it is not a typo, then substitute "\\||\\/" for "\\|" in the
strsplit() call in the code that follows.
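To see what the two patterns do, here is a quick check on a few cell values taken from d1 (the vector name x is just illustrative):

```r
# cell values as they appear in the posted d1, including "a2/a3"
x <- c("a1|a2", "a2/a3", "w")
strsplit(x, "\\|")      # splits on "|" only: "a2/a3" stays a single token
strsplit(x, "\\||\\/")  # splits on either "|" or "/": "a2/a3" becomes "a2" "a3"
```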
Second, I assumed that your identifiers ("a1", for example) could occur
more than once within a single column of your data. If the only
possibilities are 0 or 1 times per column, then the code I provided (in
particular the last sapply()) is more complicated than necessary. A faster
approach in that case might be to use R's outer() function; I leave that as
an exercise for you, or for someone else to help you with.
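For the 0/1 case, one possible sketch of the outer() idea follows. It assumes each identifier occurs at most once per column and that "a2/a3" really is "a2|a3". The helper is.in and the result pres are hypothetical names. Note that Vectorize() still loops in R under the hood, so this is not guaranteed to be faster than the table() approach; time it on your real data before relying on it.

```r
d1 <- data.frame(S1 = c("a1|a2", "b1|b3", "w"), S2 = c("w", "b1", "c2"),
                 S3 = c("a2", "b3|b4|b1", "c1|c4"), S4 = c("w", "b4", "c4"),
                 S5 = c("a2|a3", "w", "w"), row.names = c("A", "B", "C"))
# split each cell on "|" and drop the "w" placeholders
getall <- function(x) {
  ul <- unlist(strsplit(x, "\\|"))
  ul[ul != "w"]
}
allvals <- lapply(d1, getall)             # one vector of identifiers per column
uneeks <- sort(unique(unlist(allvals)))   # all distinct identifiers
# hypothetical helper: TRUE when identifier id occurs in column vector v;
# Vectorize() lets outer() pass it parallel vectors of ids and list elements
is.in <- Vectorize(function(id, v) id %in% v)
pres <- outer(uneeks, allvals, is.in)     # logical matrix, identifiers x columns
storage.mode(pres) <- "integer"           # TRUE/FALSE -> 1/0
dimnames(pres) <- list(uneeks, names(allvals))
pres
```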

Here is my code for your reprex:

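getall <- function(x){
   # split each cell on "|" and drop the "w" placeholders
   ul <- unlist(strsplit(x, "\\|"))
   ul[ul != "w"]
}
# one vector of identifiers per column (S1, S2, ...)
allvals <- lapply(d1, getall)
# every distinct identifier, sorted; these become the row names
uneeks <- sort(unique(unlist(allvals)))
# count how often each identifier occurs in each column
sapply(allvals, function(x)table(factor(x, levels = uneeks)))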

## which gives
> sapply(allvals, function(x)table(factor(x, levels = uneeks)))
   S1 S2 S3 S4 S5
a1  1  0  0  0  0
a2  1  0  1  0  1
a3  0  0  0  0  1
b1  1  1  1  0  0
b3  1  0  1  0  0
b4  0  0  1  1  0
c1  0  0  1  0  0
c2  0  1  0  0  0
c4  0  0  1  1  0
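The result above is a matrix, and it has no b2 row because b2 never occurs in d1, although it does appear in your res. If you need exactly the shape of res, one possible sketch is below. It assumes the "/" in "a2/a3" is a genuine separator (so it uses the alternative split pattern) and that b2 has to be added to the levels by hand; counts and out are just illustrative names.

```r
# d1 exactly as posted, including "a2/a3" in S5
d1 <- structure(list(S1 = c("a1|a2", "b1|b3", "w"), S2 = c("w", "b1", "c2"),
  S3 = c("a2", "b3|b4|b1", "c1|c4"), S4 = c("w", "b4", "c4"),
  S5 = c("a2/a3", "w", "w")), class = "data.frame", row.names = c("A", "B", "C"))
# split on either "|" or "/" and drop the "w" placeholders
getall <- function(x) {
  ul <- unlist(strsplit(x, "\\||\\/"))
  ul[ul != "w"]
}
allvals <- lapply(d1, getall)
# assumption: "b2" appears in res but never in d1, so it is added by hand
uneeks <- sort(unique(c(unlist(allvals), "b2")))
counts <- sapply(allvals, function(x) table(factor(x, levels = uneeks)))
out <- as.data.frame(counts)   # integer columns S1..S5, row names a1..c4
out
```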


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, May 26, 2021 at 2:18 PM Adrian Johnson <oriolebaltimore using gmail.com>
wrote:

> Hello,
> I am trying to convert a data frame (given below as d1) into another data
> frame (given below as res).
> I tried using a loop over each row, but I cannot get it right. Moreover, the
> real data frame is 250000 x 500, and I cannot get it to work at that size.
> Could anyone help me here, please?
> Thanks.
> Adrian.
> d1 <-
> structure(list(S1 = c("a1|a2", "b1|b3", "w"), S2 = c("w", "b1",
> "c2"), S3 = c("a2", "b3|b4|b1", "c1|c4"), S4 = c("w", "b4", "c4"
> ), S5 = c("a2/a3", "w", "w")), class = "data.frame", row.names = c("A",
> "B", "C"))
> res <-
> structure(list(S1 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
>     S2 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L), S3 = c(0L,
>     1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L), S4 = c(0L, 0L, 0L, 0L,
>     0L, 0L, 1L, 0L, 0L, 1L), S5 = c(0L, 1L, 1L, 0L, 0L, 0L, 0L,
>     0L, 0L, 0L)), class = "data.frame", row.names = c("a1", "a2",
> "a3", "b1", "b2", "b3", "b4", "c1", "c2", "c4"))