[R] Finding combination of states

Richard O'Keefe r@oknz @end|ng |rom gm@||@com
Thu Sep 7 12:51:30 CEST 2023


The Data Colada blog has some articles about the groundhog package.
See particular https://datacolada.org/95
and especially https://datacolada.org/100

I now have three reasons for preferring to stick with the core library
packages as much as possible.
1) It's just better style to do more with less.
2) The core packages are much better documented.  I'm favourably
   impressed by R package documentation in general, but the core
   packages are spelled out in more detail in more books.  (Only
   the tidyverse comes close, and it doesn't come close.)
3) Packages outside the core break frighteningly fast.
   From Data Colada 100,
   "A paper published ... in *Nature: Scientific Data (.htm
<https://www.nature.com/articles/s41597-022-01143-6>)* attempted to
automatically re-execute 2335 R scripts posted as supporting materials for
published papers. After cleaning the scripts (installing necessary packages
and fixing paths to local files) only  44% of scripts run without
generating errors. So, *most* scripts did not run."

Now the issue here is strikingly reminiscent of expand.grid.
In fact it's so reminiscent of expand.grid that I wonder if expand.grid
could be used to solve whatever the *original* problem was.

paste2 <- function (x, y) as.vector(outer(x, y, paste(0))
states <- list(c("X"), c("A","B","C"), c("A","B","C"), c("A","B","C"),
c("Z"))
x <- states[[length(states)]]
for (i in (length(states)-1):1) x <- paste2(states[[i]], x)
x

The output is
 [1] "XAAAZ" "XBAAZ" "XCAAZ" "XABAZ" "XBBAZ" "XCBAZ" "XACAZ" "XBCAZ"
"XCCAZ"}
[10] "XAABZ" "XBABZ" "XCABZ" "XABBZ" "XBBBZ" "XCBBZ" "XACBZ" "XBCBZ" "XCCBZ"
[19] "XAACZ" "XBACZ" "XCACZ" "XABCZ" "XBBCZ" "XCBCZ" "XACCZ" "XBCCZ" "XCCCZ"

How close this comes to what you want is for you to decide;
what I've hoped to show is that core R has simple building blocks
that you can use *simply* to do this kind of thing.

In fact we can do better by eliminating outer() and using rep() to
reshape the state lists and a single invocation of paste0() to put
them together.  But it's probably not worth while.




On Tue, 5 Sept 2023 at 23:55, Bert Gunter <bgunter.4567 using gmail.com> wrote:

> Oh I liked that.
>
> I was actually thinking about something similar, but couldn't figure it
> out.  The idiom you showed is very clever imo and taught me something about
> regexes that I never properly understood.
>
> Bert
>
> On Tue, Sep 5, 2023, 01:04 Eric Berger <ericjberger using gmail.com> wrote:
>
> > Hi Bert,
> > I really liked your solution.
> > In the spirit of code golf, I wondered if there is a shorter way to do
> > the regular expression test.
> > Kudos to my coding buddy GPT-4 for the following:
> >
> > You can replace your statement
> >
> > out[-grep(paste(paste0(states,states),  collapse = "|"),out)]
> >
> > by
> >
> > out[-grep("(.)\\1",out)]
> >
> > Best,
> > Eric
> >
> > On Tue, Sep 5, 2023 at 3:08 AM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
> > >
> > > ... and just for fun, here is a non-string version (more appropriate
> for
> > complex state labels??):
> > >
> > > gvec <- function(ntimes, states, init, final, repeats = TRUE)
> > >    ## ntimes: integer, number of unique times
> > >    ## states: vector of unique states
> > >    ## init: initial state
> > >    ## final: final state
> > > {
> > >    out <- cbind(init,
> > >             as.matrix(expand.grid(rep(list(states),ntimes -2 ))),final)
> > >    if(!repeats)
> > >      out[ apply(out,1,\(x)all(x[-1] != x[-ntimes])), ]
> > >    else out
> > > }
> > >
> > > yielding:
> > >
> > >
> > > > gvec(4, letters[1:5], "b", "e", repeats = TRUE)
> > >       init Var1 Var2 final
> > >  [1,] "b"  "a"  "a"  "e"
> > >  [2,] "b"  "b"  "a"  "e"
> > >  [3,] "b"  "c"  "a"  "e"
> > >  [4,] "b"  "d"  "a"  "e"
> > >  [5,] "b"  "e"  "a"  "e"
> > >  [6,] "b"  "a"  "b"  "e"
> > >  [7,] "b"  "b"  "b"  "e"
> > >  [8,] "b"  "c"  "b"  "e"
> > >  [9,] "b"  "d"  "b"  "e"
> > > [10,] "b"  "e"  "b"  "e"
> > > [11,] "b"  "a"  "c"  "e"
> > > [12,] "b"  "b"  "c"  "e"
> > > [13,] "b"  "c"  "c"  "e"
> > > [14,] "b"  "d"  "c"  "e"
> > > [15,] "b"  "e"  "c"  "e"
> > > [16,] "b"  "a"  "d"  "e"
> > > [17,] "b"  "b"  "d"  "e"
> > > [18,] "b"  "c"  "d"  "e"
> > > [19,] "b"  "d"  "d"  "e"
> > > [20,] "b"  "e"  "d"  "e"
> > > [21,] "b"  "a"  "e"  "e"
> > > [22,] "b"  "b"  "e"  "e"
> > > [23,] "b"  "c"  "e"  "e"
> > > [24,] "b"  "d"  "e"  "e"
> > > [25,] "b"  "e"  "e"  "e"
> > > >
> > > > gvec(4, letters[1:5], "b", "e", repeats = FALSE)
> > >       init Var1 Var2 final
> > >  [1,] "b"  "c"  "a"  "e"
> > >  [2,] "b"  "d"  "a"  "e"
> > >  [3,] "b"  "e"  "a"  "e"
> > >  [4,] "b"  "a"  "b"  "e"
> > >  [5,] "b"  "c"  "b"  "e"
> > >  [6,] "b"  "d"  "b"  "e"
> > >  [7,] "b"  "e"  "b"  "e"
> > >  [8,] "b"  "a"  "c"  "e"
> > >  [9,] "b"  "d"  "c"  "e"
> > > [10,] "b"  "e"  "c"  "e"
> > > [11,] "b"  "a"  "d"  "e"
> > > [12,] "b"  "c"  "d"  "e"
> > > [13,] "b"  "e"  "d"  "e"
> > >
> > > :-)
> > >
> > > -- Bert
> > >
> > > On Mon, Sep 4, 2023 at 2:04 PM Bert Gunter <bgunter.4567 using gmail.com>
> > wrote:
> > >>
> > >> Well, if strings with repeats (as you defined them) are to be
> excluded,
> > I think it's simple just to use regular expressions to remove them.
> > >>
> > >> e.g.
> > >> g <- function(ntimes, states, init, final, repeats = TRUE)
> > >>    ## ntimes: integer, number of unique times
> > >>    ## states: vector of unique states
> > >>    ## init: initial state
> > >>    ## final: final state
> > >> {
> > >> out <- do.call(paste0,c(init,expand.grid(rep(list(states), ntimes-2)),
> > final))
> > >> if(!repeats)
> > >>    out[-grep(paste(paste0(states,states),  collapse = "|"),out)]
> > >> else out
> > >> }
> > >> So:
> > >>
> > >> > g(4, LETTERS[1:5], "B", "E", repeats = FALSE)
> > >>  [1] "BCAE" "BDAE" "BEAE" "BABE" "BCBE" "BDBE" "BEBE" "BACE"
> > >>  [9] "BDCE" "BECE" "BADE" "BCDE" "BEDE"
> > >>
> > >> Perhaps not the most efficient way to do this, of course.
> > >>
> > >> Cheers,
> > >> Bert
> > >>
> > >>
> > >> On Mon, Sep 4, 2023 at 12:57 PM Eric Berger <ericjberger using gmail.com>
> > wrote:
> > >>>
> > >>> My initial response was buggy and also used a deprecated function.
> > >>> Also, it seems possible that one may want to rule out any strings
> > where the same state appears consecutively.
> > >>> I say that such a string has a repeat.
> > >>>
> > >>> myExpand <- function(v, n) {
> > >>>   do.call(tidyr::expand_grid, replicate(n, v, simplify = FALSE))
> > >>> }
> > >>>
> > >>> no_repeat <- function(s) {
> > >>>   v <- unlist(strsplit(s, NULL))
> > >>>   sum(v[-1]==v[-length(v)]) == 0
> > >>> }
> > >>>
> > >>> f <- function(states, nsteps, first, last, rm_repeat=TRUE) {
> > >>>   if (nsteps < 3) stop("nsteps must be at least 3")
> > >>>     out <- paste(first,
> > >>>           myExpand(states, nsteps-2) |>
> > >>>             apply(MAR=1, \(x) paste(x, collapse="")),
> > >>>           last, sep="")
> > >>>     if (rm_repeat) {
> > >>>       ok <- sapply(out, no_repeat)
> > >>>       out <- out[ok]
> > >>>     }
> > >>>     out
> > >>> }
> > >>>
> > >>> f(LETTERS[1:5],4,"B","E")
> > >>>
> > >>> #  [1] "BABE" "BACE" "BADE" "BCAE" "BCBE" "BCDE" "BDAE" "BDBE" "BDCE"
> > "BEAE" "BEBE" "BECE" "BEDE"
> > >>>
> > >>> On Mon, Sep 4, 2023 at 10:33 PM Bert Gunter <bgunter.4567 using gmail.com>
> > wrote:
> > >>>>
> > >>>> Sorry, my last line should have read:
> > >>>>
> > >>>> If neither this nor any of the other suggestions is what is desired,
> > I think the OP will have to clarify his query.
> > >>>>
> > >>>> Bert
> > >>>>
> > >>>> On Mon, Sep 4, 2023 at 12:31 PM Bert Gunter <bgunter.4567 using gmail.com
> >
> > wrote:
> > >>>>>
> > >>>>> I think there may be some uncertainty here about what the OP
> > requested. My interpretation is:
> > >>>>>
> > >>>>> n different times
> > >>>>> k different states
> > >>>>> Any state can appear at any time in the vector of times and can be
> > repeated
> > >>>>> Initial and final states are given
> > >>>>>
> > >>>>> So modifying Tim's expand.grid() solution a bit yields:
> > >>>>>
> > >>>>> g <- function(ntimes, states, init, final){
> > >>>>>    ## ntimes: integer, number of unique times
> > >>>>>    ## states: vector of unique states
> > >>>>>    ## init: initial state
> > >>>>>    ## final: final state
> > >>>>> do.call(paste0,c(init,expand.grid(rep(list(states), ntimes-2)),
> > final))
> > >>>>> }
> > >>>>>
> > >>>>> e.g.
> > >>>>>
> > >>>>> > g(4, LETTERS[1:5], "B", "D")
> > >>>>>  [1] "BAAD" "BBAD" "BCAD" "BDAD" "BEAD" "BABD" "BBBD" "BCBD"
> > >>>>>  [9] "BDBD" "BEBD" "BACD" "BBCD" "BCCD" "BDCD" "BECD" "BADD"
> > >>>>> [17] "BBDD" "BCDD" "BDDD" "BEDD" "BAED" "BBED" "BCED" "BDED"
> > >>>>> [25] "BEED"
> > >>>>>
> > >>>>> If neither this nor any of the other suggestions is not what is
> > desired, I think the OP will have to clarify his query.
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Bert
> > >>>>>
> > >>>>> On Mon, Sep 4, 2023 at 9:25 AM Ebert,Timothy Aaron <tebert using ufl.edu
> >
> > wrote:
> > >>>>>>
> > >>>>>> Does this work for you?
> > >>>>>>
> > >>>>>> t0<-t1<-t2<-LETTERS[1:5]
> > >>>>>> al2<-expand.grid(t0, t1, t2)
> > >>>>>> al3<-paste(al2$Var1, al2$Var2, al2$Var3)
> > >>>>>> al4 <- gsub(" ", "", al3)
> > >>>>>> head(al3)
> > >>>>>>
> > >>>>>> Tim
> > >>>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Eric
> > Berger
> > >>>>>> Sent: Monday, September 4, 2023 10:17 AM
> > >>>>>> To: Christofer Bogaso <bogaso.christofer using gmail.com>
> > >>>>>> Cc: r-help <r-help using r-project.org>
> > >>>>>> Subject: Re: [R] Finding combination of states
> > >>>>>>
> > >>>>>> [External Email]
> > >>>>>>
> > >>>>>> The function purrr::cross() can help you with this. For example:
> > >>>>>>
> > >>>>>> f <- function(states, nsteps, first, last) {
> > >>>>>>    paste(first, unlist(lapply(purrr::cross(rep(list(v),nsteps-2)),
> > >>>>>> \(x) paste(unlist(x), collapse=""))), last, sep="") }
> > f(LETTERS[1:5], 3, "B", "E") [1] "BAE" "BBE" "BCE" "BDE" "BEE"
> > >>>>>>
> > >>>>>> HTH,
> > >>>>>> Eric
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, Sep 4, 2023 at 3:42 PM Christofer Bogaso <
> > bogaso.christofer using gmail.com> wrote:
> > >>>>>> >
> > >>>>>> > Let say I have 3 time points.as T0, T1, and T2.(number of such
> > time
> > >>>>>> > points can be arbitrary) In each time point, an object can be
> any
> > of 5
> > >>>>>> > states, A, B, C, D, E (number of such states can be arbitrary)
> > >>>>>> >
> > >>>>>> > I need to find all possible ways, how that object starting with
> > state
> > >>>>>> > B (say) at time T0, can be on state E (example) in time T2
> > >>>>>> >
> > >>>>>> > For example one possibility is BAE etc.
> > >>>>>> >
> > >>>>>> > Is there any function available with R, that can give me a
> vector
> > of
> > >>>>>> > such possibilities for arbitrary number of states, time, and
> for a
> > >>>>>> > given initial and final (desired) states?
> > >>>>>> >
> > >>>>>> > ANy pointer will be very appreciated.
> > >>>>>> >
> > >>>>>> > Thanks for your time.
> > >>>>>> >
> > >>>>>> > ______________________________________________
> > >>>>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> see
> > >>>>>> > https://stat/
> > >>>>>> > .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%
> > 40ufl.edu
> > >>>>>> >
> > %7C25cee5ce26a8423daaa508dbad51c402%7C0d4da0f84a314d76ace60a62331e1b84
> > >>>>>> >
> > %7C0%7C0%7C638294338934034595%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
> > >>>>>> >
> > MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sda
> > >>>>>> > ta=TM4jGF39Gy3PH0T3nnQpT%2BLogkVxifv%2Fudv9hWPwbss%3D&reserved=0
> > >>>>>> > PLEASE do read the posting guide
> > >>>>>> > http://www.r/
> > >>>>>> > -project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%
> 40ufl.edu
> > %7C25
> > >>>>>> >
> > cee5ce26a8423daaa508dbad51c402%7C0d4da0f84a314d76ace60a62331e1b84%7C0%
> > >>>>>> >
> > 7C0%7C638294338934034595%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL
> > >>>>>> >
> > CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=5n
> > >>>>>> > PTLmsz0lOz47t41u578t9oI0i7BOgIX53yx8CesLs%3D&reserved=0
> > >>>>>> > and provide commented, minimal, self-contained, reproducible
> code.
> > >>>>>>
> > >>>>>> ______________________________________________
> > >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>>>> PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > >>>>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>>>> ______________________________________________
> > >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>>>> PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list