[R] Finding combination of states

Ebert,Timothy Aaron tebert @end|ng |rom u||@edu
Thu Sep 7 15:17:35 CEST 2023


I like many packages. They give me working code that it would take a long time for me to write (and debug). In some cases I am fairly sure that the effort would be a full time job on top of my regular workload. The packages use less of my time.

I hate packages because of Richard's point #3. I have SAS code written in 1986 that still works without any modification or updating. R code may not work as of the next update unless all of the packages are also updated. The more packages I use, the more often I must fight with one or more of them to keep my code running. If the code is something I want to keep for a long time, then it pays in the long run to use the fewest packages possible.
Tim


-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Richard O'Keefe
Sent: Thursday, September 7, 2023 6:52 AM
To: Bert Gunter <bgunter.4567 using gmail.com>
Cc: r-help <r-help using r-project.org>
Subject: Re: [R] Finding combination of states

[External Email]

The Data Colada blog has some articles about the groundhog package.
See particular https://datacolada.org/95
and especially https://datacolada.org/100

I now have three reasons for preferring to stick with the core library packages as much as possible.
1) It's just better style to do more with less.
2) The core packages are much better documented.  I'm favourably
   impressed by R package documentation in general, but the core
   packages are spelled out in more detail in more books.  (Only
   the tidyverse comes close, and it doesn't come close.)
3) Packages outside the core break frighteningly fast.
   From Data Colada 100,
   "A paper published ... in *Nature: Scientific Data (.htm
<https://www.nature.com/articles/s41597-022-01143-6>)* attempted to automatically re-execute 2335 R scripts posted as supporting materials for published papers. After cleaning the scripts (installing necessary packages and fixing paths to local files) only  44% of scripts run without generating errors. So, *most* scripts did not run."

Now the issue here is strikingly reminiscent of expand.grid.
In fact it's so reminiscent of expand.grid that I wonder if expand.grid could be used to solve whatever the *original* problem was.

paste2 <- function (x, y) as.vector(outer(x, y, paste(0)) states <- list(c("X"), c("A","B","C"), c("A","B","C"), c("A","B","C"),
c("Z"))
x <- states[[length(states)]]
for (i in (length(states)-1):1) x <- paste2(states[[i]], x) x

The output is
 [1] "XAAAZ" "XBAAZ" "XCAAZ" "XABAZ" "XBBAZ" "XCBAZ" "XACAZ" "XBCAZ"
"XCCAZ"}
[10] "XAABZ" "XBABZ" "XCABZ" "XABBZ" "XBBBZ" "XCBBZ" "XACBZ" "XBCBZ" "XCCBZ"
[19] "XAACZ" "XBACZ" "XCACZ" "XABCZ" "XBBCZ" "XCBCZ" "XACCZ" "XBCCZ" "XCCCZ"

How close this comes to what you want is for you to decide; what I've hoped to show is that core R has simple building blocks that you can use *simply* to do this kind of thing.

In fact we can do better by eliminating outer() and using rep() to reshape the state lists and a single invocation of paste0() to put them together.  But it's probably not worth while.




On Tue, 5 Sept 2023 at 23:55, Bert Gunter <bgunter.4567 using gmail.com> wrote:

> Oh I liked that.
>
> I was actually thinking about something similar, but couldn't figure
> it out.  The idiom you showed is very clever imo and taught me
> something about regexes that I never properly understood.
>
> Bert
>
> On Tue, Sep 5, 2023, 01:04 Eric Berger <ericjberger using gmail.com> wrote:
>
> > Hi Bert,
> > I really liked your solution.
> > In the spirit of code golf, I wondered if there is a shorter way to
> > do the regular expression test.
> > Kudos to my coding buddy GPT-4 for the following:
> >
> > You can replace your statement
> >
> > out[-grep(paste(paste0(states,states),  collapse = "|"),out)]
> >
> > by
> >
> > out[-grep("(.)\\1",out)]
> >
> > Best,
> > Eric
> >
> > On Tue, Sep 5, 2023 at 3:08 AM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
> > >
> > > ... and just for fun, here is a non-string version (more
> > > appropriate
> for
> > complex state labels??):
> > >
> > > gvec <- function(ntimes, states, init, final, repeats = TRUE)
> > >    ## ntimes: integer, number of unique times
> > >    ## states: vector of unique states
> > >    ## init: initial state
> > >    ## final: final state
> > > {
> > >    out <- cbind(init,
> > >             as.matrix(expand.grid(rep(list(states),ntimes -2 ))),final)
> > >    if(!repeats)
> > >      out[ apply(out,1,\(x)all(x[-1] != x[-ntimes])), ]
> > >    else out
> > > }
> > >
> > > yielding:
> > >
> > >
> > > > gvec(4, letters[1:5], "b", "e", repeats = TRUE)
> > >       init Var1 Var2 final
> > >  [1,] "b"  "a"  "a"  "e"
> > >  [2,] "b"  "b"  "a"  "e"
> > >  [3,] "b"  "c"  "a"  "e"
> > >  [4,] "b"  "d"  "a"  "e"
> > >  [5,] "b"  "e"  "a"  "e"
> > >  [6,] "b"  "a"  "b"  "e"
> > >  [7,] "b"  "b"  "b"  "e"
> > >  [8,] "b"  "c"  "b"  "e"
> > >  [9,] "b"  "d"  "b"  "e"
> > > [10,] "b"  "e"  "b"  "e"
> > > [11,] "b"  "a"  "c"  "e"
> > > [12,] "b"  "b"  "c"  "e"
> > > [13,] "b"  "c"  "c"  "e"
> > > [14,] "b"  "d"  "c"  "e"
> > > [15,] "b"  "e"  "c"  "e"
> > > [16,] "b"  "a"  "d"  "e"
> > > [17,] "b"  "b"  "d"  "e"
> > > [18,] "b"  "c"  "d"  "e"
> > > [19,] "b"  "d"  "d"  "e"
> > > [20,] "b"  "e"  "d"  "e"
> > > [21,] "b"  "a"  "e"  "e"
> > > [22,] "b"  "b"  "e"  "e"
> > > [23,] "b"  "c"  "e"  "e"
> > > [24,] "b"  "d"  "e"  "e"
> > > [25,] "b"  "e"  "e"  "e"
> > > >
> > > > gvec(4, letters[1:5], "b", "e", repeats = FALSE)
> > >       init Var1 Var2 final
> > >  [1,] "b"  "c"  "a"  "e"
> > >  [2,] "b"  "d"  "a"  "e"
> > >  [3,] "b"  "e"  "a"  "e"
> > >  [4,] "b"  "a"  "b"  "e"
> > >  [5,] "b"  "c"  "b"  "e"
> > >  [6,] "b"  "d"  "b"  "e"
> > >  [7,] "b"  "e"  "b"  "e"
> > >  [8,] "b"  "a"  "c"  "e"
> > >  [9,] "b"  "d"  "c"  "e"
> > > [10,] "b"  "e"  "c"  "e"
> > > [11,] "b"  "a"  "d"  "e"
> > > [12,] "b"  "c"  "d"  "e"
> > > [13,] "b"  "e"  "d"  "e"
> > >
> > > :-)
> > >
> > > -- Bert
> > >
> > > On Mon, Sep 4, 2023 at 2:04 PM Bert Gunter
> > > <bgunter.4567 using gmail.com>
> > wrote:
> > >>
> > >> Well, if strings with repeats (as you defined them) are to be
> excluded,
> > I think it's simple just to use regular expressions to remove them.
> > >>
> > >> e.g.
> > >> g <- function(ntimes, states, init, final, repeats = TRUE)
> > >>    ## ntimes: integer, number of unique times
> > >>    ## states: vector of unique states
> > >>    ## init: initial state
> > >>    ## final: final state
> > >> {
> > >> out <- do.call(paste0,c(init,expand.grid(rep(list(states),
> > >> ntimes-2)),
> > final))
> > >> if(!repeats)
> > >>    out[-grep(paste(paste0(states,states),  collapse = "|"),out)]
> > >> else out }
> > >> So:
> > >>
> > >> > g(4, LETTERS[1:5], "B", "E", repeats = FALSE)
> > >>  [1] "BCAE" "BDAE" "BEAE" "BABE" "BCBE" "BDBE" "BEBE" "BACE"
> > >>  [9] "BDCE" "BECE" "BADE" "BCDE" "BEDE"
> > >>
> > >> Perhaps not the most efficient way to do this, of course.
> > >>
> > >> Cheers,
> > >> Bert
> > >>
> > >>
> > >> On Mon, Sep 4, 2023 at 12:57 PM Eric Berger
> > >> <ericjberger using gmail.com>
> > wrote:
> > >>>
> > >>> My initial response was buggy and also used a deprecated function.
> > >>> Also, it seems possible that one may want to rule out any
> > >>> strings
> > where the same state appears consecutively.
> > >>> I say that such a string has a repeat.
> > >>>
> > >>> myExpand <- function(v, n) {
> > >>>   do.call(tidyr::expand_grid, replicate(n, v, simplify = FALSE))
> > >>> }
> > >>>
> > >>> no_repeat <- function(s) {
> > >>>   v <- unlist(strsplit(s, NULL))
> > >>>   sum(v[-1]==v[-length(v)]) == 0 }
> > >>>
> > >>> f <- function(states, nsteps, first, last, rm_repeat=TRUE) {
> > >>>   if (nsteps < 3) stop("nsteps must be at least 3")
> > >>>     out <- paste(first,
> > >>>           myExpand(states, nsteps-2) |>
> > >>>             apply(MAR=1, \(x) paste(x, collapse="")),
> > >>>           last, sep="")
> > >>>     if (rm_repeat) {
> > >>>       ok <- sapply(out, no_repeat)
> > >>>       out <- out[ok]
> > >>>     }
> > >>>     out
> > >>> }
> > >>>
> > >>> f(LETTERS[1:5],4,"B","E")
> > >>>
> > >>> #  [1] "BABE" "BACE" "BADE" "BCAE" "BCBE" "BCDE" "BDAE" "BDBE" "BDCE"
> > "BEAE" "BEBE" "BECE" "BEDE"
> > >>>
> > >>> On Mon, Sep 4, 2023 at 10:33 PM Bert Gunter
> > >>> <bgunter.4567 using gmail.com>
> > wrote:
> > >>>>
> > >>>> Sorry, my last line should have read:
> > >>>>
> > >>>> If neither this nor any of the other suggestions is what is
> > >>>> desired,
> > I think the OP will have to clarify his query.
> > >>>>
> > >>>> Bert
> > >>>>
> > >>>> On Mon, Sep 4, 2023 at 12:31 PM Bert Gunter
> > >>>> <bgunter.4567 using gmail.com
> >
> > wrote:
> > >>>>>
> > >>>>> I think there may be some uncertainty here about what the OP
> > requested. My interpretation is:
> > >>>>>
> > >>>>> n different times
> > >>>>> k different states
> > >>>>> Any state can appear at any time in the vector of times and
> > >>>>> can be
> > repeated
> > >>>>> Initial and final states are given
> > >>>>>
> > >>>>> So modifying Tim's expand.grid() solution a bit yields:
> > >>>>>
> > >>>>> g <- function(ntimes, states, init, final){
> > >>>>>    ## ntimes: integer, number of unique times
> > >>>>>    ## states: vector of unique states
> > >>>>>    ## init: initial state
> > >>>>>    ## final: final state
> > >>>>> do.call(paste0,c(init,expand.grid(rep(list(states),
> > >>>>> ntimes-2)),
> > final))
> > >>>>> }
> > >>>>>
> > >>>>> e.g.
> > >>>>>
> > >>>>> > g(4, LETTERS[1:5], "B", "D")
> > >>>>>  [1] "BAAD" "BBAD" "BCAD" "BDAD" "BEAD" "BABD" "BBBD" "BCBD"
> > >>>>>  [9] "BDBD" "BEBD" "BACD" "BBCD" "BCCD" "BDCD" "BECD" "BADD"
> > >>>>> [17] "BBDD" "BCDD" "BDDD" "BEDD" "BAED" "BBED" "BCED" "BDED"
> > >>>>> [25] "BEED"
> > >>>>>
> > >>>>> If neither this nor any of the other suggestions is not what
> > >>>>> is
> > desired, I think the OP will have to clarify his query.
> > >>>>>
> > >>>>> Cheers,
> > >>>>> Bert
> > >>>>>
> > >>>>> On Mon, Sep 4, 2023 at 9:25 AM Ebert,Timothy Aaron
> > >>>>> <tebert using ufl.edu
> >
> > wrote:
> > >>>>>>
> > >>>>>> Does this work for you?
> > >>>>>>
> > >>>>>> t0<-t1<-t2<-LETTERS[1:5]
> > >>>>>> al2<-expand.grid(t0, t1, t2)
> > >>>>>> al3<-paste(al2$Var1, al2$Var2, al2$Var3)
> > >>>>>> al4 <- gsub(" ", "", al3)
> > >>>>>> head(al3)
> > >>>>>>
> > >>>>>> Tim
> > >>>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: R-help <r-help-bounces using r-project.org> On Behalf Of Eric
> > Berger
> > >>>>>> Sent: Monday, September 4, 2023 10:17 AM
> > >>>>>> To: Christofer Bogaso <bogaso.christofer using gmail.com>
> > >>>>>> Cc: r-help <r-help using r-project.org>
> > >>>>>> Subject: Re: [R] Finding combination of states
> > >>>>>>
> > >>>>>> [External Email]
> > >>>>>>
> > >>>>>> The function purrr::cross() can help you with this. For example:
> > >>>>>>
> > >>>>>> f <- function(states, nsteps, first, last) {
> > >>>>>>    paste(first,
> > >>>>>> unlist(lapply(purrr::cross(rep(list(v),nsteps-2)),
> > >>>>>> \(x) paste(unlist(x), collapse=""))), last, sep="") }
> > f(LETTERS[1:5], 3, "B", "E") [1] "BAE" "BBE" "BCE" "BDE" "BEE"
> > >>>>>>
> > >>>>>> HTH,
> > >>>>>> Eric
> > >>>>>>
> > >>>>>>
> > >>>>>> On Mon, Sep 4, 2023 at 3:42 PM Christofer Bogaso <
> > bogaso.christofer using gmail.com> wrote:
> > >>>>>> >
> > >>>>>> > Let say I have 3 time points.as T0, T1, and T2.(number of
> > >>>>>> > such
> > time
> > >>>>>> > points can be arbitrary) In each time point, an object can
> > >>>>>> > be
> any
> > of 5
> > >>>>>> > states, A, B, C, D, E (number of such states can be
> > >>>>>> > arbitrary)
> > >>>>>> >
> > >>>>>> > I need to find all possible ways, how that object starting
> > >>>>>> > with
> > state
> > >>>>>> > B (say) at time T0, can be on state E (example) in time T2
> > >>>>>> >
> > >>>>>> > For example one possibility is BAE etc.
> > >>>>>> >
> > >>>>>> > Is there any function available with R, that can give me a
> vector
> > of
> > >>>>>> > such possibilities for arbitrary number of states, time,
> > >>>>>> > and
> for a
> > >>>>>> > given initial and final (desired) states?
> > >>>>>> >
> > >>>>>> > ANy pointer will be very appreciated.
> > >>>>>> >
> > >>>>>> > Thanks for your time.
> > >>>>>> >
> > >>>>>> > ______________________________________________
> > >>>>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and
> > >>>>>> > more,
> see
> > >>>>>> > https://stat/
> > >>>>>> > .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Cteber
> > >>>>>> > t%
> > 40ufl.edu
> > >>>>>> >
> > %7C25cee5ce26a8423daaa508dbad51c402%7C0d4da0f84a314d76ace60a62331e1b
> > 84
> > >>>>>> >
> > %7C0%7C0%7C638294338934034595%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> > Aw
> > >>>>>> >
> > MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&s
> > da
> > >>>>>> > ta=TM4jGF39Gy3PH0T3nnQpT%2BLogkVxifv%2Fudv9hWPwbss%3D&reser
> > >>>>>> > ved=0 PLEASE do read the posting guide
> > >>>>>> > https://nam10.safelinks.protection.outlook.com/?url=http%3A
> > >>>>>> > %2F%2Fwww.r%2F&data=05%7C01%7Ctebert%40ufl.edu%7Cb525e189d1
> > >>>>>> > 8b4222c4bc08dbaf907920%7C0d4da0f84a314d76ace60a62331e1b84%7
> > >>>>>> > C0%7C0%7C638296807279447139%7CUnknown%7CTWFpbGZsb3d8eyJWIjo
> > >>>>>> > iMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D
> > >>>>>> > %7C3000%7C%7C%7C&sdata=kx35qboj5UcArmF70eHsapv4YR3oRLmDitHD
> > >>>>>> > wTz%2B%2BHk%3D&reserved=0
> > >>>>>> > -project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%
> 40ufl.edu
> > %7C25
> > >>>>>> >
> > cee5ce26a8423daaa508dbad51c402%7C0d4da0f84a314d76ace60a62331e1b84%7C
> > 0%
> > >>>>>> >
> > 7C0%7C638294338934034595%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDA
> > iL
> > >>>>>> >
> > CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=
> > 5n
> > >>>>>> > PTLmsz0lOz47t41u578t9oI0i7BOgIX53yx8CesLs%3D&reserved=0
> > >>>>>> > and provide commented, minimal, self-contained,
> > >>>>>> > reproducible
> code.
> > >>>>>>
> > >>>>>> ______________________________________________
> > >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> > >>>>>> see
> > >>>>>> https://nam10.safelinks.protection.outlook.com/?url=https%3A%25
> > >>>>>> 2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%
> > >>>>>> 7Ctebert%40ufl.edu%7Cb525e189d18b4222c4bc08dbaf907920%7C0d4da
> > >>>>>> 0f84a314d76ace60a62331e1b84%7C0%7C0%7C638296807279447139%7CUn
> > >>>>>> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBT
> > >>>>>> iI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=3Jy6G3fM1OHPk
> > >>>>>> cyRrM9%2FJD2%2B%2BME5hLRUpZbLcsfnrjo%3D&reserved=0
> > >>>>>> PLEASE do read the posting guide
> > http://www/
> > .r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%
> > 7Cb525e189d18b4222c4bc08dbaf907920%7C0d4da0f84a314d76ace60a62331e1b8
> > 4%7C0%7C0%7C638296807279447139%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
> > jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7
> > C&sdata=Vv4SsrYNhL8YFuckZXfZK2gJMxm5PetrGInbS8Rgjzs%3D&reserved=0
> > >>>>>> and provide commented, minimal, self-contained, reproducible code.
> > >>>>>> ______________________________________________
> > >>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
> > >>>>>> see
> > >>>>>> https://nam10.safelinks.protection.outlook.com/?url=https%3A%25
> > >>>>>> 2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%
> > >>>>>> 7Ctebert%40ufl.edu%7Cb525e189d18b4222c4bc08dbaf907920%7C0d4da
> > >>>>>> 0f84a314d76ace60a62331e1b84%7C0%7C0%7C638296807279603404%7CUn
> > >>>>>> known%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBT
> > >>>>>> iI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2FvBLxyUW%2B
> > >>>>>> GctuPsHflb0TInwlEjWjO4cBNd8GVoHd%2FI%3D&reserved=0
> > >>>>>> PLEASE do read the posting guide
> > http://www/
> > .r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%
> > 7Cb525e189d18b4222c4bc08dbaf907920%7C0d4da0f84a314d76ace60a62331e1b8
> > 4%7C0%7C0%7C638296807279603404%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
> > jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7
> > C&sdata=UuSPZwNpsFagKtq26LaNqt%2F6KWGuyeQ1TD%2FOI%2B%2BCZ7U%3D&reser
> > ved=0
> > >>>>>> and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat/
> .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu
> %7Cb525e189d18b4222c4bc08dbaf907920%7C0d4da0f84a314d76ace60a62331e1b84
> %7C0%7C0%7C638296807279603404%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
> MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sda
> ta=%2FvBLxyUW%2BGctuPsHflb0TInwlEjWjO4cBNd8GVoHd%2FI%3D&reserved=0
> PLEASE do read the posting guide
> http://www.r/
> -project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7Cb5
> 25e189d18b4222c4bc08dbaf907920%7C0d4da0f84a314d76ace60a62331e1b84%7C0%
> 7C0%7C638296807279603404%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL
> CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=Uu
> SPZwNpsFagKtq26LaNqt%2F6KWGuyeQ1TD%2FOI%2B%2BCZ7U%3D&reserved=0
> and provide commented, minimal, self-contained, reproducible code.
>

        [[alternative HTML version deleted]]

______________________________________________
R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


More information about the R-help mailing list