[R] subset only if, e.g., a column is successive for more than 3 values

Fri Sep 28 00:43:35 CEST 2018

Bugger! It's

eval(parse(text=paste0("kkdf[c(",paste(starts,ends,sep=":",collapse=","),"),]")))

What a mess!
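
For what it's worth, the eval/parse round trip can probably be avoided by
expanding each start:end pair into row positions directly. A minimal sketch,
assuming starts and ends are equal-length vectors of row positions like the
ones computed in the quoted code below. The toy data frame and the values of
starts and ends are made up for illustration only:

```r
# toy data frame standing in for kkdf (hypothetical values)
toy <- data.frame(index = c(1, 2, 3, 4, 7, 10, 11, 12, 13, 14),
                  value = letters[1:10])
# hypothetical run boundaries, given as row positions in toy
starts <- c(1, 6)
ends <- c(4, 10)
# expand each start:end pair into a sequence and flatten into one vector
rows <- unlist(Map(seq, starts, ends))
# ordinary indexing, no string building or eval(parse()) needed
runs_only <- toy[rows, ]
```

Here rows holds positions 1 to 4 and 6 to 10, so runs_only drops only the
isolated fifth row.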

Jim
On Fri, Sep 28, 2018 at 8:35 AM Jim Lemon <drjimlemon using gmail.com> wrote:
>
> Hi Knut,
> As Bert said, you can start with diff and work from there. I can
> easily get the text for the subset, but despite fooling around with
> "parse", "eval" and "expression", I couldn't get it to work:
>
> # use a bigger subset to test whether multiple runs can be extracted
> kkdf<-subset(airquality,Temp > 77,select=c("Ozone","Temp"))
> kkdf$index<-as.numeric(rownames(kkdf))
> # get the run length encoding
> seqindx<-rle(diff(kkdf$index)==1)
> # get a logical vector of the starts of the runs
> runsel<-seqindx$lengths >= 3 & seqindx$values
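> # get the indices for the starts of the runs; subtracting each run's
> # length also covers a run that begins at the very first row
> starts<-(cumsum(seqindx$lengths)-seqindx$lengths)[runsel]+1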
> # and the ends
> ends<-cumsum(seqindx$lengths)[runsel]+1
> # the character representation of the subset as indices is
> paste0("c(",paste(starts,ends,sep=":",collapse=","),")")
>
> I expect there will be a lightning response from someone who knows
> about converting the resulting string into whatever is needed.
>
> Jim
> On Fri, Sep 28, 2018 at 1:13 AM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > 1. I assume the values are integers, not floats/numerics (which would make
> > it more complicated).
> >
> > 2. Strategy: Take differences (e.g. see ?diff) and look for >3 1's in a
> > row.
> >
> > I don't have time to work out details, but perhaps that helps.
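
The strategy above could be sketched roughly as follows. This is only one
possible reading of it: keep_runs and min_diffs are names invented for this
sketch (not from any package), and the index values are the row names shown
in the example output of Knut's question. A row is kept when it belongs to a
stretch with at least min_diffs consecutive differences of 1, so the exact
threshold is an assumption to adjust as needed:

```r
# keep_runs() and min_diffs are hypothetical names, not from any package
keep_runs <- function(idx, min_diffs = 3) {
  # run-length encode: TRUE where a value is exactly 1 more than its predecessor
  r <- rle(diff(idx) == 1)
  # keep only the runs of TRUE that are long enough
  r$values <- r$values & r$lengths >= min_diffs
  in_run <- inverse.rle(r)
  # each difference joins two neighbouring rows, so mark both of them
  c(in_run, FALSE) | c(FALSE, in_run)
}

# row names from the example output in the question
idx <- c(29, 35, 36, 38, 39, 40, 41, 42, 43, 44)
kept <- idx[keep_runs(idx)]
```

With these values kept contains 38 through 44. For a data frame df, something
like df[keep_runs(as.numeric(rownames(df))), ] should give the matching rows.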
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Thu, Sep 27, 2018 at 7:49 AM Knut Krueger <rhelp using krueger-family.de>
> > wrote:
> >
> > > Hi to all
> > >
> > > I need to subset rows only where there are, e.g., 3 successive values in a
> > > column of a data frame.
> > > Example from the subset help page:
> > >
> > > subset(airquality, Temp > 80, select = c(Ozone, Temp))
> > >     Ozone Temp
> > > 29     45   81
> > > 35     NA   84
> > > 36     NA   85
> > > 38     29   82
> > > 39     NA   87
> > > 40     71   90
> > > 41     39   87
> > > 42     NA   93
> > > 43     NA   92
> > > 44     23   82
> > > .....
> > >
> > > I would like to get only
> > >
> > > ...
> > > 40     71   90
> > > 41     39   87
> > > 42     NA   93
> > > 43     NA   92
> > > 44     23   82
> > > ....
> > >
> > > because the left column (the row index) increases by one more than, e.g.,
> > > three times in a row without a gap
> > >
> > > Any hints for a package, or do I need to write my own function?
> > >
> > > Kind Regards Knut
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >