[R] Peak Over Threshold values

William Dunlap wdunlap at tibco.com
Wed Jun 2 18:26:53 CEST 2010


> -----Original Message-----
> From: Tonja Krueger [mailto:tonja.krueger at web.de] 
> Sent: Monday, May 31, 2010 2:03 AM
> To: William Dunlap
> Cc: r-help at r-project.org
> Subject: RE: [R] Peak Over Threshold values
> 
> Thanks a lot for your help. That's the time period I was looking for. 
> 
> I've got one more question: for further analyses I need the 
> respective maximum values
> within these time periods (between the green and red lines). 
> Preferably in combination
> with the date the maximum event happened. 

Looping over the intervals found by f() (or f0()) is
simple to do and probably quick enough for you.  E.g.,
to get the position of the maximum of each interval use
something like

  positionOfMaxInEachInterval <- function (x, intervals) 
  {
      with(intervals, sapply(seq_along(start), function(i) start[i] + 
          which.max(x[start[i]:stop[i]]) - 1))
  }

The maxima themselve would be gotten by
  intervals <- f(x, startThreshold, stopThreshold, plot=TRUE)
  imax <- postionOfMaxInEachInterval(x, intervals)
  # abline(v=imax) if you had plot=TRUE above
  maxs <- x[imax]

By the way, the looping form of f() can be made much faster
than the f0 I showed earlier, in the case where there are
many intervals (e.g., f(sin(1:1e6),.1,0)), by preallocating
the output.  You may prefer to edit that version to get the
positions of the interval maxima as you compute the intervals.

f0a <-
function (x = walevel, startThreshhold = 5.45, stopThreshhold = 5.3, 
    plot = TRUE) 
{
    stopifnot(startThreshhold > stopThreshhold)
    inRun <- FALSE
    start <- integer(length(x))
    stop <- integer(length(x))
    nStart <- 0
    nStop <- 0
    for (i in seq_along(x)) {
        if (inRun) {
            if (x[i] < stopThreshhold) {
                nStop <- nStop + 1
                stop[nStop] <- i - 1L
                inRun <- FALSE
            }
        }
        else {
            if (x[i] > startThreshhold) {
                nStart <- nStart + 1
                start[nStart] <- i
                inRun <- TRUE
            }
        }
    }
    if (inRun) {
        nStop <- nStop + 1
        stop[nStop] <- length(x)
    }
    if (nStop > nStart) {
        stop <- stop[-1]
        nStop <- nStop - 1
    }
    length(stop) <- nStop
    length(start) <- nStart
    if (plot) {
        plot(x, cex = 0.5)
        abline(h = c(startThreshhold, stopThreshhold))
        abline(v = start, col = "green")
        abline(v = stop, col = "red")
    }
    data.frame(start = start, stop = stop)
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> Thank you
> in advance, 
> 
> Tonja
> 
> 
> 
> -----Ursprüngliche Nachricht-----
> Von: William Dunlap <wdunlap at tibco.com>
> Gesendet: 27.05.2010 22:13:21
> An: "Hutchinson,David [PYR]" 
> <David.Hutchinson at ec.gc.ca>,Tonja Krueger <tonja.krueger at web.de>
> Betreff: RE: [R] Peak Over Threshold values
> 
> >> -----Original Message-----
> >> From: r-help-bounces at r-project.org 
> >> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
> >> Sent: Thursday, May 27, 2010 12:24 PM
> >> To: Hutchinson,David [PYR]; Tonja Krueger
> >> Cc: r-help at r-project.org
> >> Subject: Re: [R] Peak Over Threshold values
> >> 
> >> Here is another version, loopless, but still
> >> a bit clumsy (can the call to which be removed?):
> >
> >Note that avoiding loops is mostly for
> >the aesthetic effect.  On my aging laptop
> >the following loopy and vector-growing version
> >takes 3 seconds on a million-long vector
> >while the non-loopy one takes 0.22 seconds
> >(timings done with plot=FALSE).  That is
> >a nice ratio but not much of a difference.
> >
> >f0 <- function (x = walevel,
> >    startThreshhold = 5.45,
> >    stopThreshhold = 5.3, 
> >    plot = TRUE) 
> >{
> >    stopifnot(startThreshhold > stopThreshhold)
> >    inRun <- FALSE
> >    start <- integer()
> >    stop <- integer()
> >    for (i in seq_along(x)) {
> >        if (inRun) {
> >            if (x[i] < stopThreshhold) {
> >                stop[length(stop) + 1] <- i - 1
> >                inRun <- FALSE
> >            }
> >        }
> >        else {
> >            if (x[i] > startThreshhold) {
> >                start[length(start) + 1] <- i
> >                inRun <- TRUE
> >            }
> >        }
> >    }
> >    if (inRun) 
> >        stop[length(stop) + 1] <- length(x)
> >    if (length(stop) > length(start)) 
> >        stop <- stop[-1]
> >    if (plot) {
> >        plot(x, cex = 0.5)
> >        abline(h = c(startThreshhold, stopThreshhold))
> >        abline(v = start, col = "green")
> >        abline(v = stop, col = "red")
> >    }
> >    data.frame(start = start, stop = stop)
> >}
> >
> >> 
> >> f <- function (x = walevel,
> >>                startThreshhold = 5.45,
> >>                stopThreshhold = 5.3, 
> >>                plot = TRUE) 
> >> {
> >>     stopifnot(startThreshhold > stopThreshhold)
> >>     isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
> >>     isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
> >>     isOverStart <- x >= startThreshhold
> >>     isOverStop <- x >= stopThreshhold
> >>     possibleStartPt <- which(isFirstInRun(isOverStart) & 
> isOverStart)
> >>     possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
> >>     pts <- c(possibleStartPt, possibleStopPt)
> >>     names(pts) <- rep(c("start", "stop"),
> >>         c(length(possibleStartPt), length(possibleStopPt)))
> >>     pts <- pts[order(pts)]
> >>     tmp <- isFirstInRun(names(pts))
> >>     start <- pts[tmp & names(pts) == "start"]
> >>     stop <- pts[tmp & names(pts) == "stop"]
> >>     if (length(stop) > length(start)) { 
> >>         # i.e., when first downcrossing happens before
> >>         # first upcrossing
> >>         stop <- stop[-1]
> >>     }
> >>     if (plot) {
> >>         plot(x, cex = 0.5)
> >>         abline(h = c(startThreshhold, stopThreshhold))
> >>         abline(v = start, col = "green")
> >>         abline(v = stop, col = "red")
> >>     }
> >>     data.frame(start = start, stop = stop)
> >> }
> >> 
> >> # define the data in the original message and call f().
> >> 
> >> The isFirstInRun and isLastInRun functions do the
> >> first part of the calculation that rle does and
> >> avoid the cumsum(diff()) roundtrip that
> >> cumsum(rle()$lengths) involves and their names
> >> make it clearer what I'm trying to do.
> >> 
> >> Bill Dunlap
> >> Spotfire, TIBCO Software
> >> wdunlap tibco.com  
> >> 
> >> > -----Original Message-----
> >> > From: r-help-bounces at r-project.org 
> >> > [mailto:r-help-bounces at r-project.org] On Behalf Of 
> >> > Hutchinson,David [PYR]
> >> > Sent: Thursday, May 27, 2010 10:41 AM
> >> > To: Tonja Krueger
> >> > Cc: r-help at r-project.org
> >> > Subject: Re: [R] Peak Over Threshold values
> >> > 
> >> > Perhaps not elegant, but does the job
> >> > 
> >> > threshold <- 5.45
> >> > meetThreshold <- walevel >= threshold
> >> > 
> >> > d <- rle(meetThreshold)
> >> > startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
> >> > 
> >> > abv <- which(d$values == TRUE)
> >> > p.o.t <- data.frame()
> >> > 
> >> > for (i in seq( along = abv ) ) {
> >> >   a <- startPos[abv[i]]
> >> >   b <- a + (d$lengths[abv[i]] - 1)
> >> >   e <- which.max(walevel[a:b])
> >> >   p.o.t <- rbind( p.o.t, data.frame(
> >> >                pos = a + e - 1,
> >> >                val = walevel[a:b][e]
> >> >            ) )
> >> > }
> >> > 
> >> > plot (
> >> >   day,
> >> >   walevel, type = 'l'
> >> > )
> >> > 
> >> > points(
> >> >   p.o.t$pos,
> >> >   p.o.t$val,
> >> >   col = 'red'
> >> > )
> >> > 
> >> > abline(h = threshold, lty = 2, col = 'red')
> >> > 
> >> > -----Original Message-----
> >> > From: r-help-bounces at r-project.org 
> >> > [mailto:r-help-bounces at r-project.org] On Behalf Of Tonja Krueger
> >> > Sent: Thursday, May 27, 2010 1:47 AM
> >> > To: Vito Muggeo (UniPa); Clint Bowman
> >> > Cc: r-help at r-project.org
> >> > Subject: [R] Peak Over Threshold values
> >> > 
> >> > 
> >> >    I'm  sorry, but that's not exactly what I was looking for. 
> >> > I obviously
> >> >    didn't explain properly:
> >> > 
> >> >    Within  my dataframe (df) I would like to find POT values 
> >> > that are not
> >> >    linked. In my definition two maximum values are linked if 
> >> > walevel does not
> >> >    fall below a certain value (the lower threshold value).
> >> > 
> >> >    walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
> >> > 5.86, 5.91, 5.91,
> >> >    5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
> >> > 6.11, 6.14, 6.12,
> >> >    6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
> >> > 5.72, 5.70, 5.65,
> >> >    5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
> >> > 5.19, 5.19, 5.13,
> >> >    5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
> >> > 5.22, 5.32, 5.29,
> >> >    5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
> >> > 5.66, 5.68, 5.72,
> >> >    5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
> >> > 5.62, 5.62, 5.61,
> >> >    5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
> >> > 5.45, 5.47, 5.50,
> >> >    5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> >> > 
> >> >    day <- c(1:100)
> >> > 
> >> >    df <- data.frame(day,walevel)
> >> > 
> >> >    For example in my dataframe  a threshold value = 5.45 and 
> >> > an lower threshold
> >> >    value = 5.3 should lead to two maximum values because 
> >> > between the 2^nd and
> >> >    3^rd peak walevel does not fall below the lower 
> threshold value.
> >> > 
> >> >    With  "clusters (...)" from "evd package", I could find 
> >> > out POT values but
> >> >    even though I set a lower threshold value for my example 
> >> > dataframe I would
> >> >    get three maximum values instead of two.
> >> > 
> >> >    library(evd)
> >> > 
> >> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T)
> >> > 
> >> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T)
> >> > 
> >> >    Changing r to 30 (for example) connects the 2^nd and 3^rd 
> >> > maximum events and
> >> >    gives out the right 'end' for the first extreme event but 
> >> > not for the second
> >> >    event. (Also I'd rather not use a time interval to prove 
> >> > that the events are
> >> >    linked)
> >> > 
> >> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T)
> >> > 
> >> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T)
> >> > 
> >> >    What went wrong???
> >> > 
> >> >    Tonja
> >> >    -----Ursprüngliche Nachricht-----
> >> >    How about?
> >> >    hi.rle<-rle(walevel>5.79)
> >> >    lo.rle<-rle(walevel<5.36)
> >> >    plot(walevel)
> >> >    abline(h=5.8,col=2,lty=3)
> >> >    abline(h=5.35,col=3,lty=3)
> >> >    
> hi.lo.rle<-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths)))
> >> >    abline(v=hi.lo.rle)
> >> >    You can use the $values from the rle to sort things 
> out. Probably
> >> >    want to ignore the end point at length(walevel).
> >> >    --
> >> >    Clint Bowman INTERNET: clint at ecy.wa.gov
> >> >    Air Quality Modeler INTERNET: clint at math.utah.edu
> >> >    Department of Ecology VOICE: (360) 407-6815
> >> >    PO Box 47600 FAX: (360) 407-7534
> >> >    Olympia, WA 98504-7600
> >> >    On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote:
> >> >    > dear Tonja,
> >> >    > By plotting your data
> >> >    >
> >> >    > plot(df)
> >> >    >
> >> >    > it seems to me that you are looking for a piecewise 
> >> > linear relationships.
> >> >    If
> >> >    >  this is the case, have a look to the package segmented. 
> >> > You have to
> >> >    specify
> >> >    > or not the number and the starting values for the 
> breakpoints
> >> >    >
> >> >    > library(segmented)
> >> >    > olm<-lm(walevel~day)
> >> >    >
> >> >    > #specify number and starting values for the breakpoints..
> >> >    > oseg<-segmented(olm, seg.Z=~day, psi=c(20,50,80))
> >> >    > plot(oseg,add=TRUE,col=2)
> >> >    > oseg$psi
> >> >    >
> >> >    > #or you can avoid to specify starting values for psi
> >> >    > oseg<-segmented(olm, seg.Z=~day,
> >> >    > psi=NA,control=seg.control(stop.if.error=FALSE,K=30))
> >> >    >
> >> >    > plot(oseg,add=TRUE,col=2)
> >> >    > oseg$psi
> >> >    >
> >> >    >
> >> >    > best,
> >> >    > vito
> >> >    >
> >> >    >
> >> >    > Tonja Krueger ha scritto:
> >> >    >> Dear List
> >> >    >>
> >> >    >> I hope you can help me: I've got a dataframe (df) 
> >> > within which I am
> >> >    >> looking
> >> >    >> for Peak Over Threshold values as well as the length of 
> >> > the events. An
> >> >    >> event
> >> >    >> starts when walevel equals 5.8 and it should end when 
> >> > walevel equals
> >> >    >> the
> >> >    >> lower threshold value (5.35).
> >> >    >>
> >> >    >> I tried "clusters (...)" from "evd package", and varied 
> >> > r (see example)
> >> >    >> but it
> >> >    >> did not work for all events (again see example).
> >> >    >>
> >> >    >> walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
> >> > 5.86, 5.91,
> >> >    >> 5.91,
> >> >    >> 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
> >> > 6.11, 6.14, 6.12,
> >> >    >> 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
> >> > 5.72, 5.70, 5.65,
> >> >    >> 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
> >> > 5.19, 5.19, 5.13,
> >> >    >> 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
> >> > 5.22, 5.32, 5.29,
> >> >    >> 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
> >> > 5.66, 5.68, 5.72,
> >> >    >> 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
> >> > 5.62, 5.62, 5.61,
> >> >    >> 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
> >> > 5.45, 5.47, 5.50,
> >> >    >> 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> >> >    >>
> >> >    >> day <- c(1:100)
> >> >    >>
> >> >    >> df <- data.frame(day,walevel)
> >> >    >>
> >> >    >> library(evd)
> >> >    >> clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax 
> >> > = T, plot = T)
> >> >    >> clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, 
> >> > cmax = T, plot = T)
> >> >    >>
> >> >    >> What have I done wrong?
> >> >    >>
> >> >    >> Tonja
> >> >    >> ______________________________________________
> >> >    >> R-help at r-project.org mailing list
> >> >    >> [1]https://stat.ethz.ch/mailman/listinfo/r-help
> >> >    >> PLEASE do read the posting guide
> >> >    >> [2]http://www.R-project.org/posting-guide.html
> >> >    >> and provide commented, minimal, self-contained, 
> >> > reproducible code.
> >> >    >>
> >> >    >
> >> >    >
> >> > 
> >> >    GRATIS: Movie-Flat mit über 300 Top-Videos. Für WEB.DE Nutzer
> >> >    dauerhaft kostenlos! Jetzt freischalten unter 
> >> > http://movieflat.web.de
> >> > 
> >> > References
> >> > 
> >> >    1. 
> >> > file://localhost/../jump.htm?goto=https%3A%2F%2Fstat.ethz.ch%2
> >> Fmailman%2Flistinfo%2Fr-help
> >> >    2. 
> >> > file://localhost/../jump.htm?goto=http%3A%2F%2Fwww.R-project.o
> >> rg%2Fposting-guide.html
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide 
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, 
> reproducible code.
> >> > 
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide 
> >> > http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, 
> reproducible code.
> >> > 
> >> 
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide 
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> ___________________________________________________________
> GRATIS für alle WEB.DE Nutzer: Die maxdome Movie-FLAT!
> Jetzt freischalten unter http://movieflat.web.de
> 



More information about the R-help mailing list