[R] Peak Over Threshold values

William Dunlap wdunlap at tibco.com
Thu May 27 21:23:44 CEST 2010


Here is another version, loopless, but still
a bit clumsy (can the call to which be removed?):

f <- function (x = walevel,
               startThreshhold = 5.45,
               stopThreshhold = 5.3, 
               plot = TRUE) 
{
    stopifnot(startThreshhold > stopThreshhold)
    isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
    isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
    isOverStart <- x >= startThreshhold
    isOverStop <- x >= stopThreshhold
    possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
    possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
    pts <- c(possibleStartPt, possibleStopPt)
    names(pts) <- rep(c("start", "stop"),
        c(length(possibleStartPt), length(possibleStopPt)))
    pts <- pts[order(pts)]
    tmp <- isFirstInRun(names(pts))
    start <- pts[tmp & names(pts) == "start"]
    stop <- pts[tmp & names(pts) == "stop"]
    if (length(stop) > length(start)) { 
        # i.e., when first downcrossing happens before
        # first upcrossing
        stop <- stop[-1]
    }
    if (plot) {
        plot(x, cex = 0.5)
        abline(h = c(startThreshhold, stopThreshhold))
        abline(v = start, col = "green")
        abline(v = stop, col = "red")
    }
    data.frame(start = start, stop = stop)
}

# define the data in the original message and call f().

The isFirstInRun and isLastInRun functions do the
first part of the calculation that rle does and
avoid the cumsum(diff()) roundtrip that
cumsum(rle()$lengths) involves and their names
make it clearer what I'm trying to do.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of 
> Hutchinson,David [PYR]
> Sent: Thursday, May 27, 2010 10:41 AM
> To: Tonja Krueger
> Cc: r-help at r-project.org
> Subject: Re: [R] Peak Over Threshold values
> 
> Perhaps not elegant, but does the job
> 
> threshold <- 5.45
> meetThreshold <- walevel >= threshold
> 
> d <- rle(meetThreshold)
> startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
> 
> abv <- which(d$values == TRUE)
> p.o.t <- data.frame()
> 
> for (i in seq( along = abv ) ) {
>   a <- startPos[abv[i]]
>   b <- a + (d$lengths[abv[i]] - 1)
>   e <- which.max(walevel[a:b])
>   p.o.t <- rbind( p.o.t, data.frame(
>                pos = a + e - 1,
>                val = walevel[a:b][e]
>            ) )
> }
> 
> plot (
>   day,
>   walevel, type = 'l'
> )
> 
> points(
>   p.o.t$pos,
>   p.o.t$val,
>   col = 'red'
> )
> 
> abline(h = threshold, lty = 2, col = 'red')
> 
> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Tonja Krueger
> Sent: Thursday, May 27, 2010 1:47 AM
> To: Vito Muggeo (UniPa); Clint Bowman
> Cc: r-help at r-project.org
> Subject: [R] Peak Over Threshold values
> 
> 
>    I'm  sorry, but that's not exactly what I was looking for. 
> I obviously
>    didn't explain properly:
> 
>    Within  my dataframe (df) I would like to find POT values 
> that are not
>    linked. In my definition two maximum values are linked if 
> walevel does not
>    fall below a certain value (the lower threshold value).
> 
>    walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
> 5.86, 5.91, 5.91,
>    5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
> 6.11, 6.14, 6.12,
>    6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
> 5.72, 5.70, 5.65,
>    5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
> 5.19, 5.19, 5.13,
>    5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
> 5.22, 5.32, 5.29,
>    5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
> 5.66, 5.68, 5.72,
>    5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
> 5.62, 5.62, 5.61,
>    5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
> 5.45, 5.47, 5.50,
>    5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
> 
>    day <- c(1:100)
> 
>    df <- data.frame(day,walevel)
> 
>    For example in my dataframe  a threshold value = 5.45 and 
> an lower threshold
>    value = 5.3 should lead to two maximum values because 
> between the 2^nd and
>    3^rd peak walevel does not fall below the lower threshold value.
> 
>    With  "clusters (...)" from "evd package", I could find 
> out POT values but
>    even though I set a lower threshold value for my example 
> dataframe I would
>    get three maximum values instead of two.
> 
>    library(evd)
> 
>    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T)
> 
>    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T)
> 
>    Changing r to 30 (for example) connects the 2^nd and 3^rd 
> maximum events and
>    gives out the right 'end' for the first extreme event but 
> not for the second
>    event. (Also I'd rather not use a time interval to prove 
> that the events are
>    linked)
> 
>    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T)
> 
>    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T)
> 
>    What went wrong???
> 
>    Tonja
>    -----Ursprüngliche Nachricht-----
>    How about?
>    hi.rle<-rle(walevel>5.79)
>    lo.rle<-rle(walevel<5.36)
>    plot(walevel)
>    abline(h=5.8,col=2,lty=3)
>    abline(h=5.35,col=3,lty=3)
>    hi.lo.rle<-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths)))
>    abline(v=hi.lo.rle)
>    You can use the $values from the rle to sort things out. Probably
>    want to ignore the end point at length(walevel).
>    --
>    Clint Bowman INTERNET: clint at ecy.wa.gov
>    Air Quality Modeler INTERNET: clint at math.utah.edu
>    Department of Ecology VOICE: (360) 407-6815
>    PO Box 47600 FAX: (360) 407-7534
>    Olympia, WA 98504-7600
>    On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote:
>    > dear Tonja,
>    > By plotting your data
>    >
>    > plot(df)
>    >
>    > it seems to me that you are looking for a piecewise 
> linear relationships.
>    If
>    >  this is the case, have a look to the package segmented. 
> You have to
>    specify
>    > or not the number and the starting values for the breakpoints
>    >
>    > library(segmented)
>    > olm<-lm(walevel~day)
>    >
>    > #specify number and starting values for the breakpoints..
>    > oseg<-segmented(olm, seg.Z=~day, psi=c(20,50,80))
>    > plot(oseg,add=TRUE,col=2)
>    > oseg$psi
>    >
>    > #or you can avoid to specify starting values for psi
>    > oseg<-segmented(olm, seg.Z=~day,
>    > psi=NA,control=seg.control(stop.if.error=FALSE,K=30))
>    >
>    > plot(oseg,add=TRUE,col=2)
>    > oseg$psi
>    >
>    >
>    > best,
>    > vito
>    >
>    >
>    > Tonja Krueger ha scritto:
>    >> Dear List
>    >>
>    >> I hope you can help me: I've got a dataframe (df) 
> within which I am
>    >> looking
>    >> for Peak Over Threshold values as well as the length of 
> the events. An
>    >> event
>    >> starts when walevel equals 5.8 and it should end when 
> walevel equals
>    >> the
>    >> lower threshold value (5.35).
>    >>
>    >> I tried "clusters (...)" from "evd package", and varied 
> r (see example)
>    >> but it
>    >> did not work for all events (again see example).
>    >>
>    >> walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
> 5.86, 5.91,
>    >> 5.91,
>    >> 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
> 6.11, 6.14, 6.12,
>    >> 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
> 5.72, 5.70, 5.65,
>    >> 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
> 5.19, 5.19, 5.13,
>    >> 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
> 5.22, 5.32, 5.29,
>    >> 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
> 5.66, 5.68, 5.72,
>    >> 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
> 5.62, 5.62, 5.61,
>    >> 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
> 5.45, 5.47, 5.50,
>    >> 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
>    >>
>    >> day <- c(1:100)
>    >>
>    >> df <- data.frame(day,walevel)
>    >>
>    >> library(evd)
>    >> clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax 
> = T, plot = T)
>    >> clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, 
> cmax = T, plot = T)
>    >>
>    >> What have I done wrong?
>    >>
>    >> Tonja
>    >> ______________________________________________
>    >> R-help at r-project.org mailing list
>    >> [1]https://stat.ethz.ch/mailman/listinfo/r-help
>    >> PLEASE do read the posting guide
>    >> [2]http://www.R-project.org/posting-guide.html
>    >> and provide commented, minimal, self-contained, 
> reproducible code.
>    >>
>    >
>    >
> 
>    GRATIS: Movie-Flat mit über 300 Top-Videos. Für WEB.DE Nutzer
>    dauerhaft kostenlos! Jetzt freischalten unter 
> http://movieflat.web.de
> 
> References
> 
>    1. 
> file://localhost/../jump.htm?goto=https%3A%2F%2Fstat.ethz.ch%2
Fmailman%2Flistinfo%2Fr-help
>    2. 
> file://localhost/../jump.htm?goto=http%3A%2F%2Fwww.R-project.o
rg%2Fposting-guide.html
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list