[R] Peak Over Threshold values

Tonja Krueger tonja.krueger at web.de
Mon May 31 11:03:28 CEST 2010


Thanks a lot for your help. That’s the time period I was looking for. 

I’ve got one more question: for further analyses I need the respective maximum values
within these time periods (between the green and red lines). Preferably in combination
with the date the maximum event happened. 

Thank you
in advance, 

Tonja



-----Ursprüngliche Nachricht-----
Von: William Dunlap <wdunlap at tibco.com>
Gesendet: 27.05.2010 22:13:21
An: "Hutchinson,David [PYR]" <David.Hutchinson at ec.gc.ca>,Tonja Krueger <tonja.krueger at web.de>
Betreff: RE: [R] Peak Over Threshold values

>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>> Sent: Thursday, May 27, 2010 12:24 PM
>> To: Hutchinson,David [PYR]; Tonja Krueger
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Peak Over Threshold values
>> 
>> Here is another version, loopless, but still
>> a bit clumsy (can the call to which be removed?):
>
>Note that avoiding loops is mostly for
>the aesthetic effect.  On my aging laptop
>the following loopy and vector-growing version
>takes 3 seconds on a million-long vector
>while the non-loopy one takes 0.22 seconds
>(timings done with plot=FALSE).  That is
>a nice ratio but not much of a difference.
>
>f0 <- function (x = walevel,
>    startThreshhold = 5.45,
>    stopThreshhold = 5.3, 
>    plot = TRUE) 
>{
>    stopifnot(startThreshhold > stopThreshhold)
>    inRun <- FALSE
>    start <- integer()
>    stop <- integer()
>    for (i in seq_along(x)) {
>        if (inRun) {
>            if (x[i] < stopThreshhold) {
>                stop[length(stop) + 1] <- i - 1
>                inRun <- FALSE
>            }
>        }
>        else {
>            if (x[i] > startThreshhold) {
>                start[length(start) + 1] <- i
>                inRun <- TRUE
>            }
>        }
>    }
>    if (inRun) 
>        stop[length(stop) + 1] <- length(x)
>    if (length(stop) > length(start)) 
>        stop <- stop[-1]
>    if (plot) {
>        plot(x, cex = 0.5)
>        abline(h = c(startThreshhold, stopThreshhold))
>        abline(v = start, col = "green")
>        abline(v = stop, col = "red")
>    }
>    data.frame(start = start, stop = stop)
>}
>
>> 
>> f <- function (x = walevel,
>>                startThreshhold = 5.45,
>>                stopThreshhold = 5.3, 
>>                plot = TRUE) 
>> {
>>     stopifnot(startThreshhold > stopThreshhold)
>>     isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
>>     isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)
>>     isOverStart <- x >= startThreshhold
>>     isOverStop <- x >= stopThreshhold
>>     possibleStartPt <- which(isFirstInRun(isOverStart) & isOverStart)
>>     possibleStopPt <- which(isLastInRun(isOverStop) & isOverStop)
>>     pts <- c(possibleStartPt, possibleStopPt)
>>     names(pts) <- rep(c("start", "stop"),
>>         c(length(possibleStartPt), length(possibleStopPt)))
>>     pts <- pts[order(pts)]
>>     tmp <- isFirstInRun(names(pts))
>>     start <- pts[tmp & names(pts) == "start"]
>>     stop <- pts[tmp & names(pts) == "stop"]
>>     if (length(stop) > length(start)) { 
>>         # i.e., when first downcrossing happens before
>>         # first upcrossing
>>         stop <- stop[-1]
>>     }
>>     if (plot) {
>>         plot(x, cex = 0.5)
>>         abline(h = c(startThreshhold, stopThreshhold))
>>         abline(v = start, col = "green")
>>         abline(v = stop, col = "red")
>>     }
>>     data.frame(start = start, stop = stop)
>> }
>> 
>> # define the data in the original message and call f().
>> 
>> The isFirstInRun and isLastInRun functions do the
>> first part of the calculation that rle does and
>> avoid the cumsum(diff()) roundtrip that
>> cumsum(rle()$lengths) involves and their names
>> make it clearer what I'm trying to do.
>> 
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com  
>> 
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org 
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of 
>> > Hutchinson,David [PYR]
>> > Sent: Thursday, May 27, 2010 10:41 AM
>> > To: Tonja Krueger
>> > Cc: r-help at r-project.org
>> > Subject: Re: [R] Peak Over Threshold values
>> > 
>> > Perhaps not elegant, but does the job
>> > 
>> > threshold <- 5.45
>> > meetThreshold <- walevel >= threshold
>> > 
>> > d <- rle(meetThreshold)
>> > startPos <- c(1, 1 + cumsum(d$lengths[-length(d$lengths)]))
>> > 
>> > abv <- which(d$values == TRUE)
>> > p.o.t <- data.frame()
>> > 
>> > for (i in seq( along = abv ) ) {
>> >   a <- startPos[abv[i]]
>> >   b <- a + (d$lengths[abv[i]] - 1)
>> >   e <- which.max(walevel[a:b])
>> >   p.o.t <- rbind( p.o.t, data.frame(
>> >                pos = a + e - 1,
>> >                val = walevel[a:b][e]
>> >            ) )
>> > }
>> > 
>> > plot (
>> >   day,
>> >   walevel, type = 'l'
>> > )
>> > 
>> > points(
>> >   p.o.t$pos,
>> >   p.o.t$val,
>> >   col = 'red'
>> > )
>> > 
>> > abline(h = threshold, lty = 2, col = 'red')
>> > 
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org 
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Tonja Krueger
>> > Sent: Thursday, May 27, 2010 1:47 AM
>> > To: Vito Muggeo (UniPa); Clint Bowman
>> > Cc: r-help at r-project.org
>> > Subject: [R] Peak Over Threshold values
>> > 
>> > 
>> >    I'm  sorry, but that's not exactly what I was looking for. 
>> > I obviously
>> >    didn't explain properly:
>> > 
>> >    Within  my dataframe (df) I would like to find POT values 
>> > that are not
>> >    linked. In my definition two maximum values are linked if 
>> > walevel does not
>> >    fall below a certain value (the lower threshold value).
>> > 
>> >    walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
>> > 5.86, 5.91, 5.91,
>> >    5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
>> > 6.11, 6.14, 6.12,
>> >    6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
>> > 5.72, 5.70, 5.65,
>> >    5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
>> > 5.19, 5.19, 5.13,
>> >    5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
>> > 5.22, 5.32, 5.29,
>> >    5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
>> > 5.66, 5.68, 5.72,
>> >    5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
>> > 5.62, 5.62, 5.61,
>> >    5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
>> > 5.45, 5.47, 5.50,
>> >    5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
>> > 
>> >    day <- c(1:100)
>> > 
>> >    df <- data.frame(day,walevel)
>> > 
>> >    For example in my dataframe  a threshold value = 5.45 and 
>> > an lower threshold
>> >    value = 5.3 should lead to two maximum values because 
>> > between the 2^nd and
>> >    3^rd peak walevel does not fall below the lower threshold value.
>> > 
>> >    With  "clusters (...)" from "evd package", I could find 
>> > out POT values but
>> >    even though I set a lower threshold value for my example 
>> > dataframe I would
>> >    get three maximum values instead of two.
>> > 
>> >    library(evd)
>> > 
>> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, cmax=T)
>> > 
>> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 1, plot=T)
>> > 
>> >    Changing r to 30 (for example) connects the 2^nd and 3^rd 
>> > maximum events and
>> >    gives out the right 'end' for the first extreme event but 
>> > not for the second
>> >    event. (Also I'd rather not use a time interval to prove 
>> > that the events are
>> >    linked)
>> > 
>> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, cmax=T)
>> > 
>> >    clusters(df$walevel,u =5.45, ulow= 5.3, r = 30, plot=T)
>> > 
>> >    What went wrong???
>> > 
>> >    Tonja
>> >    -----Ursprüngliche Nachricht-----
>> >    How about?
>> >    hi.rle<-rle(walevel>5.79)
>> >    lo.rle<-rle(walevel<5.36)
>> >    plot(walevel)
>> >    abline(h=5.8,col=2,lty=3)
>> >    abline(h=5.35,col=3,lty=3)
>> >    hi.lo.rle<-sort(c(cumsum(hi.rle$lengths),cumsum(lo.rle$lengths)))
>> >    abline(v=hi.lo.rle)
>> >    You can use the $values from the rle to sort things out. Probably
>> >    want to ignore the end point at length(walevel).
>> >    --
>> >    Clint Bowman INTERNET: clint at ecy.wa.gov
>> >    Air Quality Modeler INTERNET: clint at math.utah.edu
>> >    Department of Ecology VOICE: (360) 407-6815
>> >    PO Box 47600 FAX: (360) 407-7534
>> >    Olympia, WA 98504-7600
>> >    On Wed, 26 May 2010, Vito Muggeo (UniPa) wrote:
>> >    > dear Tonja,
>> >    > By plotting your data
>> >    >
>> >    > plot(df)
>> >    >
>> >    > it seems to me that you are looking for a piecewise 
>> > linear relationships.
>> >    If
>> >    >  this is the case, have a look to the package segmented. 
>> > You have to
>> >    specify
>> >    > or not the number and the starting values for the breakpoints
>> >    >
>> >    > library(segmented)
>> >    > olm<-lm(walevel~day)
>> >    >
>> >    > #specify number and starting values for the breakpoints..
>> >    > oseg<-segmented(olm, seg.Z=~day, psi=c(20,50,80))
>> >    > plot(oseg,add=TRUE,col=2)
>> >    > oseg$psi
>> >    >
>> >    > #or you can avoid to specify starting values for psi
>> >    > oseg<-segmented(olm, seg.Z=~day,
>> >    > psi=NA,control=seg.control(stop.if.error=FALSE,K=30))
>> >    >
>> >    > plot(oseg,add=TRUE,col=2)
>> >    > oseg$psi
>> >    >
>> >    >
>> >    > best,
>> >    > vito
>> >    >
>> >    >
>> >    > Tonja Krueger ha scritto:
>> >    >> Dear List
>> >    >>
>> >    >> I hope you can help me: I've got a dataframe (df) 
>> > within which I am
>> >    >> looking
>> >    >> for Peak Over Threshold values as well as the length of 
>> > the events. An
>> >    >> event
>> >    >> starts when walevel equals 5.8 and it should end when 
>> > walevel equals
>> >    >> the
>> >    >> lower threshold value (5.35).
>> >    >>
>> >    >> I tried "clusters (...)" from "evd package", and varied 
>> > r (see example)
>> >    >> but it
>> >    >> did not work for all events (again see example).
>> >    >>
>> >    >> walevel <- c(5.75, 5.75, 5.75, 5.76, 5.80, 5.82, 5.85, 
>> > 5.86, 5.91,
>> >    >> 5.91,
>> >    >> 5.88, 5.92, 5.99, 6.02, 6.00, 6.08, 6.11, 6.10, 6.10, 
>> > 6.11, 6.14, 6.12,
>> >    >> 6.15, 6.17, 6.15, 6.08, 6.01, 5.95, 5.89, 5.83, 5.77, 
>> > 5.72, 5.70, 5.65,
>> >    >> 5.59, 5.51, 5.43, 5.33, 5.30, 5.25, 5.22, 5.21, 5.19, 
>> > 5.19, 5.19, 5.13,
>> >    >> 5.15, 5.07, 5.04, 5.02, 4.99, 5.05, 5.07, 5.12, 5.17, 
>> > 5.22, 5.32, 5.29,
>> >    >> 5.33, 5.33, 5.36, 5.37, 5.42, 5.43, 5.48, 5.55, 5.57, 
>> > 5.66, 5.68, 5.72,
>> >    >> 5.77, 5.79, 5.81, 5.80, 5.80, 5.77, 5.72, 5.70, 5.66, 
>> > 5.62, 5.62, 5.61,
>> >    >> 5.59, 5.57, 5.51, 5.47, 5.41, 5.39, 5.40, 5.40, 5.42, 
>> > 5.45, 5.47, 5.50,
>> >    >> 5.50, 5.49, 5.43, 5.39, 5.33, 5.26)
>> >    >>
>> >    >> day <- c(1:100)
>> >    >>
>> >    >> df <- data.frame(day,walevel)
>> >    >>
>> >    >> library(evd)
>> >    >> clusters(df$walevel, u = 5.80, r = 1, ulow = 5.35, cmax 
>> > = T, plot = T)
>> >    >> clusters(df$walevel, u = 5.80, r = 50, ulow = 5.35, 
>> > cmax = T, plot = T)
>> >    >>
>> >    >> What have I done wrong?
>> >    >>
>> >    >> Tonja
>> >    >> ______________________________________________
>> >    >> R-help at r-project.org mailing list
>> >    >> [1]https://stat.ethz.ch/mailman/listinfo/r-help
>> >    >> PLEASE do read the posting guide
>> >    >> [2]http://www.R-project.org/posting-guide.html
>> >    >> and provide commented, minimal, self-contained, 
>> > reproducible code.
>> >    >>
>> >    >
>> >    >
>> > 
>> >    GRATIS: Movie-Flat mit über 300 Top-Videos. Für WEB.DE Nutzer
>> >    dauerhaft kostenlos! Jetzt freischalten unter 
>> > http://movieflat.web.de
>> > 
>> > References
>> > 
>> >    1. 
>> > file://localhost/../jump.htm?goto=https%3A%2F%2Fstat.ethz.ch%2
>> Fmailman%2Flistinfo%2Fr-help
>> >    2. 
>> > file://localhost/../jump.htm?goto=http%3A%2F%2Fwww.R-project.o
>> rg%2Fposting-guide.html
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide 
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> > 
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide 
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> > 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
___________________________________________________________
GRATIS für alle WEB.DE Nutzer: Die maxdome Movie-FLAT!
Jetzt freischalten unter http://movieflat.web.de



More information about the R-help mailing list