[R] finding peaks in a simple dataset with R

Dylan Beaudette dylan.beaudette at gmail.com
Wed Nov 23 18:01:08 CET 2005


On Nov 23, 2005, at 8:10 AM, Martin Maechler wrote:

>>>>>> "Marc" == Marc Kirchner <marc.kirchner at iwr.uni-heidelberg.de>
>>>>>>     on Wed, 23 Nov 2005 14:33:28 +0000 writes:
>
>>>
>>> I wonder if we shouldn't polish that a bit and add to R's
>>> standard 'utils' package.
>>>
>
>     Marc> Hm, I figured out there are (at least) two versions out 
> there, one being
>     Marc> the "original" idea and a modification.
>
>     Marc> === Petr Pikal in 2001 (based on Brian Ripley's idea)==
>     Marc> peaks <- function(series, span=3) {
>     Marc> z <- embed(series, span)
>     Marc> result <- max.col(z) == 1 + span %/% 2
>     Marc> result
>     Marc> }
>
>     Marc> versus
>
>     Marc> === Petr Pikal in 2004 ==
>     Marc> peaks2<-function(series,span=3) {
>     Marc> z <- embed(series, span)
>     Marc> s <- span%/%2
>     Marc> v<- max.col(z) == 1 + s
>     Marc> result <- c(rep(FALSE,s),v)
>     Marc> result <- result[1:(length(result)-s)]
>     Marc> result
>     Marc> }
>
> Thank you, Marc,
>
>     Marc> Comparison shows
>>> peaks(c(1,4,1,1,6,1,5,1,1),3)
>     Marc> [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
>     Marc> which is a logical vector for elements 2:N-1 and
>
>>> peaks2(c(1,4,1,1,6,1,5,1,1),3)
>     Marc> [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
>     Marc> which is a logical vector for elements 1:N-2.
>
>     Marc> As I would expect to "lose" (span-1)/2 elements on each side
>     Marc> of the vector, to me the 2001 version feels more natural.
>
> I think for the function to be more useful it the result should
> have the original vector length and hence I'd propose to also
> pad with FALSE at the upper end.
>

Hi, been lurking for a while, and thought that I would chime in.

I have similar need: x,y data with many "peaks" - finding them with R 
would be a nice feature. However, as mentioned above the ability to 
find more than one peak would be especially helpful. In addition 
preserving the length of the input vector would help with linking peak 
locations to their real index.

In my case the data is generated by an X-ray diffraction machine: 
x-axis in Degrees, y-axis in relative intensity. The data looks like 
this:
http://casoilresource.lawr.ucdavis.edu/drupal/node/71


>     Marc> Also, both "suffer" from being non-deterministic in the
>     Marc> multiple-maxima-case (the two 4s here)
>
> yes, of course, because of max.col().
> Note that Venables & Ripley would consider this to be rather a feature.
>
>     Marc> This could (should?) be fixed by modifying the call to 
> max.col()
>     Marc> result <- max.col(z, "first") == 1 + span %/% 2;
>
> Actually I think it should become an option, but I'd use "first"
> as default.
>

I would second that.

>     Marc> Just my two cents,
>
> Thank you.
>
> Here is my current proposal which also demonstrates why it's
> useful to pad with FALSE :
>
> peaks <-function(series, span = 3, ties.method = "first") {
>     if((span <- as.integer(span)) %% 2 != 1) stop("'span' must be odd")
>     z <- embed(series, span)
>     s <- span %/% 2
>     v <- max.col(z, ties.method = ties.method) == s + 1:1
>     pad <- rep(FALSE, s)
>     c(pad, v, pad)
> }
>
> y <- c(1,4,1,1,6,1,5,1,1) ; (ii <- which(peaks(y))); y[ii]
> ##- [1] 2 5 7
> ##- [1] 4 6 5
>
> set.seed(7)
> y <- rpois(100, lambda = 7)
> py <- peaks(y)
> plot(y, type="o", cex = 1/4, main = "y and peaks(y,3)")
> points(seq(y)[py], y[py], col = 2, cex = 1.5)
>
> p7 <- peaks(y,7)
> points(seq(y)[p7], y[p7], col = 3, cex = 2)
> mtext("peaks(y,7)", col=3)

Thanks for working on this, as I would imagine there are other lurkers 
out there who are waiting for a solution to this problem.

--
Dylan Beaudette
Soils and Biogeochemistry Graduate Group
University of California at Davis
530.754.7341




More information about the R-help mailing list