[R] Population abundance, change point

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Nov 24 23:49:46 CET 2010


On Wed, 2010-11-24 at 13:00 -0500, Jonathan P Daily wrote:
> I understand that smoothing splines produce continuous models, however the 
> end product of SiZer is not a single model that is then used in any 
> predictive manner. Rather, the end product is a map of potential 
> changepoint locations along a gradient. Are you suggesting that SiZer 
> would not find a changepoint at the Aswan Dam?

No, not at all. What I am arguing is that SiZer will find a series of
models of diminishing bandwidth that detects change points that are
smoothly varying between states either side of the change point. In fact
SiZer is fitting a series of smoothly varying models and showing you
where the derivatives of those smoothly varying functions are
significantly different from 0.

require(SiZer)
mod <- SiZer(time(Nile), Nile, x.grid = 40) ## slowish

All those models show us that something changed around 1900, but the
nature of that change is one that is smoothly varying.

My point is that SiZer posits a particular form for the nature of the
change - smoothly varying. If all you use is SiZer then these are the
sorts of changes you will find. Did the flow at Aswan change smoothly
over a period of many years? If all you used was SiZer, then you'd
conclude it did. It might look something like:

mod2 <- smooth.spline(time(Nile), Nile, df = 10)
pred <- predict(mod2, x = seq(min(time(Nile)), max(time(Nile)),
                length = 200))
plot(Nile)
lines(pred, col = "red", lwd = 2)

But if we used a different change point detection routine that focussed
on a discontinuity, then that is the sort of change point we would
detect. Here is a tree version, for example:

require(rpart)
mod3 <- rpart(Nile ~ time(Nile))
## 1898 change point
plot(Nile)
lines(pred, col = "red", lwd = 2)
before <- mean(Nile[time(Nile) < 1898])
after <- mean(Nile[time(Nile) >= 1898])
lines(1871:1897, y = rep(before, 27), col = "blue", lwd = 2)
lines(1898:1970, y = rep(after, 73), col = "blue", lwd = 2)

So, in short, my point was you need to understand what form of change
point a given technique is looking to identify otherwise you could be
misled as to what is going on.

G

>  If so, I would argue 
> against that conclusion. If your point is that pulling one underlying 
> spline model used to create a SiZer map out and trying to draw conclusions 
> from that model is ineffective, you are absolutely right. I tend to think 
> of SiZer outputs more akin to tree outputs than piecewise regression 
> outputs.
> --------------------------------------
> Jonathan P. Daily
> Technician - USGS Leetown Science Center
> 11649 Leetown Road
> Kearneysville WV, 25430
> (304) 724-4480
> "Is the room still a room when its empty? Does the room,
>  the thing itself have purpose? Or do we, what's the word... imbue it."
>      - Jubal Early, Firefly
> 
> Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote on 11/24/2010 12:09:15 PM:
> 
> > [image removed] 
> > 
> > Re: [R] Population abundance, change point
> > 
> > Gavin Simpson 
> > 
> > to:
> > 
> > Jonathan P Daily
> > 
> > 11/24/2010 12:09 PM
> > 
> > Cc:
> > 
> > r-help, carusonm
> > 
> > Please respond to gavin.simpson
> > 
> > On Wed, 2010-11-24 at 10:59 -0500, Jonathan P Daily wrote:
> > > I agree that SiZer is not the ultimate answer to all changepoint 
> analysis, 
> > > but that is why there are so many changepoint detection methods used. 
> I 
> > > will clarify, though, that my understanding of SiZer (which may be 
> wrong) 
> > > was that the smoothing splines are just a vessel for finding the 
> > > changepoints, and made no assumptions about the continuity of the 
> > > changepoint itself.
> > 
> > But by the nature of using splines, they *you) are positing a model for
> > what the changepoint looks like. If you model the classic Nile data with
> > a spline model (say using gam()) then it will suggest that there was a
> > smooth transition from high flow before the dam at Aswan was built to
> > lower flow afterwards. Other techniques that posit a discontinuity as a
> > changepoint would fit two, effectively flat [slope 0], linear lines
> > either side of the point when the dam was built.
> > 
> > If you didn't know a dam was built and all you fitted was a spline to
> > the data, that would likely influence how you interpreted the change.
> > 
> > > One thing that would certainly help, especially with the confidence 
> > > intervals about 0, is some bandwidth selection standard, though 
> choosing 
> > > that standard would be a difficult process to say the least.
> > 
> > That won't help with the lack of independence in the residuals, which
> > will result in deflated CI on the derivatives, hence more wiggles seem
> > significant... Oh the joy.
> > 
> > Also, there has to be a point where with siZer you are just looking at
> > pattern in noise at the small bandwidth end of things...
> > 
> > I'll get off my soap box about now! ;-)
> > 
> > Cheers,
> > 
> > G
> > 
> > > --------------------------------------
> > > Jonathan P. Daily
> > > Technician - USGS Leetown Science Center
> > > 11649 Leetown Road
> > > Kearneysville WV, 25430
> > > (304) 724-4480
> > > "Is the room still a room when its empty? Does the room,
> > >  the thing itself have purpose? Or do we, what's the word... imbue 
> it."
> > >      - Jubal Early, Firefly
> > > 
> > > Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote on 11/24/2010 09:15:55 
> AM:
> > > 
> > > > [image removed] 
> > > > 
> > > > Re: [R] Population abundance, change point
> > > > 
> > > > Gavin Simpson 
> > > > 
> > > > to:
> > > > 
> > > > Jonathan P Daily
> > > > 
> > > > 11/24/2010 09:16 AM
> > > > 
> > > > Cc:
> > > > 
> > > > Mike Marchywka, r-help, r-help-bounces, carusonm
> > > > 
> > > > Please respond to gavin.simpson
> > > > 
> > > > On Wed, 2010-11-17 at 09:17 -0500, Jonathan P Daily wrote:
> > > > > Indeed I have looked into various non-standard changepoint 
> analysis 
> > > > > methods. I figured the OP was more interested in traditional 
> methods 
> > > since 
> > > > > you have to spend less time justifying your methodology. Wavelets 
> are 
> > > one 
> > > > > potential nontraditional method, as is Significant Zero Crossings 
> (R 
> > > > > package SiZer), which fits arbitrary-degree smoothing splines over 
> a 
> > > range 
> > > > > of bandwidth parameters and looks for changes.
> > > > 
> > > > ...By looking to see if the derivative of the fitted curve is 
> different
> > > > from 0 (given a suitable confidence interval on the derivative. My
> > > > problem with all of this is that these data are time series and 
> SiZeR
> > > > doesn't take this into account (AFAICS) when computing the 
> confidence
> > > > intervals - they are certainly too narrow for examples I have run.
> > > > 
> > > > Also, if these things are using splines, aren't we already assuming 
> that
> > > > the underlying function is smooth and not a discontinuity? So which
> > > > technique the OP chooses will depend on how they think about the 
> type of
> > > > change taking place at the "changepoint" - a point I think you made
> > > > earlier Jonathan.
> > > > 
> > > > Don't mean to be too negative, this has been a very useful 
> discussion
> > > > that I am coming to late after a spot of time in the field.
> > > > 
> > > > All the best,
> > > > 
> > > > G
> > > > 
> > > > >  With large communities of 
> > > > > abundance counts, another approach that is gaining popularity is 
> the 
> > > > > community-level indicator taxa analysis (TITAN), though that is 
> not 
> > > useful 
> > > > > to the OP. 
> > > > > --------------------------------------
> > > > > Jonathan P. Daily
> > > > > Technician - USGS Leetown Science Center
> > > > > 11649 Leetown Road
> > > > > Kearneysville WV, 25430
> > > > > (304) 724-4480
> > > > > "Is the room still a room when its empty? Does the room,
> > > > >  the thing itself have purpose? Or do we, what's the word... imbue 
> 
> > > it."
> > > > >      - Jubal Early, Firefly
> > > > > 
> > > > > Mike Marchywka <marchywka at hotmail.com> wrote on 11/17/2010 
> 09:11:11 
> > > AM:
> > > > > 
> > > > > > [image removed] 
> > > > > > 
> > > > > > RE: [R] Population abundance, change point
> > > > > > 
> > > > > > Mike Marchywka 
> > > > > > 
> > > > > > to:
> > > > > > 
> > > > > > jdaily, carusonm
> > > > > > 
> > > > > > 11/17/2010 09:11 AM
> > > > > > 
> > > > > > Cc:
> > > > > > 
> > > > > > r-help, r-help-bounces
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > To: carusonm at gmail.com
> > > > > > > From: jdaily at usgs.gov
> > > > > > > Date: Wed, 17 Nov 2010 08:45:01 -0500
> > > > > > > CC: r-help at r-project.org; r-help-bounces at r-project.org
> > > > > > > Subject: Re: [R] Population abundance, change point
> > > > > > >
> > > > > > > There are really no set ways to determine a changepoint, since 
> a
> > > > > > > changepoint depends completely on what you decide. Recursive 
> > > > > partitioning
> > > > > > > will fit a best changepoint, but it will pretty much always 
> fit 
> > > one. 
> > > > > This
> > > > > > If you are open to newer ideas,
> > > > > > have you looked at wavelets at all? these come up on googel 
> along 
> > > with 
> > > > > R.
> > > > > > Also with aonly a few points, even 20-30, you coldconsider 
> > > exhasiutvely
> > > > > > fitting slopes to all 2^n subsets and plowing throgh the 
> histograms
> > > > > > looking for anything that may be publishable or illuminating 
> about 
> > > your 
> > > > > data.
> > > > > > Fitting to your own model or null hypotheses would make 
> interesting
> > > > > > contrasts of course, " populations remained the same after 
> atrazine 
> > > > > spill
> > > > > > or asteroid hit" etc.
> > > > > > 
> > > > > > 
> > > > > > > function can be found in the package rpart:
> > > > > > >
> > > > > > > > fit <- rpart(count ~ year, control = list(maxdepth = 1))
> > > > > > > > summary(fit)
> > > > > > >
> > > > > > > However this measure offers no level of confidence. This is 
> where 
> > > > > packages
> > > > > > > like strucchange and party come into use, as they provide 
> measures 
> > > of
> > > > > > > confidence. Alternatively, you could look into 
> regression-based 
> > > > > methods
> > > > > > > where the changepoint is some parameter. Piecewise regression, 
> for
> > > > > > > instance, is as simple as fitting a spline of degree 1 and 
> > > changepoint 
> > > > > X:
> > > > > > >
> > > > > > > > library(splines)
> > > > > > > > fit <- lm(count ~ bs(year, knots = X, degree = 1))
> > > > > > > > plot(year, count)
> > > > > > > > lines(year, fitted(fit))
> > > > > > >
> > > > > > > Then you can fit a regression at each year and compare. 
> > > Alternatively,
> > > > > > > since count data is often noisy, you could easily substitute 
> > > quantile
> > > > > > > regression for linear regression to much of the same effect 
> > > (assuming
> > > > > > > whatever tau you decide, I used 0.8 but this is arbitrary):
> > > > > > >
> > > > > > > > library(splines)
> > > > > > > > library(quantreg)
> > > > > > > > fit <- rq(count ~ bs(year, knots = X, degree = 1), tau = 
> 0.8)
> > > > > > > > plot(year, count)
> > > > > > > > lines(year, fitted(fit))
> > > > > > > --------------------------------------
> > > > > > > Jonathan P. Daily
> > > > > > > Technician - USGS Leetown Science Center
> > > > > > > 11649 Leetown Road
> > > > > > > Kearneysville WV, 25430
> > > > > > > (304) 724-4480
> > > > > > > "Is the room still a room when its empty? Does the room,
> > > > > > > the thing itself have purpose? Or do we, what's the word... 
> imbue 
> > > it."
> > > > > > > - Jubal Early, Firefly
> > > > > > >
> > > > > > > r-help-bounces at r-project.org wrote on 11/16/2010 05:30:49 PM:
> > > > > > >
> > > > > > > > [image removed]
> > > > > > > >
> > > > > > > > [R] Population abundance, change point
> > > > > > > >
> > > > > > > > Nicholas M. Caruso
> > > > > > > >
> > > > > > > > to:
> > > > > > > >
> > > > > > > > r-help
> > > > > > > >
> > > > > > > > 11/16/2010 05:32 PM
> > > > > > > >
> > > > > > > > Sent by:
> > > > > > > >
> > > > > > > > r-help-bounces at r-project.org
> > > > > > > >
> > > > > > > > I am trying to understand my population abundance data and 
> am 
> > > > > looking
> > > > > > > into
> > > > > > > > analyses of change point to try and determine, at 
> approximately 
> > > what
> > > > > > > point
> > > > > > > > do populations begin to change (either decline or 
> increasing).
> > > > > > > >
> > > > > > > > Can anyone offer suggestions on ways to go about this?
> > > > > > > >
> > > > > > > > I have looked into bcp and strucchange packages but am not 
> > > > > completely
> > > > > > > > convinced that these are appropriate for my data.
> > > > > > > >
> > > > > > > > Here is an example of what type of data I have
> > > > > > > > Year of survey (continuous variable) 1960 - 2009 (there are 
> gaps 
> > > in 
> > > > > the
> > > > > > > > surveys (e.g., there were no surveys from 2002-2004)
> > > > > > > > Relative abundance of salamanders during the survey periods
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks for your help, Nick
> > > > > > > >
> > > > > > > > --
> > > > > > > > Nicholas M Caruso
> > > > > > > > Graduate Student
> > > > > > > > CLFS-Biology
> > > > > > > > 4219 Biology-Psychology Building
> > > > > > > > University of Maryland, College Park, MD 20742-5815
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 
> > > ------------------------------------------------------------------
> > > > > > > > I learned something of myself in the woods today,
> > > > > > > > and walked out pleased for having made the acquaintance.
> > > > > > > >
> > > > > > > > [[alternative HTML version deleted]]
> > > > > > > >
> > > > > > > > ______________________________________________
> > > > > > > > R-help at r-project.org mailing list
> > > > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > > > PLEASE do read the posting guide
> > > > > > > http://www.R-project.org/posting-guide.html
> > > > > > > > and provide commented, minimal, self-contained, reproducible 
> 
> > > code.
> > > > > > >
> > > > > > > ______________________________________________
> > > > > > > R-help at r-project.org mailing list
> > > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > > PLEASE do read the posting guide 
> > > > > http://www.R-project.org/posting-guide.html
> > > > > > > and provide commented, minimal, self-contained, reproducible 
> code.
> > > > > >
> > > > > 
> > > > > ______________________________________________
> > > > > R-help at r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > > > 
> > > > -- 
> > > > 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> > > >  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> > > >  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> > > >  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> > > >  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > > > 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > > >
> > > 
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > 
> > -- 
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> >  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> >  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> >  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> >  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > 
> 

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list