[R] Strange behaviour of `dlm` package

Wed Jan 8 11:32:15 CET 2014

Dear R-help!

I have encountered strange behaviour (that is, far-off filtering, smoothing
and forecast distributions under certain conditions) in the `dlm` package by
Giovanni Petris.

Here is an example:

I use the annual hotel bookings time series data, which I model using a
second order polinomial DLM.

First I perform the analysis with the data in logarithmic form and
everything seems to be ok.

I repeat the analysis with the original, non logarithmic data and the
filtering, smoothing and forecast distributions are *very* far off the
actual data. 

I again repeat the analysis, this time I also specify the expected value and
variance of the pre-sample state vector (m0 and C0). The filtering and
forecast distribution seem to be influenced by these values only and ignore
the actual data. 

I would like to know what I would have to do in order to get plausible
results from DLM with data in non-logarithmic form. Is there something that
I am missing?

Thanks in advance!

Martin Zuba

References:
dlm introductory paper
<http://cran.r-project.org/web/packages/dlm/vignettes/dlm.pdf>   by Giovanni
Petris (2009)
dlm package  help file <http://benz.nchu.edu.tw/~finmyc/dlm.pdf>  

This R code reproduces my analysis:

# DLM anomaly

rm(list=ls())

hotann <- ts(start=1973, data=c(13733531, 15445493, 16024342, 16715224,
16463957, 17896118, 18263114, 
                                18727641, 19756293, 20352440, 22489054,
23442393, 24170736, 25308044, 
                                25021560, 25724331, 25990336, 26602038,
27292250, 28891954, 29659700, 
                                31533579, 32513666, 33628559, 34494451))

plot(hotann, ylab="Annual hotel bookings")

# Analysis with log data

tsdata <- log(hotann)

buildfun <- function (x) {
  dlmModPoly(order = 2, dV = exp(x[1]), dW = c(0,exp(x[2])))
}

fit <- dlmMLE(y=tsdata, parm=c(0,0), build=buildfun)
# Warning: a numerically singular 'V' has been slightly perturbed to make it
nonsingular
fit$conv

dlmTsdata <- buildfun(fit$par)
tsdataFilter <- dlmFilter(tsdata, mod=dlmTsdata)
tsdataSmooth <- dlmSmooth(tsdata, mod=dlmTsdata)

plot(tsdata, lwd=2)
for (i in 1:10)
  lines(lty=6, col="blue", dropFirst(dlmBSample(tsdataFilter))[,1])
# looks ok!

tsdataForecast <- dlmForecast(tsdataFilter, nAhead=20)
sqrtR <- sapply(tsdataForecast$R, function(x) sqrt(x[1,1]))
pl <- tsdataForecast$a[,1] + qnorm(0.05, sd= sqrtR)
pu <- tsdataForecast$a[,1] + qnorm(0.95, sd= sqrtR)

x <- ts.union(tsdata,tsdataSmooth$s[,1],tsdataForecast$a[,1],pl,pu)
plot(x, plot.type="single", type="l",
col=c("black","blue","red","orange","orange"), ylab="TS data")
# looks ok!

# Repeat with non-log data.

tsdata <- hotann

buildfun <- function (x) {
  dlmModPoly(order = 2, dV = exp(x[1]), dW = c(0,exp(x[2])))
}

fit <- dlmMLE(y=tsdata, parm=c(0,0), build=buildfun)
fit$conv

dlmTsdata <- buildfun(fit$par)
tsdataFilter <- dlmFilter(tsdata, mod=dlmTsdata)
tsdataSmooth <- dlmSmooth(tsdata, mod=dlmTsdata)

plot(tsdata, lwd=2, ylim=c(0, 6e+07))
for (i in 1:10)
  lines(lty=6, col="blue", dropFirst(dlmBSample(tsdataFilter))[,1])
# where is the filtering distribution???

tsdataForecast <- dlmForecast(tsdataFilter, nAhead=20)
sqrtR <- sapply(tsdataForecast$R, function(x) sqrt(x[1,1]))
pl <- tsdataForecast$a[,1] + qnorm(0.05, sd= sqrtR)
pu <- tsdataForecast$a[,1] + qnorm(0.95, sd= sqrtR)

x <- ts.union(tsdata,tsdataSmooth$s[,1],tsdataForecast$a[,1],pl,pu)
plot(x, plot.type="single", type="l",
col=c("black","blue","red","orange","orange"), ylab="TS data")
# where is the forecast???

# strong sensitivity towards expected value (m0) and variance (C0) of
pre-sample state vectors
# m0 is rep(0,order) in the default specification, this might be responsible
for the results above (?)
# repeat analysis, but specify values for m0 and C0

tsdata <- hotann

buildfun <- function (x) {
  dlmModPoly(order = 2, dV = exp(x[1]), dW = c(0,exp(x[2])),
m0=c(x[3],x[4]), C0 = exp(x[5]) * diag(nrow = 2))
}

fit <- dlmMLE(y=tsdata, parm=c(0,0,2e+07,20000,20), build=buildfun)
fit$conv

dlmTsdata <- buildfun(fit$par)
tsdataFilter <- dlmFilter(tsdata, mod=dlmTsdata)
tsdataSmooth <- dlmSmooth(tsdata, mod=dlmTsdata)

plot(tsdata, lwd=2, ylim=c(0, 6e+07))
for (i in 1:10)
  lines(lty=6, col="blue", dropFirst(dlmBSample(tsdataFilter))[,1])
# the distribution only follows m0, C0 and seems ignorant of the data

tsdataForecast <- dlmForecast(tsdataFilter, nAhead=20)
sqrtR <- sapply(tsdataForecast$R, function(x) sqrt(x[1,1]))
pl <- tsdataForecast$a[,1] + qnorm(0.05, sd= sqrtR)
pu <- tsdataForecast$a[,1] + qnorm(0.95, sd= sqrtR)

x <- ts.union(tsdata,tsdataSmooth$s[,1],tsdataForecast$a[,1],pl,pu)
plot(x, plot.type="single", type="l",
col=c("black","blue","red","orange","orange"), ylab="TS data")
# the forecast only follows m0, C0 and seems ignorant of the data

--
View this message in context: http://r.789695.n4.nabble.com/Strange-behaviour-of-dlm-package-tp4683255.html
Sent from the R help mailing list archive at Nabble.com.