[R] Out-of-sample prediction with VAR

Sun Feb 7 23:37:16 CET 2010

Good day,

I'm using a VAR model to forecast sales with some extra variables (Google
Trends data). I have divided my dataset into a training set (weekly sales +
extra variables for 2006 and 2007) and a holdout set (2008).
It is unclear to me how I should predict the out-of-sample data, because
the predict() function in the vars package seems to forecast my
Google Trends variables as well. However, I want to forecast the sales
figures using the actual Google Trends data.

My questions:
1. How should I do this? I currently extract the linear model for the sales
equation from the fitted VAR(3) and use it to predict the holdout set, but
that seems inappropriate.
2. If I am doing it right, how is it possible that an automatically fitted
model with more variables actually performs worse (in terms of MAPE)?
Shouldn't it predict at least as well as the simple AR(3), by finding that
the extra variables add no value?
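
To make question 1 concrete, the kind of forecast meant here can be sketched in base R on simulated data. This is only an illustration under assumptions: the data are made up, the names (make_lags, sim, fit) are hypothetical, and only the sales equation is fitted by OLS, so it is not necessarily the proper way to do this with a VAR. Each holdout week is predicted one step ahead from the actual lagged sales and trends values.

```r
set.seed(1)
n <- 155
# Simulated stand-ins for the real series (assumption: not the actual data)
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))  # plays the role of a trends variable
y <- numeric(n)                                    # plays the role of detrended sales
for (i in 2:n) y[i] <- 0.5 * y[i - 1] + 0.3 * x[i - 1] + rnorm(1, sd = 0.5)

# Hypothetical helper: data frame with y and lags 1..p of y and x
make_lags <- function(y, x, p) {
  e_y <- embed(y, p + 1)  # column 1 is y[t], column k + 1 is y[t - k]
  e_x <- embed(x, p + 1)
  d <- data.frame(e_y, e_x[, -1, drop = FALSE])
  names(d) <- c("y", paste0("y.l", 1:p), paste0("x.l", 1:p))
  d
}

p <- 3
sim   <- make_lags(y, x, p)               # row i corresponds to week i + p
train <- sim[1:(104 - p), ]               # weeks 4..104
hold  <- sim[(104 - p + 1):nrow(sim), ]   # weeks 105..155

fit  <- lm(y ~ ., data = train)           # sales equation only
pred <- predict(fit, newdata = hold)      # one-step-ahead, using actual lagged values
```

Because every regressor is a lag, each prediction only needs data up to the previous week; forecasting several weeks ahead this way would still require forecasts of sales itself.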

My code:

	library(vars); # provides VAR() and its predict() method
	ts_Y <- ts(log_residuals[1:104]); # detrended sales data
	ts_XGG <- ts(salesmodeldata$gtrends_global[1:104]);
	ts_XGL <- ts(salesmodeldata$gtrends_local[1:104]);
	training_matrix <- data.frame(ts_Y, ts_XGG, ts_XGL);

	### Try VAR(3)
		var_model <- VAR(y=training_matrix, p=3, type="both", season=NULL,
			exogen=NULL, lag.max=NULL);

	## Out of sample forecasting
		var.lm = lm(var_model$varresult$ts_Y); # the generated LM

		ts_Y <- ts(log_residuals[105:155]);
		ts_XGG <- ts(salesmodeldata$gtrends_global[105:155]);
		ts_XGL <- ts(salesmodeldata$gtrends_local[105:155]);

		# Notice how I manually create the lagged values to be used in the
		# linear model
		holdout_matrix <- na.omit(data.frame(ts.union(ts_Y, ts_XGG, ts_XGL,
			ts_Y.l1 = lag(ts_Y,-1), ts_Y.l2 = lag(ts_Y,-2), ts_Y.l3 = lag(ts_Y,-3),
			ts_XGG.l1 = lag(ts_XGG,-1), ts_XGG.l2 = lag(ts_XGG,-2),
			ts_XGG.l3 = lag(ts_XGG,-3),
			ts_XGL.l1 = lag(ts_XGL,-1), ts_XGL.l2 = lag(ts_XGL,-2),
			ts_XGL.l3 = lag(ts_XGL,-3), const=1, trend=0.0001514194)));

		var.predict = predict(object=var_model, n.ahead=52, dumvar=holdout_matrix);

	## Assess accuracy
		calc_mape (holdout_matrix$ts_Y, var.predict, islog=T, print=T)
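
calc_mape is not defined in the code above. A minimal sketch of the kind of calculation it stands for, assuming that islog=T means both series are on the log scale and are exponentiated before comparison (the name mape_sketch and its arguments are hypothetical):

```r
# Hypothetical stand-in for calc_mape: mean absolute percentage error
mape_sketch <- function(actual, forecast, islog = FALSE) {
  if (islog) {
    # Assumption: inputs are logs, so back-transform before comparing
    actual   <- exp(actual)
    forecast <- exp(forecast)
  }
  100 * mean(abs((actual - forecast) / actual))
}

mape_sketch(c(100, 200), c(110, 180))  # both forecasts are off by 10%, so MAPE is 10
```

MAPE divides by the actual values, so it is only meaningful when those are not close to zero.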

Some context:
For my Master's thesis I'm using R to test the predictive power of web
metrics (such as Google Trends data and pageviews) in sales forecasting. To
assess this properly, I use a simple AR model (for the time series without
the extra variables) and a VAR model for the predictions with the extra
variables. I also build random forests with and without the buzz
variables to see whether MAPE improves.

Many thanks in advance!