# [R] help with ARIMA and predict

Brian Scholl brianscholl1973 at yahoo.com
Sat Jul 9 01:06:56 CEST 2005

```
I'm trying to do the following out-of-sample
regression with autoregressive terms and additional x
variables:

y(t+1) = const + B(L)*y(t) + C(1)*x_1(t) + ... + C(K)*x_K(t)

where:
B(L) = lag polynomial for the AR terms
C(1..K) = coeffs. on the K exogenous variables,
each of which enters with only 1 lag

Question 1:
-----------

Suppose I use arima to fit the model:

df.y<-arima(yvec,order=c(L,0,0),xreg=xmat[,(1:K)],n.cond=maximum.lag)

Now suppose I want to do a 1-period-ahead prediction
based on the results of this regression, using
predict(df.y, newxreg = newx):

I'm expecting newx to be 1 x 3.  After all, I just want
to predict 1 value of y, so in my mind I should just
need 1 time period's observation of x (i.e. #
rows = n.ahead). I'm sort of expecting predict to grab
the last two values of yvec to use as y(t), y(t-1) in
prediction.  If I pass such a newx, I get:

Error in predict.Arima(df.y, newxreg = newx) :
'xreg' and 'newxreg' have different numbers of columns

If I try passing 2+ rows of x, predict accepts the
call and I get:

Time Series:
Start = 41
End = 42
Frequency = 1
 -0.03165 -0.03165
(for simplicity I passed two identical rows of x)

$se
Time Series:
Start = 41
End = 41
Frequency = 1
 0.02707

So I'm puzzled as to what I'm doing wrong.  When I
have n.ahead rows in newxreg, I get an error, but when
I pass a second row the call is accepted. But what am I
predicting in the latter case? Is R requiring another
row so that it can form a prediction of y(t) to use in
forecasting y(t+1) (this is not what I want to do), or
have I simply goofed in some other way?

Is there a better way to do this? I've also attempted
something similar using lm, but I'm unclear how to
interpret the "predicted" time series it returns.
The obvious alternative is to construct the forecast
using df.y$coef and a relevant data vector.

Question 2:
-----------

Suppose I want to select the autoregressive order
using AIC.  If I have understood correctly, the excellent
MASS text comments (p415) that comparisons are only
valid if n.cond is the same for each model.  Yet when
I set n.cond=maximum.lag (say =5), I get df.y$n.cond
= 0.  So I'm unclear whether the AICs are comparable
across different models (i.e. different L's and
different K's).

```
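One possible cause of the column-count error in Question 1 is that pulling a single row out of a matrix without `drop = FALSE` yields a plain vector, which `predict` then counts as one column. The sketch below is not taken from the post. The data are synthetic, and `n`, `K`, `L`, `xfit`, `newx_vec`, and `method = "ML"` are assumptions chosen only to make the example self-contained.

```r
set.seed(1)
n <- 40   # sample size (assumed)
K <- 3    # number of exogenous columns (assumed)
L <- 2    # AR order (assumed)

# Synthetic data: n rows for fitting plus one extra row of x for the forecast.
xmat  <- matrix(rnorm((n + 1) * K), ncol = K)
noise <- as.numeric(arima.sim(list(ar = c(0.5, -0.2)), n = n))
yvec  <- 0.1 + as.numeric(xmat[1:n, ] %*% c(0.3, -0.2, 0.1)) + noise

# Fit with exact ML (a choice made for this sketch, not from the post).
xfit <- xmat[1:n, 1:K]
fit  <- arima(yvec, order = c(L, 0, 0), xreg = xfit, method = "ML")

# Indexing one row without drop = FALSE gives a vector of length K,
# which is treated as a single column and triggers the error.
newx_vec <- xmat[n + 1, 1:K]
err <- tryCatch(predict(fit, n.ahead = 1, newxreg = newx_vec),
                error = function(e) conditionMessage(e))

# Keeping the matrix shape gives a 1 x K newxreg and one forecast.
newx <- xmat[n + 1, 1:K, drop = FALSE]
fc   <- predict(fit, n.ahead = 1, newxreg = newx)

print(err)
print(fc$pred)
print(fc$se)
```

If this is what happened, the two-row call was probably accepted only because a two-row selection stays a matrix. With `n.ahead` left at its default of 1, the single forecast would then be recycled across both rows, which seems consistent with the two identical predictions and the single standard error shown in the output above.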