[R] Choosing the optimum lag order of ARIMA model

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 31 10:38:04 CEST 2007


On Fri, 31 Aug 2007, Megh Dal wrote:

> Dear all R users,
>
>  I am really struggling to determine the most appropriate lag order of an
> ARIMA model. My understanding is that, since for an MA(q) model the
> autocorrelation coefficients vanish after lag q, the ACF indicates the MA
> order of an ARIMA model, and since for an AR(p) model the partial
> autocorrelations vanish after lag p, the PACF helps to determine the AR
> order. And the model chosen by this argument should give the minimum AIC.

The last part is fallacious.  Also, you are applying your rules to
select the orders of ARMA models, but they apply only to pure MA or pure
AR models.
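
To make that distinction concrete, here is a minimal sketch using ARMAacf()
from the stats package, which returns theoretical (not sample)
autocorrelations. The coefficient values of 0.5 and the lag range are
arbitrary illustrative assumptions, not taken from the data in question.

```r
# Theoretical ACF/PACF for three illustrative models. The coefficients
# (0.5) and lag.max are assumptions chosen only for the demonstration.
lag.max <- 6

# Pure MA(1): the ACF is zero beyond lag 1.
acf_ma <- ARMAacf(ma = 0.5, lag.max = lag.max)

# Pure AR(1): the PACF is zero beyond lag 1.
pacf_ar <- ARMAacf(ar = 0.5, lag.max = lag.max, pacf = TRUE)

# ARMA(1,1): neither the ACF nor the PACF cuts off; both tail off.
acf_arma  <- ARMAacf(ar = 0.5, ma = 0.5, lag.max = lag.max)
pacf_arma <- ARMAacf(ar = 0.5, ma = 0.5, lag.max = lag.max, pacf = TRUE)

# ACF rows cover lags 0..lag.max; PACF rows cover lags 1..lag.max.
print(round(rbind(MA1 = acf_ma, ARMA11 = acf_arma), 3))
print(round(rbind(AR1 = pacf_ar, ARMA11 = pacf_arma), 3))
```

With a mixed model, reading p off the PACF and q off the ACF as if each
cut off cleanly has no such justification.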

The R test file src/library/stats/tests/ts-tests.R has an example of model 
selection by AIC.
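
What follows is not the code from that test file, only a minimal sketch of
the same idea applied to the series quoted below. The candidate range 0:2
for p and q, the helper name fit_aic, and the decision to record failed
fits as NA are all assumptions made for illustration. The differencing
order is held at d = 1 throughout, since AIC values are only comparable
between models fitted to the same differenced series.

```r
# Series copied from the question (65 observations).
data1 <- c(
  2.1948, 2.2275, 2.2669, 2.2839, 1.9481, 2.1319, 2.0238, 2.3109, 2.5727, 2.5176,
  2.5728, 2.6828, 2.8221, 2.879,  2.8828, 2.9955, 2.9906, 2.9861, 3.0452, 3.068,
  2.9569, 3.0256, 3.0977, 2.985,  2.9572, 3.0877, 3.1009, 3.1149, 2.8886, 2.9631,
  3.0325, 2.9175, 2.7231, 2.7905, 2.8493, 2.8208, 2.8156, 2.9115, 2.701,  2.6928,
  2.7881, 2.723,  2.7266, 2.9494, 3.113,  3.0566, 3.0358, 3.05,   3.0724, 3.1365,
  3.1083, 3.0257, 3.2211, 3.4269, 3.327,  3.1205, 2.9997, 3.0201, 3.0803, 3.2059,
  3.1997, 3.038,  3.1613, 3.2802, 3.2194
)

# Candidate ARIMA(p, 1, q) orders; the 0:2 range is an arbitrary assumption.
orders <- expand.grid(p = 0:2, q = 0:2)

# Hypothetical helper: fit one model and return its AIC, or NA if the fit
# stops with an error (as ARIMA(2,1,2) did in the question).
fit_aic <- function(p, q) {
  tryCatch(arima(data1, order = c(p, 1, q))$aic,
           error = function(e) NA_real_)
}

orders$aic <- mapply(fit_aic, orders$p, orders$q)

# Candidates sorted by AIC, smallest first; failed fits (NA) sort last.
ranked <- orders[order(orders$aic), ]
print(ranked)
```

Wrapping each fit in tryCatch() lets the search continue past orders for
which arima() fails, rather than aborting at the first error.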

>
>  Now consider the following data:
>
>  2.1948 2.2275 2.2669 2.2839 1.9481 2.1319 2.0238 2.3109 2.5727 2.5176
> 2.5728 2.6828 2.8221 2.879 2.8828 2.9955 2.9906 2.9861 3.0452 3.068
> 2.9569 3.0256 3.0977 2.985 2.9572 3.0877 3.1009 3.1149 2.8886 2.9631
> 3.0325 2.9175 2.7231 2.7905 2.8493 2.8208 2.8156 2.9115 2.701 2.6928
> 2.7881 2.723 2.7266 2.9494 3.113 3.0566 3.0358 3.05 3.0724 3.1365
> 3.1083 3.0257 3.2211 3.4269 3.327 3.1205 2.9997 3.0201 3.0803 3.2059
> 3.1997 3.038 3.1613 3.2802 3.2194
>
>  ACF for 1st diff series:
>  Autocorrelations of series 'diff(data1)', by lag
>       0      1      2      3      4      5      6      7      8      9     10
> 1.000 -0.022 -0.258 -0.016  0.066  0.034  0.035 -0.001 -0.089  0.028  0.222
>    11     12     13     14     15     16     17     18
> -0.132 -0.184 -0.038  0.048 -0.026 -0.041 -0.067  0.059
>
>    PACF for 1st diff series:
>  Partial autocorrelations of series 'diff(data1)', by lag
>       1      2      3      4      5      6      7      8      9     10     11
> -0.022 -0.258 -0.031 -0.002  0.026  0.057  0.021 -0.069  0.029  0.194 -0.124
>    12     13     14     15     16     17     18
> -0.100 -0.111 -0.043 -0.078 -0.056 -0.085  0.086
>
>  On the basis of that I chose ARIMA(2,1,2) for the original data.
>
>  But I got an error when fitting it:
>
>  > arima(data1, c(2,1,2))
> Error in arima(data1, c(2, 1, 2)) : non-stationary AR part from CSS
>
>  And the AIC values for other combinations of orders are:
>  > arima(data1, c(2,1,1))$aic
> [1] -84.83648
>  > arima(data1, c(1,1,2))$aic
> [1] -84.35737
>  > arima(data1, c(1,1,1))$aic
> [1] -83.79392
>
>  Hence if I choose the ARIMA(2,1,1) model on the basis of AIC, the
> ACF/PACF rule I described earlier does not support that choice.
>
>  Am I doing anything wrong? Can anyone suggest a "universal" rule
> for choosing the best lag order?
>
>  Regards,

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


