[R] Testing predictive power of ARIMA model

Gerard M. Keogh GMKeogh at justice.ie
Mon Dec 15 12:01:59 CET 2008


Sorry,

but this gives me the shivers!

Are all your time series linear?
For each model you should check the residuals and their squares to see if
they are uncorrelated (Box-Ljung chi-squared test).
Another useful check is to test for a trend in the coefficient of variation
of the residuals.
If the series is linear then AIC (or preferably BIC) is an excellent
measure of predictive performance.
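
A minimal sketch of the residual checks, using only base R (stats). The
simulated AR(1) series, the lag of 10 and the AR(1) order are assumptions
chosen for illustration; substitute your own series and fitted model.

```r
set.seed(42)

# Simulated AR(1) series stands in for real data (assumption).
y <- arima.sim(model = list(ar = 0.5), n = 300)

# Fit the candidate model; here an AR(1) with a mean term.
fit <- arima(y, order = c(1, 0, 0))
res <- residuals(fit)

# Box-Ljung test on the residuals. fitdf = 1 removes one degree of
# freedom for the single estimated AR coefficient.
lb_res <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)

# Same test on the squared residuals, to look for leftover
# nonlinear (e.g. ARCH-type) dependence.
lb_sq <- Box.test(res^2, lag = 10, type = "Ljung-Box")

print(lb_res)
print(lb_sq)
```

Small p-values in either test suggest the residuals are not white noise,
i.e. the fitted model has missed some structure.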

If these tests fail, your auto-model has not worked - in fact, if auto-model
selects an AR order of 10 then you can bet this is not a linear series. For
most series the AR and MA polynomials have order at most 3 each. So using
ar=10 isn't really a good idea - you're way overfitting - and it is probably
only masking something else such as long memory.

Cross-validation issues

   For series with a nonlinear aspect, arbitrarily splitting it up at
   whatever point you feel like is not such a good idea as it interferes
   with the dynamics. If your series is linear then the proposal isn't too
   bad because any induced discontinuity will be damped out by the ar
   process in a finite number of steps.

   There are 2 types of predictive power here. The first is based on a
   fixed point (usually the last) y(t_n) - this is the conditional
   forecast. The second is the predictive power at any point - this is the
   conditional forecast marginalised across all points. For a linear series
   (a la Box-Jenkins) the variance of each is the same. For nonlinear
   series this is not the case.
   You need to decide which is required and construct your sub-samples
   accordingly.
   The suggested cross-validation is a type of marginal forecast.

   To make a reasonable effort at the correct approach you should look up
   block bootstrap methods for time series. The key to these is they pick
   blocks that match the thing you're trying to measure. If, for example,
   you were computing the autocorrelation y_t vs. y_(t-1) then the blocks
   are made of pairs of adjacent values. For seasonal data you must block
   to ensure seasons are maintained.
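
A rough sketch of the pairs example above, in base R: the lag-1
autocorrelation is bootstrapped by resampling blocks made of adjacent
pairs (y_(t-1), y_t). The simulated series, the 500 replicates and the
percentile interval are assumptions for illustration, not a recommended
recipe.

```r
set.seed(1)

# Simulated AR(1) series stands in for real data (assumption).
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))

# Each row is one block: a pair of adjacent values (y_(t-1), y_t).
lag_pairs <- cbind(prev = y[-length(y)], cur = y[-1])

# Lag-1 autocorrelation estimated from the original pairs.
r_hat <- cor(lag_pairs[, "prev"], lag_pairs[, "cur"])

# Resample whole pairs with replacement so that the adjacency being
# measured is preserved inside every block.
n_boot <- 500
boot_r <- replicate(n_boot, {
  idx <- sample(nrow(lag_pairs), replace = TRUE)
  cor(lag_pairs[idx, "prev"], lag_pairs[idx, "cur"])
})

# Percentile interval for the lag-1 autocorrelation.
ci <- quantile(boot_r, c(0.025, 0.975))
print(r_hat)
print(ci)
```

For seasonal data the blocks would instead need to span whole seasons,
which this sketch does not attempt.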

   Finally, your Australian colleague Rob Hyndman is a good source for
   bootstrapping time series - his website may have details of work he did
   on electricity demand which you might find useful.

Gerard



Gad Abraham <gabraham at csse.unimelb.edu.au>
Sent by: r-help-bounces at r-project.org
14/12/2008 01:50

To: Evan DeCorte <evandec at gwu.edu>
cc: r-help at r-project.org
Subject: Re: [R] Testing predictive power of ARIMA model




Evan DeCorte wrote:
> Thanks for the great feedback. Conceptually I understand how you would go
> about testing out of sample performance. It seems like accuracy() would be
> the best way to test out of forecast performance and will help to automate
> the construction of statistics I would have calculated on my own.
>
> However, the real question now is how do you loop through a time series
> and automatically split a time series into training and testing sets. I
> know how I would do it for individual sets but to do so manually over a
> large number of time series seems excessively burdensome.

You don't have to do it manually. For example, if you want to do 10-fold
cross-validation, and you have a time series of length n, then split it
into 10 blocks of length n/10, e.g. using the index
i <- rep(1:10, each=n/10) (assuming n is divisible by 10), loop 10 times
using 9 blocks as training and 1 block as test (a different test block
each time) and measure the MSE for each repetition. Repeat this for all
your time series.
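
A minimal sketch of automating the split, using the same block index. Note
that this is a forward-chaining variant, not the 9-blocks-vs-1 scheme
itself: each block is forecast from a model fitted only to the blocks
before it, because predict() on an arima fit forecasts beyond the end of
the training data. The helper name cv_mse, the AR(1) order and the
simulated series are all assumptions for illustration.

```r
set.seed(7)

# Hypothetical helper: split y into k contiguous blocks and, for each
# block after the first, fit on all earlier blocks and forecast it.
cv_mse <- function(y, k = 10, order = c(1, 0, 0)) {
  y <- as.numeric(y)
  n <- length(y)
  stopifnot(n %% k == 0)          # assumes n is divisible by k
  i <- rep(1:k, each = n / k)     # block index, as in the text
  mse <- rep(NA_real_, k)         # block 1 has no earlier training data
  for (fold in 2:k) {
    train <- y[i < fold]
    test <- y[i == fold]
    fit <- arima(train, order = order, method = "ML")
    fc <- predict(fit, n.ahead = length(test))$pred
    mse[fold] <- mean((test - as.numeric(fc))^2)
  }
  mse
}

# Simulated series stand in for the real collection (assumption).
series_list <- list(
  a = arima.sim(model = list(ar = 0.5), n = 200),
  b = arima.sim(model = list(ma = 0.4), n = 200)
)

# One column of per-block MSE values per series.
mse_by_series <- sapply(series_list, cv_mse)
print(round(mse_by_series, 3))
```

The same sapply() call extends to any number of series, provided each
length is divisible by the number of blocks.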


--
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham








