[R] Up- or downsampling time series in R

Brandt, T. (Tobias) TobiasBr at Taquanta.com
Thu Oct 26 17:46:12 CEST 2006


Hi
 
I have data that is sampled (in time) with a certain frequency and I would
like to express this time series as a time series of a higher (or lower)
frequency with the newly added time points being filled in with NA, 0, or
perhaps interpolated. My data might be regularly or irregularly spaced. For
example, I might have quarterly data that I would like to handle as a
monthly time series with NAs filled in for the missing months.
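For the regularly spaced case (quarterly to monthly, say), one minimal base-R sketch is to allocate a longer vector filled with NA or 0 and place each original observation in every k-th slot, where k is the ratio of the new frequency to the old one. `upsample_ts` is a hypothetical helper, not a function from ts, zoo or tseries. It assumes a univariate ts whose new frequency is an integer multiple of the old one.

```r
# Hypothetical helper (not from any package): upsample a univariate, regular
# ts to a higher frequency that is an integer multiple of frequency(x).
# New time points are set to `fill` (NA by default, or e.g. 0).
upsample_ts <- function(x, new_freq, fill = NA) {
  k <- new_freq / frequency(x)
  if (k < 1 || abs(k - round(k)) > 1e-8)
    stop("new_freq must be an integer multiple of frequency(x)")
  k <- round(k)
  n <- length(x)
  out <- rep(fill, (n - 1) * k + 1)            # first to last observation
  out[seq(1, by = k, length.out = n)] <- as.vector(x)
  # Use the numeric start time so that e.g. Q2 maps to April, not February.
  ts(out, start = tsp(x)[1], frequency = new_freq)
}

quarterly <- ts(c(10, 20, 30), start = c(2000, 1), frequency = 4)
upsample_ts(quarterly, 12)              # NA in the new months
upsample_ts(quarterly, 12, fill = 0)    # zeros instead
```

Interpolating instead of filling is a separate step that can be applied to the NA version afterwards.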
 
RSiteSearch("upsample") gave one link to a function in the "waveslim"
package that I'm not familiar with. This seems to me like a fairly common
time series task, so I am hoping to find something in the more common time
series packages/classes such as ts, zoo and tseries.
 
Here is some example code.
 
If I am "lucky" enough that my data is irregularly spaced, then a
combination of zoo and ts already accomplishes this task.
 
> require(zoo)
[1] TRUE
> dt <- sample(c(1,3,9), 20, replace=TRUE)
> t <- zoo(dt, as.yearmon(Sys.Date()) + cumsum(dt)/12)
> t
Jan 2007 Feb 2007 Nov 2007 Feb 2008 Nov 2008 Dec 2008 Mar 2009 Apr 2009 Jul 2009 Aug 2009
       3        1        9        3        9        1        3        1        3        1
Nov 2009 Feb 2010 Nov 2010 Aug 2011 May 2012 Jun 2012 Jul 2012 Oct 2012 Jul 2013 Aug 2013
       3        3        9        9        9        1        1        3        9        1
> as.ts(t)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2007   3   1  NA  NA  NA  NA  NA  NA  NA  NA   9  NA
2008  NA   3  NA  NA  NA  NA  NA  NA  NA  NA   9   1
2009  NA  NA   3   1  NA  NA   3   1  NA  NA   3  NA
2010  NA   3  NA  NA  NA  NA  NA  NA  NA  NA   9  NA
2011  NA  NA  NA  NA  NA  NA  NA   9  NA  NA  NA  NA
2012  NA  NA  NA  NA   9   1   1  NA  NA   3  NA  NA
2013  NA  NA  NA  NA  NA  NA   9   1                
> plot(t)

 
However, if the data happens to be regularly spaced, upsampling it isn't
quite as straightforward.
 
> t2 <- zoo(sample(1:3, 20, replace=TRUE), as.yearmon(seq(2000, by=0.5, length=20)))
> t2
Jan 2000 Jul 2000 Jan 2001 Jul 2001 Jan 2002 Jul 2002 Jan 2003 Jul 2003 Jan 2004 Jul 2004
       3        3        2        2        1        3        1        2        3        3
Jan 2005 Jul 2005 Jan 2006 Jul 2006 Jan 2007 Jul 2007 Jan 2008 Jul 2008 Jan 2009 Jul 2009
       2        2        3        3        2        3        3        2        1        3
> (t2.ts <- as.ts(t2))
Time Series:
Start = c(2000, 1) 
End = c(2009, 2) 
Frequency = 2 
 [1] 3 3 2 2 1 3 1 2 3 3 2 2 3 3 2 3 3 2 1 3
> plot(t2)
>
 
I expected this to be as simple as changing the frequency attribute of
t2.ts to 12, but I could not find out how to do this, or whether it is
possible at all.
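As far as I know, base R has no `frequency<-` replacement function. The `tsp<-` replacement does exist, but it only relabels the existing observations: a semiannual series given frequency 12 becomes four consecutive months rather than gaining NA gaps. A minimal sketch of the difference, using a short series shaped like `t2.ts`; the positional mapping at the end is one possible approach, not a built-in, and the variable names are illustrative.

```r
# A short semiannual series shaped like t2.ts (frequency 2).
x <- ts(c(3, 3, 2, 2), start = c(2000, 1), frequency = 2)

# Pitfall: overwriting tsp (start, end, frequency) with frequency 12 keeps the
# same four values but treats them as consecutive months (Jan-Apr 2000).
relabelled <- x
tsp(relabelled) <- c(2000, 2000 + 3 / 12, 12)
print(relabelled)

# Upsampling instead: map each original time to its slot on a monthly grid
# that starts at the first observation; all other months stay NA.
new_freq <- 12
pos <- round((as.vector(time(x)) - tsp(x)[1]) * new_freq) + 1
monthly <- ts(rep(NA_real_, max(pos)), start = tsp(x)[1], frequency = new_freq)
monthly[pos] <- as.vector(x)
print(monthly)
```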
 
So far, the only way around this that I have found is doing it "manually" in
the following way:
 
> t2.monthly <- zoo(NA, as.yearmon(seq(from=2000, to=2009.5, by=1/12)))
> window(t2.monthly, as.numeric(time(t2))) <- as.numeric(t2)   # can this be done using "[]" indexing?
> t2.monthly
Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000 Jul 2000 Aug 2000 Sep 2000 Oct 2000
       3       NA       NA       NA       NA       NA        3       NA       NA       NA
Nov 2000 Dec 2000 Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001 Aug 2001
      NA       NA        2       NA       NA       NA       NA       NA        2       NA
Sep 2001 Oct 2001 Nov 2001 Dec 2001 Jan 2002 Feb 2002 Mar 2002 Apr 2002 May 2002 Jun 2002
      NA       NA       NA       NA        1       NA       NA       NA       NA       NA
Jul 2002 Aug 2002 Sep 2002 Oct 2002 Nov 2002 Dec 2002 Jan 2003 Feb 2003 Mar 2003 Apr 2003
       3       NA       NA       NA       NA       NA        1       NA       NA       NA
May 2003 Jun 2003 Jul 2003 Aug 2003 Sep 2003 Oct 2003 Nov 2003 Dec 2003 Jan 2004 Feb 2004
      NA       NA        2       NA       NA       NA       NA       NA        3       NA
Mar 2004 Apr 2004 May 2004 Jun 2004 Jul 2004 Aug 2004 Sep 2004 Oct 2004 Nov 2004 Dec 2004
      NA       NA       NA       NA        3       NA       NA       NA       NA       NA
Jan 2005 Feb 2005 Mar 2005 Apr 2005 May 2005 Jun 2005 Jul 2005 Aug 2005 Sep 2005 Oct 2005
       2       NA       NA       NA       NA       NA        2       NA       NA       NA
Nov 2005 Dec 2005 Jan 2006 Feb 2006 Mar 2006 Apr 2006 May 2006 Jun 2006 Jul 2006 Aug 2006
      NA       NA        3       NA       NA       NA       NA       NA        3       NA
Sep 2006 Oct 2006 Nov 2006 Dec 2006 Jan 2007 Feb 2007 Mar 2007 Apr 2007 May 2007 Jun 2007
      NA       NA       NA       NA        2       NA       NA       NA       NA       NA
Jul 2007 Aug 2007 Sep 2007 Oct 2007 Nov 2007 Dec 2007 Jan 2008 Feb 2008 Mar 2008 Apr 2008
       3       NA       NA       NA       NA       NA        3       NA       NA       NA
May 2008 Jun 2008 Jul 2008 Aug 2008 Sep 2008 Oct 2008 Nov 2008 Dec 2008 Jan 2009 Feb 2009
      NA       NA        2       NA       NA       NA       NA       NA        1       NA
Mar 2009 Apr 2009 May 2009 Jun 2009 Jul 2009
      NA       NA       NA       NA        3
> points(t2.monthly, type="p", col="blue")
> lines(na.locf(t2.monthly), col="blue")   # as an example of why I might want to do this
> 
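If the new points should be filled rather than left as NA (interpolation was one of the options mentioned above, and the transcript uses zoo's na.locf for plotting), base R's approx() can produce both a linear interpolation and a last-value-carried-forward fill on a regular series with gaps. A minimal sketch, assuming a monthly ts with NA gaps shaped like the start of t2.monthly; the variable names are illustrative.

```r
# A monthly series with NA gaps, shaped like the start of t2.monthly.
m <- ts(c(3, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, 2),
        start = c(2000, 1), frequency = 12)

tt <- as.vector(time(m))   # numeric time of every month
ok <- !is.na(m)            # months that carry an observation

# Linear interpolation between neighbouring observations.
lin <- ts(approx(tt[ok], m[ok], xout = tt)$y,
          start = start(m), frequency = frequency(m))

# Step function: each observation is carried forward until the next one
# (method = "constant" with f = 0 takes the left-hand value), which should
# match what zoo's na.locf gives on this series.
locf <- ts(approx(tt[ok], m[ok], xout = tt, method = "constant", f = 0)$y,
           start = start(m), frequency = frequency(m))
```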
 
Similarly, it would be nice if one could conveniently downsample a time
series, keeping only every Nth point, or the sum or the average of the
previous N points, etc. I can see how that particular application could
probably be accomplished relatively easily using rapply and a subsetting
operation. However, it would be nice to have a convenient wrapper for
this.
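For regularly spaced ts objects, at least when the new frequency divides the old one, base R's stats package already covers these downsampling variants: window() with a lower frequency argument keeps every k-th point, and aggregate() applies a function such as sum or mean to non-overlapping blocks. A minimal sketch on a made-up monthly series; stats::filter is one way to get the overlapping "previous N points" reading, which is an interpretation rather than something the question pins down.

```r
# A made-up monthly series: 1, 2, ..., 24 from Jan 2000 to Dec 2001.
x <- ts(1:24, start = c(2000, 1), frequency = 12)

# Keep every 3rd observation (Jan, Apr, Jul, Oct): window() thins the series
# when given a lower frequency that divides the original one.
every3 <- window(x, frequency = 4)

# Sum or mean of consecutive, non-overlapping blocks of 3 months.
q_sum  <- aggregate(x, nfrequency = 4, FUN = sum)
q_mean <- aggregate(x, nfrequency = 4, FUN = mean)

# Overlapping reading of "the previous N points": a trailing 3-month moving
# average, then thinned to one value per quarter (Mar, Jun, Sep, Dec).
ma3   <- stats::filter(x, rep(1 / 3, 3), sides = 1)
ma3_q <- window(ma3, start = c(2000, 3), frequency = 4)
```

The block means and the thinned moving average hold the same values here; they differ only in which month each value is stamped with.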
 
Any help would be appreciated.  Thanks in advance.
 
Tobias
