[R] Data frame reordering to time series

Gabor Grothendieck ggrothendieck at gmail.com
Sun Aug 8 02:04:32 CEST 2010


On Sat, Aug 7, 2010 at 4:49 PM, steven mosher <moshersteven at gmail.com> wrote:
> I have a data frame (it could be a matrix if I choose).
> The data consists of an ID, a year, and data for all 12 months.
> Missing values are a factor, and so are missing years.
>
> Id<-c(rep(67543,4),rep(12345,3),rep(89765,5))
>  Years<-c(seq(1989,1992,by =1),1991,1993,1994,seq(1991,1995,by=1))
>  Values2<-c(12,NA,34,21,NA,65,23,NA,13,NA,13,14)
>  Values<-c(12,14,34,21,54,65,23,12,13,13,13,14)
>  Data<-data.frame(Index=Id,Year=Years,Jan=Values,Feb=Values/2,Mar=Values2,Apr=Values2,Jun=Values,July=Values/3,Aug=Values2,Sep=Values,
>                   Oct=Values,Nov=Values,Dec=Values2)
>  Data
>   Index Year Jan  Feb Mar Apr Jun      July Aug Sep Oct Nov Dec
> 1  67543 1989  12  6.0  12  12  12  4.000000  12  12  12  12  12
> 2  67543 1990  14  7.0  NA  NA  14  4.666667  NA  14  14  14  NA
> 3  67543 1991  34 17.0  34  34  34 11.333333  34  34  34  34  34
> 4  67543 1992  21 10.5  21  21  21  7.000000  21  21  21  21  21
> 5  12345 1991  54 27.0  NA  NA  54 18.000000  NA  54  54  54  NA
> 6  12345 1993  65 32.5  65  65  65 21.666667  65  65  65  65  65
> 7  12345 1994  23 11.5  23  23  23  7.666667  23  23  23  23  23
> 8  89765 1991  12  6.0  NA  NA  12  4.000000  NA  12  12  12  NA
> 9  89765 1992  13  6.5  13  13  13  4.333333  13  13  13  13  13
> 10 89765 1993  13  6.5  NA  NA  13  4.333333  NA  13  13  13  NA
> 11 89765 1994  13  6.5  13  13  13  4.333333  13  13  13  13  13
> 12 89765 1995  14  7.0  14  14  14  4.666667  14  14  14  14  14
>
>
> The Goal is to return a Time series object for each ID. Alternatively one
> could return a matrix that I can turn into a Time series.
> The final structure would be something like this ( done in matrix form for
> illustration)
>          1989.0  1989.083
>    1991 ......1992....1993..... 1994 .... 1995
> 67543 12       6.0   12  12  12  4.000000  12  12  12  12  12...
> .34...........21..     NA.........NA........NA
> 12345  NA, NA,
> NA,.............................................................54 27
>
> Basically, the time series will have patches at the front, middle and end
> where you may have years of NA.
> The series must be column-ordered by time and aligned so that averages
> across all series can be computed per month.
>
> I have looping code to do this: I loop through all the IDs, map each
> row of data into the correct column, and create column names based on
> the data and row names based on the ID. It is painfully slow, though.
> Any wizardry would help.

Your email came out a bit garbled, so it's not clear what output you
want, but this code will produce a multivariate time series, i.e. an
mts object, with one column for each ID:

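# For one ID: drop the Index and Year columns, transpose so the values run
# month by month within each year, flatten, and build a monthly ts that
# starts in that ID's first year.
f <- function(x) ts(c(t(x[-(1:2)])), frequency = 12, start = x$Year[1])
# Apply f to each ID's rows and bind the results into one mts,
# one column per ID, aligned on a common time axis.
do.call(cbind, by(Data, Data$Index, f))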
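
The one-liner above lays each ID's rows end to end, so it lines up with the calendar only when an ID has no skipped years and the data frame has all twelve month columns. In the posted sample, ID 12345 has no 1992 row and there is no May column. Below is a minimal sketch of one way to pad both kinds of gap with NA. It assumes the question's column names map to calendar months as in the hypothetical `month_of` lookup; `to_ts`, `all_years`, `series`, `result` and `monthly_avg` are likewise illustrative names that do not appear in the original post.

```r
# Sample data from the question (note: there is no May column).
Id      <- c(rep(67543, 4), rep(12345, 3), rep(89765, 5))
Years   <- c(1989:1992, 1991, 1993, 1994, 1991:1995)
Values2 <- c(12, NA, 34, 21, NA, 65, 23, NA, 13, NA, 13, 14)
Values  <- c(12, 14, 34, 21, 54, 65, 23, 12, 13, 13, 13, 14)
Data <- data.frame(Index = Id, Year = Years, Jan = Values, Feb = Values / 2,
                   Mar = Values2, Apr = Values2, Jun = Values,
                   July = Values / 3, Aug = Values2, Sep = Values,
                   Oct = Values, Nov = Values, Dec = Values2)

# Assumption: how the question's column names map to calendar months.
month_of <- c(Jan = 1, Feb = 2, Mar = 3, Apr = 4, May = 5, Jun = 6, July = 7,
              Aug = 8, Sep = 9, Oct = 10, Nov = 11, Dec = 12)

# Common span of years, so every series has the same length and start.
all_years <- min(Data$Year):max(Data$Year)

# Build one monthly ts for the rows of a single ID.
to_ts <- function(x, years) {
  # 12 x n_years grid of NA; rows are months, columns are years.
  m <- matrix(NA_real_, nrow = 12, ncol = length(years))
  cols <- intersect(names(x), names(month_of))
  # Drop each observed year into its own column and each month into its row;
  # months and years that are absent from the data stay NA.
  m[month_of[cols], match(x$Year, years)] <- t(as.matrix(x[cols]))
  # Column-major flattening gives Jan..Dec of year 1, then year 2, and so on.
  ts(c(m), start = c(years[1], 1), frequency = 12)
}

series <- lapply(split(Data, Data$Index), to_ts, years = all_years)
result <- do.call(cbind, series)   # mts with one column per ID

# Average across IDs for each month, ignoring missing values.
monthly_avg <- rowMeans(result, na.rm = TRUE)
```

Because every series is padded to the same span of years, each row of `result` corresponds to a single calendar month for all IDs, which is what the per-month averages need. Months with no data for any ID come out as NaN in `monthly_avg`.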


