[R] Functional data analysis for unequal length and unequal width time series

Jeff Newmiller jdnewmil @ending from dcn@d@vi@@c@@u@
Tue Dec 18 07:53:14 CET 2018


You will learn something useful if you search for "rolling join". The zoo package can handle this, as can the data.table package (read the vignette).
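
As a rough sketch of what a rolling join looks like in data.table, here it is applied to the first two example series from the quoted message below. The names s1, s2 and joined are only illustrative, and this assumes the data.table package is installed:

```r
library(data.table)

# First two example series from the quoted message, keyed by a common column "t"
s1 <- data.table(
  t  = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67),
  V1 = c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,
         -0.0273, -0.0589)
)
s2 <- data.table(
  t  = c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38),
  V2 = c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
)

# For every time in s1, pick up the most recent s2 value at or before that time
# (roll = TRUE carries the last s2 observation forward)
joined <- s2[s1, on = "t", roll = TRUE]
print(joined)
```

The result has one row per time point in s1, with V2 filled in from the nearest earlier observation in s2. Other roll settings (for example roll = "nearest") change which s2 observation is matched.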

Your decision to pad with NA at the end was ill-considered: the first point of your first series falls between the first two points of your second series, so you need to interleave the points somehow.
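
One possible way to see the interleaving, sketched with the zoo package and the first two series from the quoted message (the object name z is just for illustration):

```r
library(zoo)

t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,
        -0.0273, -0.0589)
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)

# merge() on zoo objects aligns on the sorted union of the two time indexes,
# leaving NA wherever a series has no observation at that time
z <- merge(V1 = zoo(V1, t1), V2 = zoo(V2, t2))
print(z)
```

The NA values end up scattered through both columns rather than stacked at the end, which is why padding the tail of the shorter series misaligns the observations.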

You will need to decide whether you want to use piecewise linear approximation (as with the base "approx" function) or the more stable last-observation-carried-forward ("locf") or cubic splines or something more exotic like Fourier interpolation to identify the new interpolated "y" values in each series.
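
To make the choices concrete, here is a small base R sketch that estimates the second example series at one new time point with three of these methods. The time 25.0 and the variable names are arbitrary choices for illustration:

```r
t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
t_new <- 25.0  # arbitrary time between two observed points

# Piecewise linear interpolation between the neighbouring observations
lin <- approx(t2, V2, xout = t_new)$y

# Last observation carried forward: a step function holding the left value
locf <- approx(t2, V2, xout = t_new, method = "constant", f = 0)$y

# Cubic spline interpolation through all observed points
cub <- spline(t2, V2, xout = t_new)$y

print(c(linear = lin, locf = locf, spline = cub))
```

The three answers differ, and the spline in particular can overshoot between observations where the series jumps sharply, so the choice is worth some thought for your data.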

You can avoid the rolling join if you intend to resample the series to have points at regular intervals.  Just apply your preferred interpolation technique with your intended mesh of regular time values to each of your series in turn and then use cbind with the results.
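
A minimal sketch of that resampling approach in base R, using the four series from the quoted message. The 20-point mesh, the restriction to the time range common to all series, and the choice of linear interpolation are assumptions made for illustration only:

```r
# The four example series from the quoted message, as lists of times and values
series <- list(
  ser1 = list(t = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67),
              v = c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792,
                    -0.0656, -0.0273, -0.0589)),
  ser2 = list(t = c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38),
              v = c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)),
  ser3 = list(t = c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55,
                    25.65, 25.88, 25.97, 25.99),
              v = c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109,
                    0.0352, 0.0550, -0.0536, 0.0185, -0.0295, -0.0324)),
  ser4 = list(t = c(24.5, 24.67, 24.71, 24.98, 25.17),
              v = c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231))
)

# Regular mesh over the time range that every series covers, so no
# extrapolation is needed (approx() returns NA outside the observed range)
t_lo <- max(sapply(series, function(s) min(s$t)))
t_hi <- min(sapply(series, function(s) max(s$t)))
mesh <- seq(t_lo, t_hi, length.out = 20)

# Interpolate each series onto the mesh in turn, then bind the columns
resampled <- do.call(cbind, lapply(series, function(s) approx(s$t, s$v, xout = mesh)$y))
print(round(resampled, 4))
```

The result is a 20 x 4 matrix with one column per series and no missing values. Restricting to the common range discards the tails of the longer series, so with real data you may prefer a wider mesh and some explicit rule for the ends.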

I don't know anything about the package you mention, but getting time series data aligned is a common preprocessing step for many time series analyses.

Oh, and you should probably be familiar with the CRAN Time Series Task View [1].

PS you should provide a link back to your original posting when moving the conversation to a different venue in case the discussion doesn't stay dead there.

[1] https://cran.r-project.org/web/views/TimeSeries.html

On December 17, 2018 8:50:09 AM PST, soura using iastate.edu wrote:
>Dear All,
>            I apologize if you have already seen this on Stack Overflow. I
>have not received any response there, so I am posting for help here.
>
>I have data on 1318 time series. Many of these series are of unequal
>length. In addition, many of the series are observed at different
>time points. For example, consider the following four series:
>
>t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
>V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,
>-0.0273, -0.0589)
>ser1 <- cbind(t1, V1)
>
>t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
>V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
>ser2 <- cbind(t2, V2)
>
>t3 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.65,
>25.88, 25.97, 25.99)
>V3 <- c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109, 0.0352,
>0.0550, -0.0536, 0.0185, -0.0295, -0.0324)
>ser3 <- cbind(t3, V3)
>
>t4 <- c(24.5, 24.67, 24.71, 24.98, 25.17)
>V4 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231)
>ser4 <- cbind(t4, V4)
>
>Here t1, t2, t3, t4 are the time points and V1, V2, V3, V4 are the
>observations made at those time points. The time points in the
>actual data are Julian dates, so they look like these but are much
>larger decimal figures such as 2452450.6225.
>
>I am trying to cluster these time series using a functional data approach,
>for which I am using the "funFEM" package in R. The examples provided are
>for equispaced, equal-length time series, so I am not sure how to use
>the package for my data. Initially I tried making all the time
>series equal in length to the series with the highest number of
>observations (here ser3) by padding the shorter series with NAs.
>Following this approach I made ser2 as
>
>t2_n <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38, 25.50, 25.55, 25.65,
>25.88, 25.97, 25.99)
>V2_na <- c(V2, rep(NA, 6))
>ser2_na <- cbind(t2_n, V2_na)
>
>Note that to make t2 equal to length of t3 I grabbed the last 6 time
>points from t3. To make V2 equal in length to V3 I added NA's.
>
>Then I created my data matrix as
>
>dat <- rbind(V1_na, V2_na, V3, V4_na)
>
>The code I used was
>
>require(funFEM)
>basis<- create.fourier.basis(c(min(t3), max(t3)), nbasis = 25) 
>fdobj <- smooth.basis(c(min(t3), max(t3)) ,dat, basis)$fd
>
>Note that the range is constructed using the maximum and minimum time
>points of the ser3 series.
>
>res <- funFEM(fdobj, K = 2:9, model = "all", crit = "bic", init =
>"random") 
>
>But this gives me an error
>
>Error in svd(X) : infinite or missing values in 'x'.
>
>Can anyone please help me with how to handle this dataset using
>this package or any alternative package?
>
>Sincerely,
>Souradeep
>

-- 
Sent from my phone. Please excuse my brevity.


