[R] Generation of missing values in a time serie...

Gabor Grothendieck ggrothendieck at gmail.com
Tue Dec 13 21:06:24 CET 2005


Thinking about this some more, the trick I discussed is probably
not the best way to do it, since it is possible that a future
version of zoo will completely disallow illegal zoo objects.
I think a better way might be to construct it like this:


aggregate(zoo(z.data), round(z.time, 1), tail, 1)

where z.data is the data matrix and z.time is the vector of times.
This way the variable z, which is an illegal zoo object, is never
created.  Since z is what I can reproduce from your post, in terms
of z these would be:

z.data <- coredata(z)
z.time <- time(z)
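
As a sketch of the same idea without going through z at all, the
following builds z.data and z.time directly from the six rows in your
input file.  It assumes only a subset of your columns (time, seq, x,
rtt) to keep it short, and the name agg is just for illustration:

```r
library(zoo)

# Times from the posted input file; note the duplicated 123.241137.
z.time <- c(123.125683, 123.241137, 123.241137,
            123.472631, 123.588613, 123.704594)

# A subset of the posted columns, kept as a plain numeric matrix.
z.data <- cbind(time = z.time,
                seq  = 967:972,
                x    = c(13394, 12680, 12680, 12680, 12680, 12680),
                rtt  = c(0.798205, 0.796258, 0.796258,
                         0.796258, 0.796258, 0.796258))

# zoo(z.data) gets the default index 1, 2, ..., 6, which is unique,
# so no zoo object with repeated times is ever created.
agg <- aggregate(zoo(z.data), round(z.time, 1), tail, 1)

# The aggregated series has at most one row per rounded time.
stopifnot(!anyDuplicated(time(agg)))

# Convert to ts; gaps in the 0.1 grid should come out as NA rows.
as.ts(agg)
```

Replacing tail, 1 with mean in the aggregate call should give the
average within each rounded time instead of the last value.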


On 12/13/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Yes, this is the definition of a time series and therefore of a zoo object.
> A time series is a mathematical function, i.e. it assigns a single element
> of the range to each element of the domain. This data does not describe
> a time series.
>
> Also it has no underlying regularity as the warning message states.
> To use as.ts one wants a series with an underlying regularity that has
> gaps and then as.ts will fill in the gaps with NAs.
>
> If we don't have an underlying regularity the question is not well posed
> but it's likely we want to discretize time.  The zoo command itself is
> somewhat forgiving, at least in this case, i.e. it allows one to specify
> this illegal zoo object with non-unique times for purposes of discretization;
> however, such a zoo object should not be used other than to get a legal
> zoo object out.
>
> For example, in the following we round the times to one decimal place
> and then within each set of values at the same discretized time take the
> last one.  (Alternatively, specify mean instead of tail, 1 if the average
> is preferred.)  Then we convert that to a ts object:
>
> > as.ts(aggregate(z, round(time(z), 1), tail, 1))
> Time Series:
> Start = c(123, 2)
> End = c(123, 8)
> Frequency = 10
>          time flow seq       ts     x      rtt size
> 123.1 123.1257    0 967 123.1257 13394 0.798205 1472
> 123.2 123.2411    0 969 123.2411 12680 0.796258 1472
> 123.3       NA   NA  NA       NA    NA       NA   NA
> 123.4       NA   NA  NA       NA    NA       NA   NA
> 123.5 123.4726    0 970 123.4726 12680 0.796258 1472
> 123.6 123.5886    0 971 123.5886 12680 0.796258 1472
> 123.7 123.7046    0 972 123.7046 12680 0.796258 1472
>
> On 12/13/05, Alvaro Saurin <saurin at dcs.gla.ac.uk> wrote:
> >
> > I think I have found the error. It appears when there are two entries
> > with the same time. Using as input file:
> >
> > --------- CUT --------
> > # Output format for PCKs:
> > # TIME FLOW P [+-] SEQ TS X RTT SIZE
> > #
> > 123.125683 0 P + 967 123.125683 13394 0.798205 1472
> > 123.241137 0 P + 968 123.241137 12680 0.796258 1472
> > 123.241137 0 P + 969 123.241137 12680 0.796258 1472
> > 123.472631 0 P + 970 123.472631 12680 0.796258 1472
> > 123.588613 0 P + 971 123.588613 12680 0.796258 1472
> > 123.704594 0 P + 972 123.704594 12680 0.796258 1472
> > --------- CUT --------
> >
> > I run the following code:
> >
> > --------- CUT --------
> > h_types <- list (0, 0, NULL, NULL, 0, 0, 0, 0, 0)
> > h_names <- list ("time", "flow",  "seq", "ts", "x", "rtt", "size")
> >
> > pcks_file    <- pipe ("grep ' P ' data", "r")
> > pcks          <- scan (pcks_file, what = h_types, comment.char = '#',
> > fill = TRUE)
> > mat_df      <- data.frame (pcks[1:2], pcks[5:9])
> > mat           <- as.matrix (mat_df)
> > colnames (mat)      <- h_names
> > z <- zoo (mat, mat [,"time"])
> > --------- CUT --------
> >
> > The dput of 'z' shows:
> >
> > --------- CUT --------
> > structure(c(123.125683, 123.241137, 123.241137, 123.472631, 123.588613,
> > 123.704594, 0, 0, 0, 0, 0, 0, 967, 968, 969, 970, 971, 972, 123.125683,
> > 123.241137, 123.241137, 123.472631, 123.588613, 123.704594, 13394,
> > 12680, 12680, 12680, 12680, 12680, 0.798205, 0.796258, 0.796258,
> > 0.796258, 0.796258, 0.796258, 1472, 1472, 1472, 1472, 1472, 1472
> > ), .Dim = c(6, 7), .Dimnames = list(c("1", "2", "3", "4", "5",
> > "6"), c("time", "flow", "seq", "ts", "x", "rtt", "size")), index =
> > structure(c(123.125683,
> > 123.241137, 123.241137, 123.472631, 123.588613, 123.704594), .Names =
> > c("1",
> > "2", "3", "4", "5", "6")), class = "zoo")
> > --------- CUT --------
> >
> > If I try 'as.ts(z)', it fails. If I remove the duplicate entry, I
> > can convert it to a TS with no problem. Is this intentional?
> > Because then I have to filter the input matrix... But, anyway, the
> > output matrix, after filtering, doesn't seem regular:
> >
> > --------- CUT --------
> >  > as.ts (z)
> > Time Series:
> > Start = 1
> > End = 5
> > Frequency = 1
> >       time flow seq       ts     x      rtt size
> > 1 123.1257    0 967 123.1257 13394 0.798205 1472
> > 2 123.2411    0 969 123.2411 12680 0.796258 1472
> > 3 123.4726    0 970 123.4726 12680 0.796258 1472
> > 4 123.5886    0 971 123.5886 12680 0.796258 1472
> > 5 123.7046    0 972 123.7046 12680 0.796258 1472
> > Warning message:
> > 'x' does not have an underlying regularity in: as.ts.zoo(z)
> > --------- CUT --------
> >
> > Weird...
> >
> >
> > On 13 Dec 2005, at 16:33, Gabor Grothendieck wrote:
> >
> > > Please provide a reproducible example.  Note that dput(x) will output
> > > an R object in a way that can be copied and pasted into another
> > > session.
> > >
> > > On 12/13/05, Alvaro Saurin <saurin at dcs.gla.ac.uk> wrote:
> > >>
> > >> On 13 Dec 2005, at 13:08, Gabor Grothendieck wrote:
> > >>
> > >>> Your variable mat is not a matrix; it's a data frame.  Check it with:
> > >>>
> > >>>    class(mat)
> > >>>
> > >>> Here is an example:
> > >>>
> > >>> x <- cbind(A = 1:4, B = 5:8)
> > >>> tt <- c(1, 3:4, 6)
> > >>>
> > >>> library(zoo)
> > >>> x.zoo <- zoo(x, tt)
> > >>> x.ts <- as.ts(x.zoo)
> > >>
> > >> Fixed, but anyway it fails:
> > >>
> > >>>      h_types <- list (0, 0, NULL, NULL, 0, 0, 0, 0, 0)
> > >>>      h_names <- list ("time", "flow", "seq", "ts", "x", "rtt",
> > >>> "size")
> > >>
> > >>>      pcks_file       <- pipe ("grep ' P ' server.dat", "r")
> > >>>      pcks            <- scan (pcks_file, what = h_types,
> > >>>                               comment.char = '#', fill = TRUE)
> > >>
> > >>>      mat_df                  <- data.frame (pcks[1:2], pcks[5:9])
> > >>>      mat                             <- as.matrix (mat_df)
> > >>>      colnames (mat)  <- h_names
> > >>
> > >>>      class (mat)
> > >> [1] "matrix"
> > >>
> > >>>      z <- zoo (mat, mat [,"time"])
> > >>
> > >>>      z
> > >>          time         flow         seq          ts
> > >> x            rtt          size
> > >> 1.0009       1.000893     0.000000     0.000000     1.000893
> > >> 1472.000000     0.000000  1472.000000
> > >> 1.5145       1.514454     0.000000     1.000000     1.514454
> > >> 2944.000000     0.513142  1472.000000
> > >> 2.0151       2.015093     0.000000     2.000000     2.015093
> > >> 2944.000000     0.513142  1472.000000
> > >> 2.515        2.515025     0.000000     3.000000     2.515025
> > >> 4806.000000     0.504488  1472.000000
> > >> 2.822        2.821976     0.000000     4.000000     2.821976
> > >> 5730.000000     0.496728  1472.000000
> > >> [...]
> > >>
> > >>>      as.ts (z)
> > >> Error in if (del == 0 && to == 0) return(to) :
> > >>        missing value where TRUE/FALSE needed
> > >>
> > >> Any idea? Thanks for your help.
> > >>
> > >> Alvaro
> > >>
> > >>
> > >> --
> > >> Alvaro Saurin <alvaro.saurin at gmail.com> <saurin at dcs.gla.ac.uk>
> > >>
> > >>
> > >>
> > >>
> >
> > --
> > Alvaro Saurin <alvaro.saurin at gmail.com> <saurin at dcs.gla.ac.uk>
> >
> >
> >
> >
>



