[R] Missing data?

Kevin Burton rkevinburton at charter.net
Sun Nov 27 22:08:55 CET 2011

```I admit it isn't reality, but I was hoping that through judicious use of these functions I could approximate reality. For example, in years where there are more than 53 weeks I would be happy if there were a way to recognize this and drop the last week of data. If there were fewer than 53, I would "pad" the year with an extra dummy week. This is much the same as your suggestion of putting more than 7 days in the first and last weeks. But I still need this kind of date manipulation just to know how many days to add to make the approximation viable. This kind of best approximation to reality seems better than settling for monthly resolution just because it is consistent. Daily would be too much data, and even then there would be an approximation due to leap years.

On Nov 26, 2011, at 3:13 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:

> On Tue, Nov 22, 2011 at 6:50 PM, Kevin Burton <rkevinburton at charter.net> wrote:
>> Void of any other suggestions this approach makes sense but for my case I
>> think I need to use zoo objects rather than xts. If I sequence the data
>> generally I don't know if there will be 365 days in the year or 366. So I
>> have to sequence the dates as:
>>
>> seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"), by="day")
>>
>> If I use this sequence with xts I get:
>>
>>> ds <- xts(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
>> by="day"))
>> Error in xts(NA, seq(from = as.Date("2011-01-01"), to =
>> as.Date("2011-12-31"),  :
>>  NROW(x) must match length(order.by)
>>
>> If I leave the 'data' empty I don't get the error but if I try to assign an
>> individual item (fill as appropriate)
>>
>>> ds <- xts(, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
>> by="day"))
>>> ds["2011-12-24"] <- 10
>>> ds
>> Error in structure(coredata(x), names = x.attr$dimnames[[1]]) :
>>  'names' attribute [365] must be the same length as the vector [358]
>>
>> So now I need to remember that I have not filled in all of the data. Also
>> simple dereferencing gives:
>>
>>> ds[1]
>> Error in `[.xts`(ds, 1) : subscript out of bounds
>>
>> With zoo I am able to create a time-series where all of the data is
>> initially NA:
>>
>>> ds <- zoo(NA, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
>> by="day"))
>>
>> So I can fill the data as appropriate and the remaining slots will have NA.
>> I may be new with xts but I cannot see a way of creating a useable 'blank'
>> time-series.
>>
>> Also with xts it seems like the frequency is ignored.
>>
>>> ds <- xts(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
>> by="day"), frequency=52)
>>> frequency(ds)
>> [1] 1
>>
>> Whereas zoo remembers the frequency setting
>>
>>> ds <- zoo(1:365, seq(from=as.Date("2011-01-01"), to=as.Date("2011-12-31"),
>> by="day"), frequency=52)
>>> frequency(ds)
>> [1] 52
>>
>> But since the ultimate goal is to get the time-series in a 'ts' format (as
>> many functions require 'ts') it seems like even zoo has problems:
>
> The problem is that you seem to want a fixed number of periods per
> year but there is not a constant of 52 weeks nor 365 days in a year.
> You are going to have to give up something since your apparent criteria
> conflict with reality.  For example, you could use months in which
> case there are exactly 12 or you could stick more than 7 days into the
> first or last week of the year so that there are exactly 52 weeks in a
> year but they don't all have the same number of days, etc.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com

```
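The idea discussed above, forcing exactly 52 periods per year by letting one week absorb the leftover days, could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the helper name `week52` is hypothetical, weeks are counted from January 1 rather than by calendar or ISO week, the extra day (two in a leap year) is folded into week 52, and the daily values are placeholder ones rather than real data.

```r
# Hypothetical helper (not from any package): map each date to a week
# index 1..52 within its year. Weeks are 7-day blocks counted from Jan 1;
# day 365 (and day 366 in leap years) is folded into week 52.
week52 <- function(dates) {
  yday <- as.integer(format(dates, "%j"))  # day of year, 1..366
  pmin((yday - 1L) %/% 7L + 1L, 52L)
}

dates <- seq(from = as.Date("2011-01-01"), to = as.Date("2012-12-31"),
             by = "day")
daily <- rep(1, length(dates))  # placeholder data: one unit per day

year <- as.integer(format(dates, "%Y"))
week <- week52(dates)

# Sum the daily values within each (week, year) cell. The result is a
# 52 x 2 matrix with weeks in rows and years in columns.
weekly_mat <- tapply(daily, list(week = week, year = year), sum)

# as.vector() unrolls the matrix column by column, i.e. year by year,
# which is the order ts() expects.
weekly_ts <- ts(as.vector(weekly_mat), start = c(2011, 1), frequency = 52)

frequency(weekly_ts)  # 52
length(weekly_ts)     # 104, i.e. two years of 52 periods each
```

Because the placeholder data is one unit per day, each weekly value is simply the number of days in that period: 7 for weeks 1 to 51, 8 for week 52 of 2011, and 9 for week 52 of 2012 (a leap year). Whether an uneven final week is acceptable depends on the analysis; dropping the leftover days instead would be a different trade-off.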