[R] filling small gaps of N/A

Tue Apr 3 19:14:17 CEST 2012

Forgot to mention that 'shortgaps' and 'longgaps' are indices into 'gaps'
(the result of the rle) and into 'offsets', which holds the index into the
original data where each run (gap) starts.

> gaps
Run Length Encoding
  lengths: int [1:5] 2 2 4 14 2
  values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
> offsets
[1]  1  3  5  9 23
>
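
One possible way to use these two vectors to fill only the short gaps,
as a rough sketch rather than a definitive solution. It uses base R's
approx() and interpolates by row position, so it assumes the rows inside
a gap are equally spaced in time. The names 'v', 'maxgap' and 'filled'
are made up for illustration; 'v' stands in for x$V3 from the example.

```r
# Stand-in for x$V3: 2 values, 2 NAs, 4 values, 14 NAs, 2 values
v <- c(1.93, 3.93, NA, NA, 4.93, 5.93, 2.93, 2.93, rep(NA, 14), 7.93, 7.93)
maxgap <- 4  # assumed cutoff: runs of NAs longer than this are left alone

gaps      <- rle(is.na(v))
offsets   <- c(1, cumsum(head(gaps$lengths, -1)) + 1)
shortgaps <- which(gaps$values & (gaps$lengths <= maxgap))

filled <- v
for (g in shortgaps) {
  # rows of the original data covered by this run of NAs
  idx <- seq(offsets[g], length.out = gaps$lengths[g])
  lo  <- idx[1] - 1            # last good row before the gap
  hi  <- idx[length(idx)] + 1  # first good row after the gap
  # a gap touching either end of the series has no neighbour on one side; skip it
  if (lo >= 1 && hi <= length(v)) {
    filled[idx] <- approx(c(lo, hi), v[c(lo, hi)], xout = idx)$y
  }
}
filled
```

With the example data, only rows 3 and 4 are interpolated; the 14 NAs in
rows 9 to 22 stay as they are. If zoo is already in use, its na.approx()
also has a 'maxgap' argument that may do this directly, e.g.
na.approx(x, maxgap = 4); see its help page for the exact behaviour.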

On Tue, Apr 3, 2012 at 1:12 PM, jim holtman <jholtman at gmail.com> wrote:
>> x <- read.table(text="09/01/2008 12:00      1.93
> + 09/01/2008 12:15      3.93
> + 09/01/2008 12:30       NA
> + 09/01/2008 12:45       NA
> + 09/01/2008 13:00      4.93
> + 09/01/2008 13:15      5.93
> + 09/01/2008 16:15                2.93
> + 09/01/2008 16:30                2.93
> + 09/01/2008 16:45                NA
> + 09/01/2008 17:00                NA
> + 09/01/2008 17:15                NA
> + 09/01/2008 17:30                NA
> + 09/01/2008 17:45                NA
> + 09/01/2008 18:00                NA
> + 09/01/2008 18:15                NA
> + 09/01/2008 18:30                NA
> + 09/01/2008 18:45                NA
> + 09/01/2008 19:00                NA
> + 09/01/2008 19:15                NA
> + 09/01/2008 19:30                NA
> + 09/01/2008 19:45                NA
> + 09/01/2008 20:00                NA
> + 09/01/2008 20:15                7.93
> + 09/01/2008 20:30                7.93", as.is = TRUE)
>>
>> # find the NA gaps and process differently
>> gaps <- rle(is.na(x$V3))
>> offsets <- c(1, cumsum(head(gaps$lengths, -1)) + 1)
>> (shortgaps <- which(gaps$values & (gaps$lengths <= 4)))
> [1] 2
>> (longgaps <- which(gaps$values & (gaps$lengths > 4)))
> [1] 4
>>
>> # now that you have the indices of where the short/long gaps are
>> # you can process each individually; left as an exercise to the reader
>>
>
>
> On Tue, Apr 3, 2012 at 10:13 AM, jeff6868
> <geoffrey_klein at etu.u-bourgogne.fr> wrote:
>> Michael,
>>
>> First of all, thank you very much for your answer.
>> I've read your 2 answers, but I'm not really sure that they correspond to
>> my problem with NAs.
>> I'll try to explain in a bit more detail.
>>
>> This problem concerns the second part of my program. In the first part, I've
>> already created a timeseries object with the library (timeseries). I first had
>> to delete all the wrong values in my data and replace them with NAs.
>> So my data already contains missing data (NAs), as I have cleaned it before.
>>
>> The thing is that sometimes I have small gaps of missing data (only 2 or 3
>> NAs in a row) like in "example 1" below:
>>
>> example 1:
>>
>> 09/01/2008 12:00      1.93
>> 09/01/2008 12:15      3.93
>> 09/01/2008 12:30       NA            So here you have a small gap with only 2 NAs
>> 09/01/2008 12:45       NA
>> 09/01/2008 13:00      4.93
>> 09/01/2008 13:15      5.93
>>
>> But sometimes, still in the same file, I have big gaps, such as 10 or more
>> NAs following each other like in "example 2" below:
>>
>> example 2:
>>
>> 09/01/2008 16:15                2.93
>> 09/01/2008 16:30                2.93
>> 09/01/2008 16:45                NA
>> 09/01/2008 17:00                NA
>> 09/01/2008 17:15                NA
>> 09/01/2008 17:30                NA
>> 09/01/2008 17:45                NA
>> 09/01/2008 18:00                NA          So here you have a big gap with more than 10 NAs following each other
>> 09/01/2008 18:15                NA
>> 09/01/2008 18:30                NA
>> 09/01/2008 18:45                NA
>> 09/01/2008 19:00                NA
>> 09/01/2008 19:15                NA
>> 09/01/2008 19:30                NA
>> 09/01/2008 19:45                NA
>> 09/01/2008 20:00                NA
>> 09/01/2008 20:15                7.93
>> 09/01/2008 20:30                7.93
>>
>> So within the same file, I can sometimes have small gaps (2 or 3 NAs) and
>> sometimes big or very big gaps (10 or 100 NAs following each other).
>>
>> My aim is to apply the function na.approx(x) of the library (zoo) to fill
>> NAs ONLY for small gaps.
>>
>> If I just do: apply(na.approx(x)), it will fill all the NAs of my data (big
>> gaps + small gaps). That's exactly what I DON'T WANT.
>>
>> My problem is to tell R: "apply the function (na.approx) to fill NAs ONLY
>> if you see at most 4 NAs following each other (small gaps, like in
>> example 1)". "If you see more than 4 NAs following each other (big gaps, like
>> in example 2), keep these NAs and DON'T fill this big gap".
>>
>> My question is: how can I tell R to do this? I don't know how to do it.
>> I hope I've been clearer this time ^^
>> Thanks a lot again for all your answers!
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
