[R] filling small gaps of N/A

jim holtman jholtman at gmail.com
Tue Apr 3 19:14:17 CEST 2012


Forgot to mention that the indices in 'shortgaps' and 'longgaps' are
positions in 'gaps' (the result of the rle), and 'offsets' is the index
into the original data where each run starts.

> gaps
Run Length Encoding
  lengths: int [1:5] 2 2 4 14 2
  values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
> offsets
[1]  1  3  5  9 23
>
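
One possible way to use 'gaps' and 'offsets' together, as a rough sketch
rather than the only approach: loop over the short runs and interpolate
each one linearly with base R's approx(). The helper name
fill_short_gaps() is made up for illustration, and interpolating on the
row index assumes the readings inside a gap are evenly spaced.

```r
# Same value column as V3 of 'x' in the quoted example
v <- c(1.93, 3.93, NA, NA, 4.93, 5.93, 2.93, 2.93, rep(NA, 14), 7.93, 7.93)

gaps      <- rle(is.na(v))
offsets   <- c(1, cumsum(head(gaps$lengths, -1)) + 1)
shortgaps <- which(gaps$values & (gaps$lengths <= 4))

# fill_short_gaps() is a hypothetical helper, not part of any package.
# 'runs' holds indices into 'gaps'; 'offsets' maps each run back to 'v'.
fill_short_gaps <- function(v, gaps, offsets, runs) {
  for (i in runs) {
    first <- offsets[i]
    last  <- first + gaps$lengths[i] - 1
    # a gap touching either end has no neighbour on one side; leave it
    if (first == 1 || last == length(v)) next
    # linear interpolation between the values just outside the gap
    v[first:last] <- approx(x = c(first - 1, last + 1),
                            y = v[c(first - 1, last + 1)],
                            xout = first:last)$y
  }
  v
}

filled <- fill_short_gaps(v, gaps, offsets, shortgaps)
filled[3:4]         # the 2-NA gap is now interpolated
sum(is.na(filled))  # 14: the long gap is left untouched
```

If zoo is already in use, its na.approx() also documents a 'maxgap'
argument (the maximum number of consecutive NAs to fill), which may be
enough on its own for this case.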


On Tue, Apr 3, 2012 at 1:12 PM, jim holtman <jholtman at gmail.com> wrote:
>> x <- read.table(text="09/01/2008 12:00      1.93
> + 09/01/2008 12:15      3.93
> + 09/01/2008 12:30       NA
> + 09/01/2008 12:45       NA
> + 09/01/2008 13:00      4.93
> + 09/01/2008 13:15      5.93
> + 09/01/2008 16:15                2.93
> + 09/01/2008 16:30                2.93
> + 09/01/2008 16:45                NA
> + 09/01/2008 17:00                NA
> + 09/01/2008 17:15                NA
> + 09/01/2008 17:30                NA
> + 09/01/2008 17:45                NA
> + 09/01/2008 18:00                NA
> + 09/01/2008 18:15                NA
> + 09/01/2008 18:30                NA
> + 09/01/2008 18:45                NA
> + 09/01/2008 19:00                NA
> + 09/01/2008 19:15                NA
> + 09/01/2008 19:30                NA
> + 09/01/2008 19:45                NA
> + 09/01/2008 20:00                NA
> + 09/01/2008 20:15                7.93
> + 09/01/2008 20:30                7.93", as.is = TRUE)
>>
>> # find the NA gaps and process differently
>> gaps <- rle(is.na(x$V3))
>> offsets <- c(1, cumsum(head(gaps$lengths, -1)) + 1)
>> (shortgaps <- which(gaps$values & (gaps$lengths <= 4)))
> [1] 2
>> (longgaps <- which(gaps$values & (gaps$lengths > 4)))
> [1] 4
>>
>> # now that you have the indices of where the short/long gaps are
>> # you can process each individually; left as an exercise to the reader
>>
>
>
> On Tue, Apr 3, 2012 at 10:13 AM, jeff6868
> <geoffrey_klein at etu.u-bourgogne.fr> wrote:
>> Michael,
>>
>> First of all, thank you very much for your answer.
>> I've read your 2 answers, but I'm not really sure that they correspond to
>> my problem of NAs.
>> I'll try to explain it in a bit more detail.
>>
>> This problem concerns the second part of my program. In the first part, I've
>> already created a timeseries object with the library (timeseries). I first
>> had to delete all the wrong values in my data and replace them with NAs.
>> So my data already contains missing data (NAs), as I have cleaned it before.
>>
>> The thing is that sometimes I have small gaps of missing data (only 2 or 3
>> following) like in "example 1" below:
>>
>> example 1:
>>
>> 09/01/2008 12:00      1.93
>> 09/01/2008 12:15      3.93
>> 09/01/2008 12:30       NA            So here you have a small gap with only 2 NAs
>> 09/01/2008 12:45       NA
>> 09/01/2008 13:00      4.93
>> 09/01/2008 13:15      5.93
>>
>> But sometimes, still in the same file, I have big gaps, such as 10 or more
>> NAs following each other like in "example 2" below:
>>
>> example 2:
>>
>> 09/01/2008 16:15                2.93
>> 09/01/2008 16:30                2.93
>> 09/01/2008 16:45                NA
>> 09/01/2008 17:00                NA
>> 09/01/2008 17:15                NA
>> 09/01/2008 17:30                NA
>> 09/01/2008 17:45                NA
>> 09/01/2008 18:00                NA          So here you have a big gap with more than 10 NAs following each other
>> 09/01/2008 18:15                NA
>> 09/01/2008 18:30                NA
>> 09/01/2008 18:45                NA
>> 09/01/2008 19:00                NA
>> 09/01/2008 19:15                NA
>> 09/01/2008 19:30                NA
>> 09/01/2008 19:45                NA
>> 09/01/2008 20:00                NA
>> 09/01/2008 20:15                7.93
>> 09/01/2008 20:30                7.93
>>
>> So within the same file, I can sometimes have small gaps (2 or 3 NAs), and
>> sometimes big or very big gaps (10 or 100 NAs following each other).
>>
>> The aim of my problem is to apply the function: na.approx(x) of the library
>> (zoo) to fill NAs ONLY for small gaps.
>>
>> If I just do: apply(na.approx(x)), it will fill all the NAs of my data (big
>> gaps + small gaps). It's exactly what I DON'T WANT.
>>
>> My problem is to say to R: " you apply the function (na.approx) to fill NAs
>> ONLY if you see 4 NAs maximum following each other (small gaps) (like
>> example 1)". "If you see more than 4 NAs following each other (big gaps like
>> in example 2), you keep these NAs and you DON'T fill this big gap".
>>
>> My question is: how can I say this to R? I don't know how to do it.
>> Hope I've been understandable this time ^^
>> Thanks a lot again for all your answers!
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.





