[R] about interpolating data in r

lily li chocold12 at gmail.com
Fri Jul 22 17:29:14 CEST 2016


Thanks, Ismail.
For the gaps before 2009-01-05 and after 2009-11-20, I use values from the
year 2010 to fill in the missing values for column C. There is no
relationship between columns A, B, and C.
For any missing values between 2009-01-05 and 2009-11-20, I found this
approach very helpful:
with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))
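
A minimal sketch of the same idea on a small made-up data frame. The values
and the column name C_interp are only for illustration, and it assumes that
time is of class Date (seq(..., by="days") will not work on character
strings). Rows where C is NA are dropped before calling approx(), and
rule = 1 (the default) leaves anything outside the observed range as NA, so
nothing is extrapolated.

```r
# Hypothetical data: consecutive days with a few known values of C (NA = missing).
df <- data.frame(
  time = as.Date("2009-01-05") + 0:4,
  C    = c(5.4, NA, NA, NA, 6.2)
)

# Use only the rows where C is actually observed as interpolation nodes.
known <- !is.na(df$C)

# Linear interpolation at every date in the data frame.
# Dates are converted to numbers (days since 1970-01-01) explicitly.
filled <- approx(
  x    = as.numeric(df$time[known]),
  y    = df$C[known],
  xout = as.numeric(df$time),
  rule = 1  # NA outside the range of known points (no extrapolation)
)

df$C_interp <- filled$y
print(df)
```

With rule = 2 instead, approx() would repeat the nearest known value outside
the range, which is a constant fill rather than a real estimate.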



On Thu, Jul 21, 2016 at 5:14 PM, Ismail SEZEN <sezenismail at gmail.com> wrote:

>
> > On 22 Jul 2016, at 01:34, lily li <chocold12 at gmail.com> wrote:
> >
> > I have a question about interpolating missing values in a dataframe.
>
> First of all, filling in missing values must be done very carefully. You
> need to know the nature of the data you want to fill, and most of the
> time leaving the values as NA is the most appropriate action.
>
> > The
> > dataframe is shown below. Column C has no data before 2009-01-05 and
> > after 2009-12-31. How can I interpolate data for the blanks?
>
> Why a dataframe? Is there any relationship between columns A, B and C? If
> there is, then you might want to consider filling the missing values with
> a linear model instead of interpolation. You said that there is no data
> before 2009-01-05 and after 2009-12-31, but according to the dataframe
> there is no data after 2009-11-20?
>
> > That is to say,
> > interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.
>
> Also, you mention interpolating blanks, but you want interpolation between
> two gaps? Do you want to fill the missing values before 2009-01-05 and
> after 2009-11-20, or do you want to find intermediate values between
> 2009-01-05 and 2009-11-20? This is a bit unclear.
>
> >
> >
> > df
> > time                A      B     C
> > 2009-01-01    3      4.5
> > 2009-01-02    4      5
> > 2009-01-03    3.3   6
> > 2009-01-04    4.1   7
> > 2009-01-05    4.4   6.2   5.4
> > ...
> >
> > 2009-11-20    5.1   5.5   6.1
> > 2009-11-21    5.4   4
> > ...
> > 2009-12-31    4.5   6
>
>
> If you want to fill missing values at the end-points for column C (before
> 2009-01-05 and after 2009-11-20), and all the data you have is between
> 2009-01-05 and 2009-11-20, this means that you want extrapolation (guessing
> unknown values that lie outside the range of known values). So, you can
> use only the values in column C to guess the missing end-point values. You
> can use the splinefun (or spline) functions for this purpose. But let me
> note that this kind of approach might help you only for a few missing
> values close to the end-points. Otherwise, you might make a serious
> mistake.
>
> As I mentioned in my first sentence, if you have a relationship between
> all columns, or you have data for column C for other years (for instance,
> assume that you have data for column C for 2007, 2008, and 2010 but not
> 2009), you may want to try a statistical approach to fill the missing
> values.
>
>
>
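
On the splinefun suggestion quoted above, a minimal sketch with made-up
numbers (day and C here are hypothetical, not the data from this thread).
It uses method = "natural", which is my own choice for the illustration
rather than something recommended in the reply: natural splines extrapolate
linearly beyond the data, whereas the default "fmm" method extrapolates with
a cubic and can run away quickly.

```r
# Hypothetical known values of C on six consecutive days.
day <- 1:6
C   <- c(5.4, 5.5, 5.7, 5.8, 6.0, 6.1)

# splinefun() returns a function that can be evaluated at arbitrary points.
f <- splinefun(day, C, method = "natural")

# Inside the data range this interpolates; outside it (0 and 7) it
# extrapolates. Only points close to the ends should be trusted at all.
est <- f(c(0, 3.5, 7))
print(est)
```

The further the evaluation point is from the known range, the less the
result means, which is the caveat made in the quoted reply.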



