[R] Combine two dataframe with different row number and interpolation between values

Jim Lemon drj|m|emon @end|ng |rom gm@||@com
Thu Sep 1 11:31:15 CEST 2022


Hi Javad,
You seem to have the data frame join worked out, so here is a function
to interpolate over sequences of NAs. I have found it quite useful.

# interpolate over sequences of NAs
# NA sequences at the beginning of a file are replaced with the first
nonNA value
# NA sequences at the end are filled with the last nonNA value
# sequences of all NAs are returned unaltered
interpNA<-function(x) {
 if(any(is.na(x))) {
  xrle<-rle(is.na(x))
  begin<-end<-1
  # if the sequence begins with NA
  # set the initial run of NAs to the 1st non-NA
  if(is.na(x[1])) x[1:xrle$lengths[1]]<-x[xrle$lengths[1]+1]
  for(i in 1:length(xrle$values)) {
   # if this is a run of NA, fill it
   if(xrle$values[i]) {
    end<-begin+xrle$lengths[i]+(i>1)
    if(is.na(x[end])) x[begin:(end-1)]<-x[begin]
    else x[begin:end]<-seq(x[begin],x[end],length.out=1+end-begin)
    begin<-end+xrle$lengths[i]-1
  }
   else {
    # set begin to the end of the non-NA values
    begin<-end+xrle$lengths[i]-1
   }
  }
 }
 return(x)
}

Jim

On Thu, Sep 1, 2022 at 3:29 PM javad bayat <j.bayat194 using gmail.com> wrote:
>
>  Dear Tim;
> The dplyr did not work for me. My data frames have exactly similar columns
> but different row numbers.
> Mr Petr sent this code and the code worked but it copied the second
> dataframe into the first one and did not replace the corresponding row.
> > df3 = merge(df1, df2, all = TRUE)
> Regarding filling the "NA" data, it does not matter for me to interpolate
> between numbers or put the mean of numbers.
> Sincerely
>
>
>
>
>
>
> On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
> > Can I interest you in the join functions in dplyr?
> > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/
> >
> > Filling in missing data is a useful practice when the fake (simulated)
> > data is a small proportion of all data. When 2/3 of the data is fake one
> > must wonder if anything based on those numbers is real or an artifact of
> > assumptions made to generate the numbers.
> >
> > I deal with weather data and some weather stations are set up to average
> > measurements between recorded values and others take a single reading at
> > regular intervals. How you interpolate might depend on which option
> > describes your data. The alternative is to use the method for recording ws
> > and apply it to the other data that will be merged with these data. I
> > assume there is more data, otherwise I see little point in expanding these
> > values out.
> >
> > Tim
> >
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
> > Sent: Wednesday, August 31, 2022 2:09 AM
> > To: r-help using r-project.org
> > Subject: [R] Combine two dataframe with different row number and
> > interpolation between values
> >
> > [External Email]
> >
> >  Dear all,
> > I am trying to combine two large dataframe in order to make a dataframe
> > with exactly the dimension of the second dataframe.
> > The first df is as follows:
> >
> > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d =
> > rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
> >       h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by =
> > 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
> >       ws = rnorm(1:14600, mean=20))
> > > head(df1)
> >      y       d   h        ws
> > 1  2010  1  3     20.71488
> > 2  2010  1  6     19.70125
> > 3  2010  1  9     21.00180
> > 4  2010  1 12     20.29236
> > 5  2010  1 15     20.12317
> > 6  2010  1 18     19.47782
> >
> > The data in the "ws" column were measured with 3 hours frequency and I
> > need data with one hour frequency. I have made a second df as follows with
> > one hour frequency for the "ws" column.
> >
> > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d =
> > rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
> >       h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
> > > head(df2)
> >       y      d    h   ws
> > 1  2010  1    1   NA
> > 2  2010  1    2   NA
> > 3  2010  1    3   NA
> > 4  2010  1    4   NA
> > 5  2010  1    5   NA
> > 6  2010  1    6   NA
> >
> > What I am trying to do is combine these two dataframes so as to the rows in
> > df1 (based on the values of "y", "d", "h" columns) that have values
> > exactly similar to df2's rows copied in its place in the new df (df3).
> > For example, in the first dataframe the first row was measured at 3
> > o'clock on the first day of 2010 and this row must be placed on the third
> > row of the second dataframe which has a similar value (2010, 1, 3). Like
> > the below
> > table:
> >       y      d    h   ws
> > 1  2010  1    1   NA
> > 2  2010  1    2   NA
> > 3  2010  1    3   20.71488
> > 4  2010  1    4   NA
> > 5  2010  1    5   NA
> > 6  2010  1    6   19.70125
> >
> > But regarding the values of the "ws" column for df2 that do not have value
> > (at 4 and 5 o'clock), I need to interpolate between the before and after
> > values to fill in the missing data of the "ws".
> > I have tried the following codes but they did not work correctly.
> >
> > > df3 = merge(df1, df2, by = "y")
> > Error: cannot allocate vector of size 487.9 Mb or
> > > library(dplyr)
> > > df3<- df1%>% full_join(df2)
> >
> >
> > Is there any way to do this?
> > Sincerely
> >
> >
> >
> >
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 using yahoo.com
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >
> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&reserved=0
> > PLEASE do read the posting guide
> > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3D&reserved=0
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Best Regards
> Javad Bayat
> M.Sc. Environment Engineering
> Alternative Mail: bayat194 using yahoo.com
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list