[R] Combine two dataframe with different row number and interpolation between values

Nilesh Dighe n||e@h@d|ghe @end|ng |rom b@yer@com
Fri Sep 2 13:15:24 CEST 2022


You can also use match function to combine the two dataframes as shown below:

wh<- match(interaction(df2[c("y", "d", "h")]), interaction(df1[c("y", "d", "h")])) matched <- !is.na(wh) df2[matched, "ws"] <- df1[wh[matched],"ws"]

head(df2)
# y d h               ws
# 1 2010 1 1               NA
# 2 2010 1 2               NA
# 3 2010 1 3 21.7863481620828
# 4 2010 1 4               NA
# 5 2010 1 5               NA
# 6 2010 1 6 21.9653408303519

Nilesh

-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of PIKAL Petr
Sent: Thursday, September 1, 2022 6:29 AM
To: Jim Lemon <drjimlemon using gmail.com>; javad bayat <j.bayat194 using gmail.com>; r-help mailing list <r-help using r-project.org>
Subject: Re: [R] Combine two dataframe with different row number and interpolation between values

Hi

You could also use approxfun to fill NA values

fff <- approxfun(1:dim(df3)[1], df3$ws.x)
df3$ws2 <- fff(1:dim(df3)[1])
head(df3)
     y d h     ws.x ws.y      ws2
1 2010 1 1       NA   NA       NA
2 2010 1 2       NA   NA       NA
3 2010 1 3 20.46314   NA 20.46314
4 2010 1 4       NA   NA 20.74001
5 2010 1 5       NA   NA 21.01689
6 2010 1 6 21.29376   NA 21.29376

And fill starting or trailing NA by na.locf function from zoo package or change starting or trailing NA to some value before calling approxfun.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jim Lemon
> Sent: Thursday, September 1, 2022 11:31 AM
> To: javad bayat <j.bayat194 using gmail.com>; r-help mailing list <r-help using r-
> project.org>
> Subject: Re: [R] Combine two dataframe with different row number and
> interpolation between values
>
> Hi Javad,
> You seem to have the data frame join worked out, so here is a function
> to interpolate over sequences of NAs. I have found it quite useful.
>
> # interpolate over sequences of NAs
> # NA sequences at the beginning of a file are replaced with the first
> nonNA value # NA sequences at the end are filled with the last nonNA
> value # sequences of all NAs are returned unaltered
> interpNA<-function(x) {
>  if(any(is.na(x))) {
>   xrle<-rle(is.na(x))
>   begin<-end<-1
>   # if the sequence begins with NA
>   # set the initial run of NAs to the 1st non-NA
>   if(is.na(x[1])) x[1:xrle$lengths[1]]<-x[xrle$lengths[1]+1]
>   for(i in 1:length(xrle$values)) {
>    # if this is a run of NA, fill it
>    if(xrle$values[i]) {
>     end<-begin+xrle$lengths[i]+(i>1)
>     if(is.na(x[end])) x[begin:(end-1)]<-x[begin]
>     else x[begin:end]<-seq(x[begin],x[end],length.out=1+end-begin)
>     begin<-end+xrle$lengths[i]-1
>   }
>    else {
>     # set begin to the end of the non-NA values
>     begin<-end+xrle$lengths[i]-1
>    }
>   }
>  }
>  return(x)
> }
>
> Jim
>
> On Thu, Sep 1, 2022 at 3:29 PM javad bayat <j.bayat194 using gmail.com> wrote:
> >
> >  Dear Tim;
> > The dplyr did not work for me. My data frames have exactly similar
columns
> > but different row numbers.
> > Mr Petr sent this code and the code worked but it copied the second
> > dataframe into the first one and did not replace the corresponding row.
> > > df3 = merge(df1, df2, all = TRUE)
> > Regarding filling the "NA" data, it does not matter for me to
interpolate
> > between numbers or put the mean of numbers.
> > Sincerely
> >
> >
> >
> >
> >
> >
> > On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert using ufl.edu>
> wrote:
> >
> > > Can I interest you in the join functions in dplyr?
> > > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> > > www.datasciencemadesimple.com%2Fjoin-in-r-merge-in-r%2F&data=0
> > > 5%7C01%7C%7C22d32a92ea5f4aa85d3208da8c04dc0f%7Cfcb2b37b5da0466b9b8
> > > 30014b67a7c78%7C0%7C0%7C637976250192968351%7CUnknown%7CTWFpbGZsb3d
> > > 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
> > > D%7C2000%7C%7C%7C&sdata=Vq0f9lMY7EzosUTCnsXEJXZZJYiEI5icCNql%2
> > > FBJNsaE%3D&reserved=0
> > >
> > > Filling in missing data is a useful practice when the fake
> > > (simulated) data is a small proportion of all data. When 2/3 of
> > > the data is fake
one
> > > must wonder if anything based on those numbers is real or an
> > > artifact
of
> > > assumptions made to generate the numbers.
> > >
> > > I deal with weather data and some weather stations are set up to
average
> > > measurements between recorded values and others take a single
> > > reading
at
> > > regular intervals. How you interpolate might depend on which
> > > option describes your data. The alternative is to use the method
> > > for
recording ws
> > > and apply it to the other data that will be merged with these
> > > data. I assume there is more data, otherwise I see little point in
> > > expanding
these
> > > values out.
> > >
> > > Tim
> > >
> > > -----Original Message-----
> > > From: R-help <r-help-bounces using r-project.org> On Behalf Of javad
> > > bayat
> > > Sent: Wednesday, August 31, 2022 2:09 AM
> > > To: r-help using r-project.org
> > > Subject: [R] Combine two dataframe with different row number and
> > > interpolation between values
> > >
> > > [External Email]
> > >
> > >  Dear all,
> > > I am trying to combine two large dataframe in order to make a
dataframe
> > > with exactly the dimension of the second dataframe.
> > > The first df is as follows:
> > >
> > > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each =
> > > 2920), d
=
> > > rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
> > >       h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by =
> > > 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
> > >       ws = rnorm(1:14600, mean=20))
> > > > head(df1)
> > >      y       d   h        ws
> > > 1  2010  1  3     20.71488
> > > 2  2010  1  6     19.70125
> > > 3  2010  1  9     21.00180
> > > 4  2010  1 12     20.29236
> > > 5  2010  1 15     20.12317
> > > 6  2010  1 18     19.47782
> > >
> > > The data in the "ws" column were measured with 3 hours frequency
> > > and I need data with one hour frequency. I have made a second df
> > > as follows
with
> > > one hour frequency for the "ws" column.
> > >
> > > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each =
> > > 8760), d
=
> > > rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
> > >       h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
> > > > head(df2)
> > >       y      d    h   ws
> > > 1  2010  1    1   NA
> > > 2  2010  1    2   NA
> > > 3  2010  1    3   NA
> > > 4  2010  1    4   NA
> > > 5  2010  1    5   NA
> > > 6  2010  1    6   NA
> > >
> > > What I am trying to do is combine these two dataframes so as to
> > > the
rows in
> > > df1 (based on the values of "y", "d", "h" columns) that have
> > > values exactly similar to df2's rows copied in its place in the new df (df3).
> > > For example, in the first dataframe the first row was measured at
> > > 3 o'clock on the first day of 2010 and this row must be placed on
> > > the
third
> > > row of the second dataframe which has a similar value (2010, 1, 3).
Like
> > > the below
> > > table:
> > >       y      d    h   ws
> > > 1  2010  1    1   NA
> > > 2  2010  1    2   NA
> > > 3  2010  1    3   20.71488
> > > 4  2010  1    4   NA
> > > 5  2010  1    5   NA
> > > 6  2010  1    6   19.70125
> > >
> > > But regarding the values of the "ws" column for df2 that do not
> > > have
value
> > > (at 4 and 5 o'clock), I need to interpolate between the before and
after
> > > values to fill in the missing data of the "ws".
> > > I have tried the following codes but they did not work correctly.
> > >
> > > > df3 = merge(df1, df2, by = "y")
> > > Error: cannot allocate vector of size 487.9 Mb or
> > > > library(dplyr)
> > > > df3<- df1%>% full_join(df2)
> > >
> > >
> > > Is there any way to do this?
> > > Sincerely
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Best Regards
> > > Javad Bayat
> > > M.Sc. Environment Engineering
> > > Alternative Mail: bayat194 using yahoo.com
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >
> > >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat
> .et%2F&data=05%7C01%7C%7C22d32a92ea5f4aa85d3208da8c04dc0f%7Cfcb2b3
> 7b5da0466b9b830014b67a7c78%7C0%7C0%7C637976250192968351%7CUnknown%7CTW
> FpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6
> Mn0%3D%7C2000%7C%7C%7C&sdata=VR%2BQtSTNThYD2qRH%2FujMJvoAmSkj8ZWb6
> 9lQC4MNMGg%3D&reserved=0
> hz.ch%2Fmailman%2Flistinfo%2Fr-
> help&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc23d90
> 8da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C6379
> 75232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL
> CJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&a
> mp;sdata=WigJVAmdLn%2FK7ZtJq28%2Buv4aDmUjNXu6QPabdt5h2iQ%3D&a
> mp;reserved=0
> > > PLEASE do read the posting guide
> > >
> https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r
> -%2F&data=05%7C01%7C%7C22d32a92ea5f4aa85d3208da8c04dc0f%7Cfcb2b37b
> 5da0466b9b830014b67a7c78%7C0%7C0%7C637976250192968351%7CUnknown%7CTWFp
> bGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn
> 0%3D%7C2000%7C%7C%7C&sdata=uAnoH%2BHi6aSg1mgtr7PBByegKlvksUnANfT2N
> BFpuSM%3D&reserved=0
> project.org%2Fposting-
> guide.html&data=05%7C01%7Ctebert%40ufl.edu%7C9e63e590cb834ddc
> 23d908da8b1802e2%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7
> C637975232519465332%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
> MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%
> 7C&sdata=m1VCPBpnYy%2FwqlOuf5froUVEMBDJKwDuAWS4cFNx1wI%3
> D&reserved=0
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 using yahoo.com
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fst
> > at.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7C%7C22d32
> > a92ea5f4aa85d3208da8c04dc0f%7Cfcb2b37b5da0466b9b830014b67a7c78%7C0%7
> > C0%7C637976250192968351%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
> > LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&s
> > data=aOg9DevOR31NX4GBBOMN%2Bk6uOCUMDXzSv%2B968vc2QG8%3D&reserved
> > =0 PLEASE do read the posting guide
> > https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww
> > .r-project.org%2Fposting-&data=05%7C01%7C%7C22d32a92ea5f4aa85d32
> > 08da8c04dc0f%7Cfcb2b37b5da0466b9b830014b67a7c78%7C0%7C0%7C6379762501
> > 92968351%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzI
> > iLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=u1nQCr5eWi
> > 0zuXzen4C0CW4YSs8hXvmhepAVkD3We44%3D&reserved=0
> guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat
> .ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=05%7C01%7C%7C22d32a92e
> a5f4aa85d3208da8c04dc0f%7Cfcb2b37b5da0466b9b830014b67a7c78%7C0%7C0%7C6
> 37976250192968351%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoi
> V2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=aOg9D
> evOR31NX4GBBOMN%2Bk6uOCUMDXzSv%2B968vc2QG8%3D&reserved=0
> PLEASE do read the posting guide
https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.html&data=05%7C01%7C%7C22d32a92ea5f4aa85d3208da8c04dc0f%7Cfcb2b37b5da0466b9b830014b67a7c78%7C0%7C0%7C637976250192968351%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7C&sdata=zVgeOMKCYAFRaqqzUot0blLwXk%2FSgROm2qbmJu4hqFQ%3D&reserved=0
> and provide commented, minimal, self-contained, reproducible code.
________________________________

The information contained in this e-mail is for the excl...{{dropped:9}}



More information about the R-help mailing list