[R] Combine two dataframe with different row number and interpolation between values

PIKAL Petr petr@p|k@| @end|ng |rom prechez@@cz
Thu Sep 1 11:33:42 CEST 2022


Hello

This code does the merge without repeating the NA rows.

df3 = merge(df1, df2, by=c("y", "d", "h"), all.y = TRUE)
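
For the interpolation step asked about in the quoted question, a minimal sketch follows. It assumes base R only and uses a small two-day stand-in for the data. Because both data frames have a "ws" column, merge() returns ws.x (from df1) and ws.y (the placeholder from df2). The helper names idx and ok, the final column selection, and the choice of linear interpolation with approx() are assumptions made for illustration; none of this has been run on the full data.

```r
# Small stand-in: 2 days of 3-hourly values (df1) and an hourly grid (df2)
set.seed(1)
df1 <- data.frame(y = 2010, d = rep(1:2, each = 8),
                  h = rep(seq(3, 24, by = 3), 2),
                  ws = rnorm(16, mean = 20))
df2 <- data.frame(y = 2010, d = rep(1:2, each = 24),
                  h = rep(1:24, 2), ws = NA_real_)

# Keep every row of df2; ws.x comes from df1, ws.y is the df2 placeholder
df3 <- merge(df1, df2, by = c("y", "d", "h"), all.y = TRUE)

# Make the time order explicit before interpolating along it
df3 <- df3[order(df3$y, df3$d, df3$h), ]

# Linear interpolation between measured values, using the row position as
# the time axis; rule = 2 carries the nearest measured value over the
# unmeasured ends (hours 1 and 2 of the first day)
idx <- seq_len(nrow(df3))
ok <- !is.na(df3$ws.x)
df3$ws <- approx(idx[ok], df3$ws.x[ok], xout = idx, rule = 2)$y

# Drop the two intermediate columns
df3 <- df3[, c("y", "d", "h", "ws")]
head(df3)
```

Merging on all three key columns is what keeps the result at the size of df2. With by = "y" alone, every df1 row is paired with every df2 row of the same year (2920 x 8760 rows per year), which is consistent with the memory error in the original attempt.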

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
> Sent: Thursday, September 1, 2022 7:24 AM
> To: Ebert,Timothy Aaron <tebert using ufl.edu>
> Cc: r-help using r-project.org
> Subject: Re: [R] Combine two dataframe with different row number and
> interpolation between values
> 
> Dear Tim;
> dplyr did not work for me. My data frames have exactly the same columns
> but different numbers of rows.
> Mr Petr sent this code and the code worked, but it copied the second
> dataframe into the first one and did not replace the corresponding rows.
> > df3 = merge(df1, df2, all = TRUE)
> Regarding filling the "NA" data, it does not matter to me whether I
> interpolate between the numbers or use the mean of the numbers.
> Sincerely
> 
> On Wed, Aug 31, 2022 at 5:17 PM Ebert,Timothy Aaron <tebert using ufl.edu>
> wrote:
> 
> > Can I interest you in the join functions in dplyr?
> > https://www.datasciencemadesimple.com/join-in-r-merge-in-r/
> >
> > Filling in missing data is a useful practice when the fake (simulated)
> > data is a small proportion of all data. When 2/3 of the data is fake
> > one must wonder if anything based on those numbers is real or an
> > artifact of assumptions made to generate the numbers.
> >
> > I deal with weather data and some weather stations are set up to
> > average measurements between recorded values and others take a single
> > reading at regular intervals. How you interpolate might depend on
> > which option describes your data. The alternative is to use the method
> > for recording ws and apply it to the other data that will be merged
> > with these data. I assume there is more data, otherwise I see little
> > point in expanding these values out.
> >
> > Tim
> >
> > -----Original Message-----
> > From: R-help <r-help-bounces using r-project.org> On Behalf Of javad bayat
> > Sent: Wednesday, August 31, 2022 2:09 AM
> > To: r-help using r-project.org
> > Subject: [R] Combine two dataframe with different row number and
> > interpolation between values
> >
> >  Dear all,
> > I am trying to combine two large dataframes in order to make a
> > dataframe with exactly the dimensions of the second dataframe.
> > The first df is as follows:
> >
> > df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920),
> >       d = rep(c(1:365,1:365,1:365,1:365,1:365), each = 8),
> >       h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by = 3),
> >                 seq(3,24, by = 3),seq(3,24, by = 3)), 365),
> >       ws = rnorm(14600, mean = 20))
> > > head(df1)
> >      y d  h       ws
> > 1 2010 1  3 20.71488
> > 2 2010 1  6 19.70125
> > 3 2010 1  9 21.00180
> > 4 2010 1 12 20.29236
> > 5 2010 1 15 20.12317
> > 6 2010 1 18 19.47782
> >
> > The data in the "ws" column were measured every 3 hours, and I need
> > hourly data. I have made a second df as follows, with one row per
> > hour and an empty "ws" column.
> >
> > df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760),
> >       d = rep(c(1:365,1:365,1:365,1:365,1:365), each = 24),
> >       h = rep(c(1:24,1:24,1:24,1:24,1:24), 365), ws = NA_real_)
> > > head(df2)
> >      y d h ws
> > 1 2010 1 1 NA
> > 2 2010 1 2 NA
> > 3 2010 1 3 NA
> > 4 2010 1 4 NA
> > 5 2010 1 5 NA
> > 6 2010 1 6 NA
> >
> > What I am trying to do is combine these two dataframes so that the
> > rows in df1 whose "y", "d", "h" values exactly match a row in df2 are
> > copied into that row's place in the new df (df3).
> > For example, in the first dataframe the first row was measured at 3
> > o'clock on the first day of 2010, and this row must be placed in the
> > third row of the second dataframe, which has the same values (2010, 1,
> > 3), as in the table below:
> >      y d h       ws
> > 1 2010 1 1       NA
> > 2 2010 1 2       NA
> > 3 2010 1 3 20.71488
> > 4 2010 1 4       NA
> > 5 2010 1 5       NA
> > 6 2010 1 6 19.70125
> >
> > For the rows of df2 that have no "ws" value (for example at 4 and 5
> > o'clock), I need to interpolate between the preceding and following
> > values to fill in the missing "ws" data.
> > I have tried the following code, but it did not work correctly.
> >
> > > df3 = merge(df1, df2, by = "y")
> > Error: cannot allocate vector of size 487.9 Mb
> > or
> > > library(dplyr)
> > > df3 <- df1 %>% full_join(df2)
> >
> >
> > Is there any way to do this?
> > Sincerely
> >
> > --
> > Best Regards
> > Javad Bayat
> > M.Sc. Environment Engineering
> > Alternative Mail: bayat194 using yahoo.com
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.r-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 
> --
> Best Regards
> Javad Bayat
> M.Sc. Environment Engineering
> Alternative Mail: bayat194 using yahoo.com

