[R] Problems with dataframe

Thu Sep 1 18:25:21 CEST 2022

Hello,

The code below does not merge the two data sets; it creates a third one, df3.

1. split the data by year and day
2. for each sub-df, assign the values in df1$ws to the corresponding 
rows of df2$ws
3. loop through df3, interpolating the values in ws and replacing the NAs 
with the interpolated values. This is linear interpolation, not 
substitution of the mean for the NAs.
4. bind the sub-df's together and remove the temporary sp1, sp2.
5. if you want to, you can write df2 wherever df3 occurs. In fact, there 
is no need for the new df3; I have coded it like this to make the 
several steps clearer, but you can simply use df2 throughout the code 
below.

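# Note: \(X) is the R >= 4.1 shorthand for function(X)
# 1. split both data sets by year and day
sp1 <- split(df1, list(df1$y, df1$d))
sp2 <- split(df2, list(df2$y, df2$d))

# 2. copy the measured ws into the matching hours of each day
df3 <- Map(\(X, Y) {
   # df2$ws was created as the string "NA"; start from a numeric NA
   # column so that ws stays numeric
   Y$ws <- NA_real_
   i <- match(X$h, Y$h)
   Y$ws[i] <- X$ws
   Y
}, sp1, sp2)

# 3. linear interpolation within each day; hours before the first
# measurement of the day (h = 1, 2) remain NA
df3 <- lapply(df3, \(X) {
   X$ws <- approx(X$h, X$ws, xout = X$h)$y
   X
})

# 4. bind the pieces together and clean up
df3 <- do.call(rbind, df3)
str(df3)
rm(sp1, sp2)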
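
The two key steps (match, then approx) can be tried on a single day before running them on the full data. What follows is only a minimal sketch: the toy values and the names one_day_3h and one_day_1h are made up for illustration and are not part of the code above.

```r
# Toy data for one day: 3-hourly measurements and an empty hourly grid.
# one_day_3h / one_day_1h are illustrative names, not from the original post.
one_day_3h <- data.frame(h = seq(3, 24, by = 3),
                         ws = c(20, 23, 26, 23, 20, 17, 14, 11))
one_day_1h <- data.frame(h = 1:24, ws = NA_real_)

# Step 2: copy the measured values into the matching hours.
i <- match(one_day_3h$h, one_day_1h$h)
one_day_1h$ws[i] <- one_day_3h$ws

# Step 3: linear interpolation; approx() ignores the NA pairs and
# interpolates at every hour given in xout.
one_day_1h$ws <- approx(one_day_1h$h, one_day_1h$ws, xout = one_day_1h$h)$y

one_day_1h$ws[3:6]   # 20 21 22 23
one_day_1h$ws[1:2]   # NA NA: hours before the first measurement
```

With the default rule = 1, approx() returns NA outside the range of the measured hours, which is why hours 1 and 2 stay NA. If that is not wanted, approx(..., rule = 2) uses the value at the nearest measured hour instead.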

Hope this helps,

Rui Barradas

At 10:03 on 01/09/2022, javad bayat wrote:
>   Dear Rui,
> I have sent this to the R mailing list but I did not receive a working
> solution. I apologize for sending this to you. Can you help me to fix this
> problem?
> 
> I am trying to combine two large dataframes in order to make a dataframe
> with exactly the dimensions of the second dataframe.
> The first df is as follows:
> 
> df1 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 2920), d =
> rep(c(1:365,1:365,1:365,1:365,1:365),each=8),
>        h = rep(c(seq(3,24, by = 3),seq(3,24, by = 3),seq(3,24, by =
> 3),seq(3,24, by = 3),seq(3,24, by = 3)),365),
>        ws = rnorm(1:14600, mean=20))
>> head(df1)
>       y       d   h        ws
> 1  2010  1  3     20.71488
> 2  2010  1  6     19.70125
> 3  2010  1  9     21.00180
> 4  2010  1 12     20.29236
> 5  2010  1 15     20.12317
> 6  2010  1 18     19.47782
> 
> The data in the "ws" column were measured every 3 hours and I need
> hourly data. I have made a second df, as follows, with one row per hour
> and an empty "ws" column.
> 
> df2 = data.frame(y = rep(c(2010,2011,2012,2013,2014), each = 8760), d =
> rep(c(1:365,1:365,1:365,1:365,1:365),each=24),
>        h = rep(c(1:24,1:24,1:24,1:24,1:24),365), ws = "NA")
>> head(df2)
>        y      d    h   ws
> 1  2010  1    1   NA
> 2  2010  1    2   NA
> 3  2010  1    3   NA
> 4  2010  1    4   NA
> 5  2010  1    5   NA
> 6  2010  1    6   NA
> 
> What I am trying to do is combine these two dataframes so that each row of
> df1 is copied into the row of the new df (df3) that has exactly the same
> values in the "y", "d" and "h" columns as in df2.
> For example, in the first dataframe the first row was measured at 3 o'clock
> on the first day of 2010, and this row must be placed in the third row of
> the second dataframe, which has the same key values (2010, 1, 3), as in the
> table below:
>        y      d    h   ws
> 1  2010  1    1   NA
> 2  2010  1    2   NA
> 3  2010  1    3   20.71488
> 4  2010  1    4   NA
> 5  2010  1    5   NA
> 6  2010  1    6   19.70125
> 
> For the rows of df2 that have no "ws" value (for example at 4 and 5
> o'clock), I need to interpolate between the preceding and following
> values to fill in the missing "ws" data.
> I have tried the following code, but it did not work correctly.
> 
>> df3 = merge(df1, df2, by = "y")
> Error: cannot allocate vector of size 487.9 Mb
> or
>> library(dplyr)
>> df3<- df1%>% full_join(df2)
> 
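
A note on the first attempt: with by = "y" alone, merge() pairs every df1 row of a year with every df2 row of the same year, so the result is far larger than either input. The arithmetic below is only a back-of-the-envelope sketch. It assumes one 4-byte integer per result row, which is consistent with the 487.9 Mb in the error message but is not taken from the internals of merge().

```r
# Rows per year in each data frame, as constructed in the post.
rows_df1_per_year <- 365 * 8    # 2920
rows_df2_per_year <- 365 * 24   # 8760
n_years <- 5

# Joining on "y" only gives every within-year combination of rows.
rows_joined <- rows_df1_per_year * rows_df2_per_year * n_years
rows_joined                          # 127896000

# Size of one integer vector of that length, in Mb
# (assumption: 4 bytes per element).
round(rows_joined * 4 / 1024^2, 1)   # 487.9
```

Joining on all three of y, d and h avoids this, because each key combination then occurs at most once in each data frame.
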
> Pikal Petr sent some code and it ran, but it copied the second
> dataframe into the first one and did not replace the corresponding rows in
> the first one.
> 
>> df3 = merge(df1, df2, all = TRUE)
> 
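
On merge(df1, df2, all = TRUE): by default merge() joins on every column name the two data frames share, here y, d, h and also ws. Since the ws values never match, all = TRUE keeps every row of both inputs. One possible alternative, sketched here on a reduced two-day example, is to drop the placeholder ws from the hourly data frame and join on the three key columns only. The names small1, small2, keys and merged are illustrative and not from the original post.

```r
set.seed(1)
# Reduced version of the posted data: one year, two days.
small1 <- data.frame(y = 2010, d = rep(1:2, each = 8),
                     h = rep(seq(3, 24, by = 3), 2),
                     ws = rnorm(16, mean = 20))
small2 <- data.frame(y = 2010, d = rep(1:2, each = 24),
                     h = rep(1:24, 2), ws = "NA")

keys <- c("y", "d", "h")
# Keep only the key columns of the hourly frame, then left-join the
# measurements onto it.
merged <- merge(small2[keys], small1, by = keys, all.x = TRUE)
# Order explicitly so the rows are in y, d, h order.
merged <- merged[order(merged$y, merged$d, merged$h), ]

nrow(merged)             # 48, the same number of rows as small2
sum(!is.na(merged$ws))   # 16 measured values; the rest are NA
```

The remaining NA values can then be interpolated day by day with approx(), as in the code at the top of this message.
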
> Regarding filling the "NA" data, it does not matter to me whether the values
> are interpolated or replaced by the mean.
> dplyr did not work for me. My data frames have exactly the same columns
> but different numbers of rows.
> 
> Is there any way to do this?
> Sincerely
> 
>