[R] rbind of multiple data frames by column name, when each data frames can contain different columns

d@vidwk@tz m@iii@g oii gm@ii@com d@vidwk@tz m@iii@g oii gm@ii@com
Fri Jun 3 20:12:31 CEST 2022


library(data.table)
?rbindlist

On Fri, Jun 3, 2022 at 11:06 AM CALUM POLWART <polc1410 using gmail.com> wrote:

> Bert! It sounds like you are warming to the the tidyverse! ;-)
>
> I completed agree with your analysis this data would be best served long as
> you have shown.
>
> If the OP was going to use tidy to manipulate it, they can do the same with
> tidyr::pivot_ functions (pivot_longer and pivot_wider). The result will be
> the same - but understanding how it happened may be easier!
>
> On Thu, 2 Jun 2022, 21:04 Bert Gunter, <bgunter.4567 using gmail.com> wrote:
>
> > Well, it seems better to me to put all the data frames in long format and
> > then rbind them instead of the other way round, which results in the
> piles
> > of NA's you see. I note also, FWIW, that this accords with the so-called
> > "tidy" format that many advocate these days. You can always subset (rows)
> > and choose by station, date, etc. as needed, of course from the long
> > format.
> >
> > Because of the regularity of your data frames, it is easy to do this.
> Here
> > is a little base R function that "reforms" each data frame (I suspect Rui
> > may well provide a more elegant version, though):
> >
> >
> > reform<- function(dat){
> >    nm <- names(dat)
> >    stanums <- unique(gsub("[^[:digit:]]","", nm[-1])) ## station numbers
> > present
> >     z <- do.call(rbind, lapply(stanums,
> >       \(i)
> >       structure(dat[,grep(i, nm, fixed = TRUE)],
> >                 names = c("Hs", "Hn", "flag"))))
> >    data.frame(POSIX =rep(dat[,1], length(stanums)),
> >               Station = rep(stanums, e = nrow(dat)),
> >               z)
> > }
> >
> > e.g.
> > > reform(df2)
> >         POSIX Station Hs Hn flag
> > 1  2001-12-01       1 50 20    0
> > 2  2001-12-02       1 60 20    0
> > 3  2001-12-03       1 70 20    0
> > 4  2001-12-04       1 NA NA   NA
> > 5  2001-12-05       1 NA NA   NA
> > 6  2001-12-01       3 20  0    0
> > 7  2001-12-02       3 20  0    0
> > 8  2001-12-03       3 30 10    0
> > 9  2001-12-04       3 30  0    1
> > 10 2001-12-05       3  0  5    0
> >
> > A call to rbind() of the following form then gives you all your data in
> > long form(you may wish to use some shortcuts to form the list of frames):
> >
> > > do.call(rbind, lapply(list(df1, df2, df3), reform))
> >         POSIX Station Hs Hn flag
> > 1  2000-12-01       1 30 10    0
> > 2  2000-12-02       1 40 20    0
> > 3  2000-12-03       1 50 10    0
> > 4  2000-12-04       1 NA NA   NA
> > 5  2000-12-05       1 55  5    0
> > 6  2000-12-01       2 20  0    0
> > 7  2000-12-02       2 20  0    0
> > 8  2000-12-03       2 30 10    0
> > 9  2000-12-04       2 30  0    1
> > 10 2000-12-05       2  0  5    0
> > 11 2001-12-01       1 50 20    0
> > 12 2001-12-02       1 60 20    0
> > 13 2001-12-03       1 70 20    0
> > 14 2001-12-04       1 NA NA   NA
> > 15 2001-12-05       1 NA NA   NA
> > 16 2001-12-01       3 20  0    0
> > 17 2001-12-02       3 20  0    0
> > 18 2001-12-03       3 30 10    0
> > 19 2001-12-04       3 30  0    1
> > 20 2001-12-05       3  0  5    0
> > 21 2002-12-01       2 50 20    0
> > 22 2002-12-02       2 60 20    0
> > 23 2002-12-03       2 70 20    0
> > 24 2002-12-04       2 NA NA   NA
> > 25 2002-12-05       2 NA NA   NA
> > 26 2002-12-01       3 20  0    0
> > 27 2002-12-02       3 20  0    0
> > 28 2002-12-03       3 30 10    0
> > 29 2002-12-04       3 30  0    1
> > 30 2002-12-05       3  0  5    0
> >
> >
> > Cheers,
> > Bert Gunter
> >
> >
> >
> >
> > On Wed, Jun 1, 2022 at 11:13 PM Stefano Sofia <
> > stefano.sofia using regione.marche.it> wrote:
> >
> > > Dear R-list users,
> > >
> > > for each winter season from 2000 to 2022 I have a data frame collecting
> > > for different weather stations snowpack height (Hs), snowfall in the
> last
> > > 24h (Hn) and a validation flag.
> > >
> > > Suppose I have these three following data frames
> > >
> > >
> > > df1 <- data.frame(data_POSIX=seq(as.POSIXct("2000-12-01",
> > > format="%Y-%m-%d", tz="Etc/GMT-1"), as.POSIXct("2000-12-05",
> > > format="%Y-%m-%d", tz="Etc/GMT-1"), by="1 days"), Station1_Hs = c(30,
> 40,
> > > 50, NA, 55), Station1_Hn = c(10, 20, 10, NA, 5), Station1_flag = c(0,
> 0,
> > 0,
> > > NA, 0), Station2_Hs = c(20, 20, 30, 30, 0), Station2_Hn = c(0, 0, 10,
> 0,
> > > 5), Station2_flag = c(0, 0, 0, 1, 0))
> > >
> > >
> > > df2 <- data.frame(data_POSIX=seq(as.POSIXct("2001-12-01",
> > > format="%Y-%m-%d", tz="Etc/GMT-1"), as.POSIXct("2001-12-05",
> > > format="%Y-%m-%d", tz="Etc/GMT-1"), by="1 days"), Station1_Hs = c(50,
> 60,
> > > 70, NA, NA), Station1_Hn = c(20, 20, 20, NA, NA), Station1_flag = c(0,
> 0,
> > > 0, NA, NA), Station3_Hs = c(20, 20, 30, 30, 0), Station3_Hn = c(0, 0,
> 10,
> > > 0, 5), Station3_flag = c(0, 0, 0, 1, 0))
> > >
> > >
> > > df3 <- data.frame(data_POSIX=seq(as.POSIXct("2002-12-01",
> > > format="%Y-%m-%d", tz="Etc/GMT-1"), as.POSIXct("2002-12-05",
> > > format="%Y-%m-%d", tz="Etc/GMT-1"), by="1 days"), Station2_Hs = c(50,
> 60,
> > > 70, NA, NA), Station2_Hn = c(20, 20, 20, NA, NA), Station2_flag = c(0,
> 0,
> > > 0, NA, NA), Station3_Hs = c(20, 20, 30, 30, 0), Station3_Hn = c(0, 0,
> 10,
> > > 0, 5), Station3_flag = c(0, 0, 0, 1, 0))
> > >
> > >
> > > As you can see, each data frame can have different stations loaded.
> > >
> > > I would need to call rbind matching data frames by column name (i.e. by
> > > station name), keeping in mind that the number of stations loaded in
> each
> > > data frame may differ. The result should be
> > >
> > > data_POSIX Station1_Hs Station1_Hn Station1_flag Station2_Hs
> Station2_Hn
> > > Station2_flag Station3_Hs Station3_Hn Station3_flag
> > > 2000-12-01 30 10 0 20 0 0 NA NA NA
> > > 2000-12-02 40 20 0 20 0 0 NA NA NA
> > > 2000-12-03 50 10 0 30 10 0 NA NA NA
> > > 2000-12-04 NA NA NA 30 0 0 NA NA NA
> > > 2000-12-05 55 5 0 0 5 0 NA NA NA
> > > 2001-12-01 50 20 0 NA NA NA 20 0 0
> > > 2001-12-02 60 20 0 NA NA NA 20 0 0
> > > 2001-12-03 70 20 0 NA NA NA 30 10 0
> > > 2001-12-04 NA NA NA NA NA NA 30 0 1
> > > 2001-12-05 NA NA NA NA NA NA 0 5 0
> > > 2002-12-01 NA NA NA 50 20 0 20 0 0
> > > 2002-12-02 NA NA NA 60 20 0 20 0 0
> > > 2002-12-03 NA NA NA 70 20 0 30 10 0
> > > 2002-12-04 NA NA NA NA NA NA 30 0 1
> > > 2002-12-05 NA NA NA NA NA NA 0 5 0
> > >
> > > I tried this code
> > >
> > > df_list <- list(df1, df2, df3)
> > > allNms <- unique(unlist(lapply(df_list, names)))
> > > do.call(rbind, c(lapply(df_list, function(x) data.frame(c(x,
> > > sapply(setdiff(allNms, names(x)), function(y) NA)))),
> > make.row.names=FALSE))
> > >
> > > but I get this error:
> > > Error in (function (..., row.names = NULL, check.rows = FALSE,
> > check.names
> > > = TRUE,  :
> > >   arguments imply differing number of rows
> > >
> > > Could someone please help me?
> > >
> > >
> > > Thank you for your attention
> > >
> > > Stefano
> > >
> > >
> > >          (oo)
> > > --oOO--( )--OOo--------------------------------------
> > > Stefano Sofia PhD
> > > Civil Protection - Marche Region - Italy
> > > Meteo Section
> > > Snow Section
> > > Via del Colle Ameno 5
> > > 60126 Torrette di Ancona, Ancona (AN)
> > > Uff: +39 071 806 7743
> > > E-mail: stefano.sofia using regione.marche.it
> > > ---Oo---------oO----------------------------------------
> > >
> > > ________________________________
> > >
> > > AVVISO IMPORTANTE: Questo messaggio di posta elettronica può contenere
> > > informazioni confidenziali, pertanto è destinato solo a persone
> > autorizzate
> > > alla ricezione. I messaggi di posta elettronica per i client di Regione
> > > Marche possono contenere informazioni confidenziali e con privilegi
> > legali.
> > > Se non si è il destinatario specificato, non leggere, copiare,
> inoltrare
> > o
> > > archiviare questo messaggio. Se si è ricevuto questo messaggio per
> > errore,
> > > inoltrarlo al mittente ed eliminarlo completamente dal sistema del
> > proprio
> > > computer. Ai sensi dell'art. 6 della DGR n. 1394/2008 si segnala che,
> in
> > > caso di necessità ed urgenza, la risposta al presente messaggio di
> posta
> > > elettronica può essere visionata da persone estranee al destinatario.
> > > IMPORTANT NOTICE: This e-mail message is intended to be received only
> by
> > > persons entitled to receive the confidential information it may
> contain.
> > > E-mail messages to clients of Regione Marche may contain information
> that
> > > is confidential and legally privileged. Please do not read, copy,
> > forward,
> > > or store this message unless you are an intended recipient of it. If
> you
> > > have received this message in error, please forward it to the sender
> and
> > > delete it completely from your computer system.
> > >
> > > --
> > > Questo messaggio  stato analizzato da Libraesva ESG ed  risultato non
> > > infetto.
> > > This message was scanned by Libraesva ESG and is believed to be clean.
> > >
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list