[R] Convert list of data frames to one data frame

Ira Sharenow |r@@h@renow100 @end|ng |rom y@hoo@com
Sat Jun 30 04:08:56 CEST 2018


Bert,

Thanks for your idea. However, the end result is not what I am looking 
for. Each initial data frame in the list will result in just one row in 
the final data frame. In your case

Data frame 1 of the initial list will become 1 b 2 c 3 d NA NA NA NA in 
the end structure.

Data frame 2 of the initial list will become 5 k 6 l 7 m 8 n 9 o.
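
To make the target concrete, here is a minimal base R sketch of how each data frame gets flattened into one padded row. It uses the zz list from Bert's message quoted below; the names flat, width and padded are only illustrative.

```r
# Bert's example list: two data frames with different column names
# and different numbers of rows
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))

# t() turns each original row into a column, so as.vector() reads the
# values off in first, last, first, last, ... order (as character)
flat <- lapply(zz, function(d) as.vector(t(as.matrix(d))))
flat$one     # "1" "b" "2" "c" "3" "d"

# Pad every vector with NA up to the length of the longest one
width  <- max(lengths(flat))
padded <- lapply(flat, function(v) { length(v) <- width; v })
padded$one   # "1" "b" "2" "c" "3" "d" NA NA NA NA
```

Once every element has the same length, the vectors can be stacked with rbind, one row per original data frame.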

Sarah’s code works:

> dfbycol(zz)
    first1 last1 first2 last2 first3 last3 first4 last4 first5 last5
one      1     b      2     c      3     d   <NA>  <NA>   <NA>  <NA>
two      5     k      6     l      7     m      8     n      9     o


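dfbycol <- function(x) {
  # flatten each data frame row by row: first, last, first, last, ...
  x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
  # pad every vector with NA up to the length of the longest one
  x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
  # stack the padded vectors, one row per original data frame
  x <- do.call(rbind, x)
  x <- data.frame(x, stringsAsFactors=FALSE)
  colnames(x) <- paste0(c("first", "last"), rep(seq(1, ncol(x)/2), each=2))
  x
}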
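
On the dplyr question from earlier in the thread (quoted below), one possible sketch is to give the flattened one-row data frames consistent names before calling bind_rows, so that columns line up by name and missing ones are filled with NA. This assumes dplyr is installed; flatten_pairs is a hypothetical helper name, and zz is Bert's example list. It is a sketch only and has not been tried on the real data.

```r
library(dplyr)

# Bert's example list (repeated so the sketch is self-contained)
zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
           two = data.frame(a = 5:9, b = letters[11:15]))

# Hypothetical helper: turn an n x 2 data frame into a 1 x 2n data
# frame named first1, last1, first2, last2, ...
flatten_pairs <- function(d) {
  v <- as.vector(t(as.matrix(d)))
  names(v) <- paste0(c("first", "last"),
                     rep(seq_len(length(v) / 2), each = 2))
  as.data.frame(as.list(v), stringsAsFactors = FALSE)
}

# bind_rows() matches columns by name and fills the gaps with NA;
# .id keeps the list names in a "source" column
wide <- bind_rows(lapply(zz, flatten_pairs), .id = "source")
wide
```

Because the renaming happens inside the helper, the inconsistent original column names never reach bind_rows.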

Thanks.

By the way I am working with a colleague on this. Apparently the data 
came from reading in XML data.

Ira


On 6/29/2018 6:33 PM, Bert Gunter wrote:
> Well, I don't know your constraints, of course; but if I understand 
> correctly, in situations like this, it is usually worthwhile to 
> reconsider your data structure.
>
> This is a one-liner if you simply rbind all your data frames into one 
> with 2 columns. Here's an example to indicate how:
>
> ## list of two data frames with different column names and numbers of rows:
> zz <- list(one = data.frame(f = 1:3, g = letters[2:4]),
>            two = data.frame(a = 5:9, b = letters[11:15]))
>
> ## create common column names and bind them up:
> do.call(rbind,lapply(zz,function(x){   names(x) <- c("first","last"); x}))
>
> Note that the row names of the result tell you which original frame 
> the rows came from. This can also be obtained just from a count of 
> rows (?nrow) of the original list.
>
> Apologies if I misunderstand your query, or if your constraints make 
> this simple approach impossible.
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along 
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Fri, Jun 29, 2018 at 5:29 PM, Ira Sharenow via R-help 
> <r-help using r-project.org> wrote:
>
>
>     Sarah and David,
>
>     Thank you for your responses. I will try to be clearer.
>
>     Base R solution: Sarah’s method worked perfectly.
>
>     Is there a dplyr solution?
>
>     START: list of data frames
>
>     FINISH: one data frame
>
>     DETAILS: The initial list of data frames might have hundreds or a
>     few thousand data frames. Every data frame will have two columns.
>     The first column will represent first names. The second column will
>     represent last names. The column names are not consistent. Data
>     frames will most likely have from one to five rows.
>
>     SUGGESTED STRATEGY: Convert the n by 2 data frames to 1 by 2n data
>     frames. Then somehow do an rbind even though the number of columns
>     differs from data frame to data frame.
>
>     EXAMPLE: List with two data frames
>
>     # DF1
>
>     First   Last
>     George  Washington
>
>     # DF2
>
>     Start   End
>     John    Adams
>     Thomas  Jefferson
>
>     # End Result. One data frame
>
>     First1  Second1     First2  Second2
>     George  Washington  NA      NA
>     John    Adams       Thomas  Jefferson
>
>
>
>     DISCUSSION: As mentioned, I posted something on Stack Overflow.
>     Unfortunately, my example was not general enough, and so the
>     suggested solutions worked on the easy case which I provided
>     but not when the names were different.
>
>     The suggested solution was:
>
>     library(dplyr)
>
>     bind_rows(lapply(employees4List, function(x)
>       rbind.data.frame(c(t(x)))))
>
>
>
>     On this site I pointed out that the inner function,
>     lapply(employees4List, function(x) rbind.data.frame(c(t(x)))),
>     correctly spread the multiple rows of each data frame into a 1 by
>     2n data frame. However, the column names were derived from the
>     values and were a mess. This caused a problem with bind_rows.
>
>     I felt that if I knew how to change all the names of all of the
>     data frames that were created after lapply, then I could use
>     bind_rows. So if someone knows how to change all of the names at
>     this intermediate stage, I hope that person will provide the solution.
>
>     In the end a 1 by 2 data frame would have names First1 Second1. A
>     1 by 4 data frame would have names First1 Second1 First2 Second2.
>
>     Ira
>
>
>         On Friday, June 29, 2018, 12:49:18 PM PDT, David Winsemius
>     <dwinsemius using comcast.net> wrote:
>
>
>     > On Jun 29, 2018, at 7:28 AM, Sarah Goslee
>     <sarah.goslee using gmail.com> wrote:
>     >
>     > Hi,
>     >
>     > It isn't super clear to me what you're after.
>
>     Agree.
>
>     Had a different read of the request. Thought the request was for a
>     first step that "harmonized" the names of the columns and then
>     used `dplyr::bind_rows`:
>
>     library(dplyr)
>     newList <- lapply(employees4List, 'names<-',
>                       names(employees4List[[1]]))
>     bind_rows(newList)
>
>     #---------
>
>        first1 second1
>     1      Al   Jones
>     2     Al2   Jones
>     3    Barb   Smith
>     4     Al3   Jones
>     5 Barbara   Smith
>     6   Carol   Adams
>     7      Al  Jones2
>
>     Might want to wrap suppressWarnings around the right side of that
>     assignment since there were many warnings regarding incongruent
>     factor levels.
>
>     -- 
>     David.
>     > Is this what you intend?
>     >
>     >> dfbycol(employees4BList)
>     >   first1 last1 first2 last2 first3 last3
>     > 1     Al Jones   <NA>  <NA>   <NA>  <NA>
>     > 2     Al Jones   Barb Smith   <NA>  <NA>
>     > 3     Al Jones   Barb Smith  Carol Adams
>     > 4     Al Jones   <NA>  <NA>   <NA>  <NA>
>     >>
>     >> dfbycol(employees4List)
>     >   first1  last1  first2 last2 first3 last3
>     > 1     Al  Jones    <NA>  <NA>   <NA>  <NA>
>     > 2    Al2  Jones    Barb Smith   <NA>  <NA>
>     > 3    Al3  Jones Barbara Smith  Carol Adams
>     > 4     Al Jones2    <NA>  <NA>   <NA>  <NA>
>     >
>     >
>     > If so:
>     >
>     > employees4BList = list(
>     >   data.frame(first1 = "Al", second1 = "Jones"),
>     >   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>     >   data.frame(first1 = c("Al", "Barb", "Carol"),
>     >              second1 = c("Jones", "Smith", "Adams")),
>     >   data.frame(first1 = "Al", second1 = "Jones"))
>     >
>     > employees4List = list(
>     >   data.frame(first1 = "Al", second1 = "Jones"),
>     >   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>     >   data.frame(first3 = c("Al3", "Barbara", "Carol"),
>     >              second3 = c("Jones", "Smith", "Adams")),
>     >   data.frame(first4 = "Al", second4 = "Jones2"))
>     >
>     > ###
>     >
>     > dfbycol <- function(x) {
>     >   x <- lapply(x, function(y) as.vector(t(as.matrix(y))))
>     >   x <- lapply(x, function(y) {length(y) <- max(sapply(x, length)); y})
>     >   x <- do.call(rbind, x)
>     >   x <- data.frame(x, stringsAsFactors=FALSE)
>     >   colnames(x) <- paste0(c("first", "last"),
>     >                         rep(seq(1, ncol(x)/2), each=2))
>     >   x
>     > }
>     >
>     > ###
>     >
>     > dfbycol(employees4BList)
>     >
>     > dfbycol(employees4List)
>     >
>     > On Fri, Jun 29, 2018 at 2:36 AM, Ira Sharenow via R-help
>     > <r-help using r-project.org> wrote:
>     >> I have a list of data frames which I would like to combine into one
>     >> data frame doing something like rbind. I wish to combine in column
>     >> order and not by names. However, there are issues.
>     >>
>     >> The number of columns is not the same for each data frame. This is an
>     >> intermediate step to a problem and the number of columns could be
>     >> 2, 4, 6, 8, or 10. There might be a few thousand data frames. Another
>     >> problem is that the names of the columns produced by the first step
>     >> are garbage.
>     >>
>     >> Below is a method that I obtained by asking a question on Stack
>     >> Overflow. Unfortunately, my example was not general enough. The code
>     >> below works for the simple case where the names of the people are
>     >> consistent. It does not work when the names are realistically not
>     >> the same.
>     >>
>     >> https://stackoverflow.com/questions/50807970/converting-a-list-of-data-frames-not-a-simple-rbind-second-row-to-new-columns/50809432#50809432
>     >>
>     >> Please note that the lapply step sets things up except for the column
>     >> name issue. If I could figure out a way to change the column names,
>     >> then the bind_rows step will, I believe, work.
>     >>
>     >> So I really have two questions. How to change all column names of all
>     >> the data frames and then how to solve the original problem.
>     >>
>     >> # The non general case works fine. It produces one data frame and I
>     >> # can then change the column names to
>     >> # c("first1", "last1", "first2", "last2", "first3", "last3")
>     >>
>     >> # Non general easy case
>     >>
>     >> employees4BList = list(
>     >>   data.frame(first1 = "Al", second1 = "Jones"),
>     >>   data.frame(first1 = c("Al", "Barb"), second1 = c("Jones", "Smith")),
>     >>   data.frame(first1 = c("Al", "Barb", "Carol"),
>     >>              second1 = c("Jones", "Smith", "Adams")),
>     >>   data.frame(first1 = "Al", second1 = "Jones"))
>     >>
>     >> employees4BList
>     >>
>     >> bind_rows(lapply(employees4BList, function(x) rbind.data.frame(c(t(x)))))
>     >>
>     >> # This produces a nice list of data frames, except for the names
>     >>
>     >> lapply(employees4BList, function(x) rbind.data.frame(c(t(x))))
>     >>
>     >> # This list is a disaster. I am looking for a solution that works in
>     >> # this case.
>     >>
>     >> employees4List = list(
>     >>   data.frame(first1 = "Al", second1 = "Jones"),
>     >>   data.frame(first2 = c("Al2", "Barb"), second2 = c("Jones", "Smith")),
>     >>   data.frame(first3 = c("Al3", "Barbara", "Carol"),
>     >>              second3 = c("Jones", "Smith", "Adams")),
>     >>   data.frame(first4 = "Al", second4 = "Jones2"))
>     >>
>     >> bind_rows(lapply(employees4List, function(x) rbind.data.frame(c(t(x)))))
>     >>
>     >> Thanks.
>     >>
>     >> Ira
>     >>
>     >
>     > --
>     > Sarah Goslee
>     > http://www.functionaldiversity.org
>     >
>     > ______________________________________________
>     > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     > https://stat.ethz.ch/mailman/listinfo/r-help
>     > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>     > and provide commented, minimal, self-contained, reproducible code.
>
>     David Winsemius
>     Alameda, CA, USA
>
>     'Any technology distinguishable from magic is insufficiently
>     advanced.'  -Gehm's Corollary to Clarke's Third Law
>
>
>
>
>
>             [[alternative HTML version deleted]]
>
>
>

