[R] merge a list of data frames

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu Sep 6 06:29:35 CEST 2012


I am not sure what result you want. If the data frames all share the same column names, I wonder why that is. Do you really want to merge, which puts all of the non-key columns side by side in one data frame? If so, why not start by renaming the columns so that they make sense in the combined data frame?
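
A minimal sketch of the renaming idea, assuming a small toy list in place of your real data and an arbitrary numeric suffix for each data frame (both are illustrative assumptions, not taken from your files):

```r
# Toy stand-in for the list in the question: same column names V1, V2, V3.
data <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 0L, 1L), V3 = c(0.1, 0.2, 0.3),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "b", "d"), V2 = c(0L, 1L, 0L), V3 = c(0.4, 0.5, 0.6),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "c", "d"), V2 = c(1L, 1L, 0L), V3 = c(0.7, 0.8, 0.9),
             stringsAsFactors = FALSE)
)

# Tag every non-key column with the position of its data frame in the list,
# so that no two inputs share a non-key column name.
renamed <- lapply(seq_along(data), function(i) {
  df <- data[[i]]
  nonkey <- names(df) != "V1"
  names(df)[nonkey] <- paste(names(df)[nonkey], i, sep = ".")
  df
})

# Full outer join on the key column, one data frame at a time.
merged <- Reduce(function(f1, f2) merge(f1, f2, by = "V1", all = TRUE), renamed)
print(names(merged))
```

Because the non-key names are unique before merge() sees them, it has no reason to append the .x/.y suffixes, and the duplicate-name warning should not appear.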

If you really want the column names to stay the same, perhaps you want to stack the data frames "vertically" with rbind?
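
A minimal sketch of the stacking alternative, again on assumed toy data; the extra "source" column is a hypothetical addition of mine that records which list element each row came from:

```r
# Toy stand-in for the list in the question: same column names V1, V2, V3.
data <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 0L, 1L), V3 = c(0.1, 0.2, 0.3),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "b", "d"), V2 = c(0L, 1L, 0L), V3 = c(0.4, 0.5, 0.6),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "c", "d"), V2 = c(1L, 1L, 0L), V3 = c(0.7, 0.8, 0.9),
             stringsAsFactors = FALSE)
)

# Record the origin of each row before stacking.
tagged <- lapply(seq_along(data), function(i) {
  df <- data[[i]]
  df$source <- i
  df
})

# rbind() matches columns by name, so the column names stay V1, V2, V3.
stacked <- do.call(rbind, tagged)
str(stacked)
```
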
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

Sam Steingold <sds at gnu.org> wrote:

>I have a list of data frames:
>
>> str(data)
>List of 4
> $ :'data.frame':	700773 obs. of  3 variables:
>  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>  ..$ V3: num [1:700773] 1 1 1 1 1 ...
> $ :'data.frame':	700773 obs. of  3 variables:
>  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>  ..$ V3: num [1:700773] 1 1 1 1 1 ...
> $ :'data.frame':	700773 obs. of  3 variables:
>  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>  ..$ V3: num [1:700773] 1 1 1 1 1 ...
> $ :'data.frame':	700773 obs. of  3 variables:
>  ..$ V1: chr [1:700773] "200160325893778" "200130647544079" "200130446465779" "200120186959078" ...
>  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>  ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...
>
>I want to merge them.
>I tried to follow
>http://rwiki.sciviews.org/doku.php?id=tips%3adata-frames%3amerge
>and did:
>
>> data.1 <- Reduce(function(f1,f2) merge(f1,f2,by=c("V1"),all=TRUE), data)
>Warning message:
>In merge.data.frame(f1, f2, by = c("V1"), all = TRUE) :
>  column names 'V2.x', 'V3.x', 'V2.y', 'V3.y' are duplicated in the result
>> str(data.1)
>'data.frame':	700773 obs. of  9 variables:
> $ V1  : chr  "100010000099079" "100010000254078" "100010000499078" "100010000541779" ...
> $ V2.x: int  0 0 0 0 0 0 0 0 0 0 ...
> $ V3.x: num  0.476 0.748 0.442 0.483 0.577 ...
> $ V2.y: int  0 0 0 0 0 0 0 0 0 0 ...
> $ V3.y: num  0.476 0.748 0.442 0.483 0.577 ...
> $ V2.x: int  0 0 0 0 0 0 0 0 0 0 ...
> $ V3.x: num  0.476 0.752 0.443 0.485 0.578 ...
> $ V2.y: int  0 0 0 0 0 0 0 0 0 0 ...
> $ V3.y: num  0.47 0.733 0.57 0.416 0.616 ...
>
>I don't like the warning and I don't like that I now have to use [n] to
>access identically named columns, but, I guess, this is better than
>this:
>
>library('reshape')
>
>> data.1 <- merge_all(data,by="V1",all=TRUE)
>Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
>  formal argument "all" matched by multiple actual arguments
>> data.1 <- merge_all(data,by="V1",sort=TRUE,all=TRUE)
>Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
>  formal argument "all" matched by multiple actual arguments
>> data.1 <- merge_all(data,by="V1",sort=TRUE)
>Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
>  formal argument "sort" matched by multiple actual arguments
>> data.1 <- merge_all(data,by="V1")
>Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : 
>  undefined columns selected
>> data.1 <- merge_all(data,by=c("V1"))
>Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : 
>  undefined columns selected
>
>what does 'formal argument "sort" matched by multiple actual arguments'
>mean?
>
>thanks.
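
On the last question: judging only from the call shown in the error text, merge_all appears to pass all = TRUE and sort = FALSE to merge() itself and to forward your extra arguments as well, so merge() receives the same named argument twice. A minimal sketch of that situation, using two hypothetical functions (inner and wrapper) as stand-ins rather than the actual reshape code:

```r
# inner() stands in for merge(); wrapper() stands in for a helper that
# hard-codes sort = FALSE and also forwards whatever the caller supplies.
inner <- function(x, y, sort = TRUE, ...) sort
wrapper <- function(...) inner(1, 2, sort = FALSE, ...)

# Without extra arguments the hard-coded value is used and the call succeeds.
ok <- wrapper()

# Supplying sort= again means inner() gets sort twice, which R rejects.
msg <- tryCatch(wrapper(sort = TRUE), error = function(e) conditionMessage(e))
print(msg)
```

If that reading is right, the "formal argument" errors go away only when the caller does not supply all= or sort= at all.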




More information about the R-help mailing list