[R] Merge dataframes

David Winsemius dwinsemius at comcast.net
Fri Oct 7 22:05:57 CEST 2011


On Oct 7, 2011, at 9:34 AM, jdanielnd wrote:

> Hello,
>
> I am having some problems using the 'merge' function. I'm not sure
> I understand how it works.
>
> What I want to do is:
>
> 1) Suppose I have a dataframe like:
>
>        height           width
> 1        1.1                2.3
> 2        2.1                2.5
> 3        1.8                1.9
> 4        1.6                2.1
> 5        1.8                2.4
>
> 2) And I generate a second dataframe sampled from this one, like:
>
>        height           width
> 1        1.1                2.3
> 3        1.8                1.9
> 5        1.8                2.4
>
> 3) Next, I add a new variable to this dataframe:
>
>        height            width         color
> 1        1.1                2.3            red
> 3        1.8                1.9            red
> 5        1.8                2.4            blue
>
> 4) So, I want to merge those dataframes, so that the new variable, color,
> is bound to the first dataframe. Of course some cases won't have a value
> for it, since I generated this variable in a smaller dataframe. In those
> cases I want the value to be NA. The resulting dataframe should be:
>
>        height            width         color
> 1        1.1                2.3            red
> 2        2.1                2.5            NA
> 3        1.8                1.9            red
> 4        1.6                2.1            NA
> 5        1.8                2.4            blue
>
> I have written some code, but it is not working properly. The new
> variable has its values mixed up, and they do not correspond to its
> row.names.
>
> # Generate the first dataframe
> data1 <- data.frame(height=rnorm(20,3,0.2),width=rnorm(20,2,0.5))
> # Sample a smaller dataframe from data1
> data2 <- data1[sample(1:20,15,replace=F),]
> # Generate the new variable
> color <- sample(c("red","blue"),15,replace=T)
> # Bind the new variable to data2
> data2 <- cbind(data2, color)
> # Merge data1 and data2$color by row.names, keeping every row of data1.
> # Then build a new dataframe that uses column 1 as the row.names, and
> # sort it by the row.names of data1.
> data.frame(merge(data1,data2$color, by=0,
> all.x=T),row.names=1)[row.names(data1),]
>
> I'm not sure what I am doing wrong.

I'm not sure what you want. You get the rownames with this:

 > str( merge(  data1, data2$color, by=0,  all.x=T) )
'data.frame':	20 obs. of  4 variables:
  $ Row.names:Class 'AsIs'  chr [1:20] "1" "10" "11" "12" ...
  $ height   : num  3.02 2.9 2.93 2.87 2.95 ...
  $ width    : num  1.7 1.85 1.51 2.14 2.22 ...
  $ y        : Factor w/ 2 levels "blue","red": 1 2 1 2 1 1 1 NA NA NA ...

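If all you want is the original order, then just re-sort. Row.names comes
back as character, so convert it to numbers before ordering:

newdat <- merge(  data1, data2$color, by=0,  all.x=TRUE)
newdat[order(as.numeric(as.character(newdat$Row.names))), ]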
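If the goal is only to attach color while keeping data1's row order, one option that avoids both merge() and sorting is to look up each row of data1 in data2 by row name. This is a minimal sketch, not necessarily what the original poster needs: the set.seed() call is an assumption added only to make it reproducible, and match() is used because it matches row names exactly and returns NA for rows that were not sampled.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
data1 <- data.frame(height = rnorm(20, 3, 0.2), width = rnorm(20, 2, 0.5))
data2 <- data1[sample(1:20, 15, replace = FALSE), ]
data2$color <- sample(c("red", "blue"), 15, replace = TRUE)

# Position of each data1 row name within data2 (NA if the row was not sampled)
idx <- match(rownames(data1), rownames(data2))

# Indexing with NA yields NA, so unsampled rows get a missing color
data1$color <- data2$color[idx]
data1
```

The rows of data1 stay in their original order, so no re-sorting is needed afterwards.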

I checked to see if the Row.names were correct by also examining

  merge( cbind(rownames(data1), data1),
         data2$color,
         by=0,  all.x=T)
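A likely cause of the mixed-up values is that data2$color is a bare vector, so it no longer carries data2's row names. merge() converts it to a one-column data frame with fresh row names 1 to 15, and the colors are then attached to rows 1 to 15 of data1 rather than to the sampled rows. That fits the str() output above, where rows "1" and "10" to "15" have a value and "16", "17", "18" are NA. The sketch below, with a seed assumed only for reproducibility, passes data2["color"] instead, which stays a data frame and keeps the row names.

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible
data1 <- data.frame(height = rnorm(20, 3, 0.2), width = rnorm(20, 2, 0.5))
data2 <- data1[sample(1:20, 15, replace = FALSE), ]
data2$color <- sample(c("red", "blue"), 15, replace = TRUE)

# A bare vector loses data2's row names: merge() would see "1", ..., "15"
lost <- rownames(as.data.frame(data2$color))

# A one-column data frame keeps the row names of the sampled rows
kept <- rownames(data2["color"])

# Merge on row names (by = 0), keeping every row of data1
fixed <- merge(data1, data2["color"], by = 0, all.x = TRUE)
str(fixed)
```

The result still comes back ordered by Row.names as character, so it can be re-sorted afterwards in the same way as before.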

> Can anyone see where the mistake is?
>
> Thank you!
>
> Cheers,
>
> Joao D.

David Winsemius, MD
West Hartford, CT


