[Rd] rbind on data.frame that contains a column that is also a data.frame

Michael Lachmann lachmann at eva.mpg.de
Fri Aug 6 00:21:47 CEST 2010


Hi,

The following was already discussed on r-help, but now that I understand what
is going on, I think it fits better on r-devel.

The problem is this:
When a data.frame contains another data.frame as a column, rbind() fails.
Here is an example:
--
> a=data.frame(x=1:10,y=1:10)
> b=data.frame(z=1:10)
> b$a=a
> b
    z a.x a.y
1   1   1   1
2   2   2   2
3   3   3   3
4   4   4   4
5   5   5   5
6   6   6   6
7   7   7   7
8   8   8   8
9   9   9   9
10 10  10  10
> rbind(b,b)
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "2", "3", "4",  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1’, ‘10’, ‘2’, ‘3’, ‘4’, ‘5’,
‘6’, ‘7’, ‘8’, ‘9’
--


Looking at the code of rbind.data.frame, the error comes from these lines:
-- 
xij <- xi[[j]] 
if (has.dim[jj]) { 
  value[[jj]][ri, ] <- xij 
  rownames(value[[jj]])[ri] <- rownames(xij)   # <--  problem is here 
} 
-- 
If the rownames() line is dropped, everything works. This line tries to
join the rownames of the inner elements (matrix or data.frame columns) of
the data.frames being combined. So in my case the result should have a
column 'a' whose rownames are the rownames of the original columns 'a',
one after the other. Since both inner data.frames have rownames 1 to 10,
joining them produces duplicates, which a data.frame does not allow, hence
the error. It isn't totally clear to me why this is needed. When would a
data.frame have different rownames on the inside than on the outside?
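The failure can be reproduced in isolation. This is a minimal sketch that mimics what the rownames() line does for the second of two pieces: the inner data.frame already has rownames 1 to 4, and the rownames of a hypothetical second two-row piece ("1", "2") are copied into positions 3:4, producing duplicates. The names inner, rn and res are only illustrative.

```r
# Inner data.frame standing in for value[[jj]] after both pieces are copied in
inner <- data.frame(x = 1:4, y = 1:4)

# Mimic rownames(value[[jj]])[ri] <- rownames(xij) for the second piece (ri = 3:4)
rn <- rownames(inner)     # "1" "2" "3" "4"
rn[3:4] <- c("1", "2")    # rownames of the second piece clash with the first

# Assigning duplicated rownames to a data.frame signals an error,
# which is caught here so its message can be inspected
res <- tryCatch(
  suppressWarnings({
    rownames(inner) <- rn
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(res)  # the error message complains about duplicate 'row.names'
```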

Notice also that rbind takes into account whether the rownames of the
data.frames to be joined are simply 1:n or something else. If they are
1:n, the result will have rownames 1:(n+m). If not, the original
rownames are kept, and made unique where they clash.
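A small illustration of the two cases, using made-up data.frames d1 and d2 (automatic rownames) and e1 and e2 (explicit rownames). The commented results are what recent versions of R give.

```r
# Automatic rownames: the result is simply renumbered 1:(n+m)
d1 <- data.frame(v = 1:2)
d2 <- data.frame(v = 3:4)
print(rownames(rbind(d1, d2)))   # "1" "2" "3" "4"

# Explicit rownames: they are carried over, and clashes are made unique
e1 <- data.frame(v = 1:2, row.names = c("a", "b"))
e2 <- data.frame(v = 3:4, row.names = c("a", "b"))
print(rownames(rbind(e1, e2)))   # "a" "b" "a1" "b1"
```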

I think it would be more consistent to replace the lines above with
something like:
--
if (has.dim[jj]) {
  value[[jj]][ri, ] <- xij
  rnj <- rownames(value[[jj]])
  rnj[ri] <- rownames(xij)
  rnj <- make.unique(as.character(unlist(rnj)), sep = "")
  rownames(value[[jj]]) <- rnj
}
--
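To see what the proposed lines do, here is a sketch that applies the same steps to an inner data.frame outside of rbind.data.frame. The helper join_inner_rownames is hypothetical and only wraps the proposed code; it is not part of R.

```r
# Hypothetical helper wrapping the proposed replacement lines
join_inner_rownames <- function(inner, ri, new_rn) {
  rnj <- rownames(inner)
  rnj[ri] <- new_rn                                        # copy the piece's rownames
  rnj <- make.unique(as.character(unlist(rnj)), sep = "")  # resolve clashes
  rownames(inner) <- rnj
  inner
}

inner <- data.frame(x = 1:4, y = 1:4)
joined <- join_inner_rownames(inner, 3:4, c("1", "2"))
print(rownames(joined))   # "1" "2" "11" "21"
```

Note that with sep = "" the made-unique names ("11", "21") can look like ordinary row numbers.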

In this case, the rownames of inner elements will also be joined, but
where they overlap they will be made unique, just as they are for the
overall result of rbind. A side effect would be that the rownames of
matrices are also made unique, which until now didn't happen, and which
also doesn't happen when one rbinds matrices that have rownames. So it
would be better for the code above to test whether the element is a
matrix or a data.frame.
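For comparison, rbind() on matrices leaves duplicated rownames alone. Below is a sketch of the type test suggested above, using a hypothetical helper join_rownames that only applies make.unique() when the inner element is a data.frame, so matrices keep their rownames as they are.

```r
# rbind() on matrices keeps duplicated rownames as they are
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), NULL))
print(rownames(rbind(m, m)))   # "a" "b" "a" "b"

# Hypothetical helper: make rownames unique only for data.frame elements
join_rownames <- function(elem, ri, new_rn) {
  rnj <- rownames(elem)
  rnj[ri] <- new_rn
  if (is.data.frame(elem)) {
    rnj <- make.unique(as.character(rnj), sep = "")  # data.frames need unique rownames
  }
  rownames(elem) <- rnj                              # matrices may keep duplicates
  elem
}

mm <- matrix(1:8, nrow = 4, dimnames = list(c("a", "b", "c", "d"), NULL))
print(rownames(join_rownames(mm, 3:4, c("a", "b"))))  # "a" "b" "a" "b"

dd <- data.frame(x = 1:4, row.names = c("a", "b", "c", "d"))
print(rownames(join_rownames(dd, 3:4, c("a", "b"))))  # "a" "b" "a1" "b1"
```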

But most people don't have different rownames inside and outside.
Maybe it would be best to add a flag saying whether or not you care
about the rownames of inner data.frames...

But maybe data.frames aren't meant to contain other data.frames?

If instead I do
> b=data.frame(z=1:10, a=a)
then rbind(b,b) works well. In this case the inner data.frame is
converted into its columns (a.x and a.y). Maybe
> b$a=a
should do the same?
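The difference between the two constructions can be checked directly. As a possible workaround (an assumption on my part, not something rbind does by itself), do.call(data.frame, b2) rebuilds a nested data.frame with the inner columns spliced in, after which rbind() works. The names b1, b2 and flat are only illustrative.

```r
a <- data.frame(x = 1:10, y = 1:10)

# data.frame() splits the inner data.frame into ordinary columns
b1 <- data.frame(z = 1:10, a = a)
print(names(b1))            # "z" "a.x" "a.y"

# $<- keeps it as a single data.frame column
b2 <- data.frame(z = 1:10)
b2$a <- a
print(names(b2))            # "z" "a"
print(is.data.frame(b2$a))  # TRUE

# Workaround sketch: flatten the nested column, then rbind as usual
flat <- do.call(data.frame, b2)
print(names(flat))              # "z" "a.x" "a.y"
print(nrow(rbind(flat, flat)))  # 20
```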

Michael 
-- 
View this message in context: http://r.789695.n4.nabble.com/rbind-on-data-frame-that-contains-a-column-that-is-also-a-data-frame-tp2315682p2315682.html
Sent from the R devel mailing list archive at Nabble.com.
