[R] Problem with binding data-frames

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 19 12:19:23 CEST 2007


On Tue, 19 Jun 2007, Junnila, Jouni wrote:

> Hi,
>
> Yes, I'm aware that the problem is that I have differing number of
> columns in the different datasets. My question still remains. Is there
> some way I can allow column numbers to be different, or is there some
> other way combining these datasets?

Not with rbind.  With merge, if you have a suitable index column.
Using 2.5.1 beta:

> Day1 <- data.frame(x=1:2, id=1:2)
> Day2 <- data.frame(x=3:4, y=10:11, id=3:4)
> rbind(Day1, Day2)
Error in match.names(clabs, names(xi)) : names do not match previous names
> merge(Day1, Day2, all=TRUE)
   x id  y
1 1  1 NA
2 2  2 NA
3 3  3 10
4 4  4 11


>
> Thanks,
> -Jouni
>
> On Mon, 18 Jun 2007, Petr Klasterecky wrote:
>
>> Junnila, Jouni napsal(a):
>>> Hello,
>>>
>>> I'm having a problem concerning r-binding datasets.
>>>
>>> I have six datasets, from six different plates, and two different
> days.
>>> I want to combine these datasets together for analysis. Datasets from
>
>>> day 2, have all the same columns than datasets from day 1. However in
>
>>> addition, there are few columns more in day 2. Thus, using rbind for
>>> this, results a error, because the objects are not the same length.
>>>
>>> Error in paste(nmi[nii == 0L], collapse = ", ") :
>>>         object "nii" not found
>>> In addition: Warning message:
>>> longer object length
>>>         is not a multiple of shorter object length in: clabs == nmi
>>
>> Hi,
>>
>> 1. the error has nothing to do with differing lengths of your objects
>> - that's what the following warning is about. The error occured
>> because your indexing object 'nii' does not exist where R is looking
> for it.
>
> It's because the dataframes have differing number of columns, and that
> has not been allowed for in the error message in that version of R.
>
>> 2. using rbind on dataframes is a bad practice, since the input is
>> converted to marices if possible. Use merge() instead.
>
> Not so: rbind on data frames does no such conversion, and is not
> problematic provided they have the same column names (and hence the same
> number of columns).  You may have missed in ?rbind
>
>      The functions 'cbind' and 'rbind' are S3 generic, with methods for
>      data frames.  The data frame method will be used if at least one
>      argument is a data frame and the rest are vectors or matrices.
> ...
>
> and a later description of the data frame method for 'rbind'.
>
>>
>> Petr
>>
>>>
>>>
>>> What I need, is to combine all the six together, and give for example
>
>>> NA-value in day 1, for those columns which can only be found in day
> 2.
>>> Is this somehow possible?
>>>
>>> I have several of these six-datasets groups, and only few of them are
>
>>> having this problem described above, and I cannot know in advance
> which.
>>> With most of the groups writing
>>> rbind(data1,data2,data3,data4,data5,data6)
>>> works easily, but these few problematic groups need also to be
>>> combined...
>>> Any help greatly appreciated!
>>>
>>> -Jouni

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list