[R] cbind, data.frame | numeric to string?

David Winsemius dwinsemius at comcast.net
Tue Apr 10 18:19:25 CEST 2012


On Apr 10, 2012, at 11:58 AM, Rainer Schuermann wrote:

> cbind() works as well, but only if c is attached to the existing  
> test variable:
>
>> tst <- cbind( test, c )
>> tst
>   a    b   c
> 1  1  0.3  y1
> 2  2  0.4  y2
> 3  3  0.5  y3
> 4  4  0.6  y4
> 5  5  0.7  y5
>> str( tst )
> 'data.frame':   5 obs. of  3 variables:
> $ a: num  1 2 3 4 5
> $ b: num  0.3 0.4 0.5 0.6 0.7
> $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>
> Not saying it is a good idea, though...

To be somewhat more expansive ... 'cbind' is not just one function,
but rather a set of functions, since it is "generic". The one that is
chosen by the interpreter depends on the classes of the arguments,
not just the first one. If an argument has a class for which a method
exists, as in the example above where 'test' has class "data.frame",
then the cbind.data.frame function will be dispatched to process the
list of arguments. If none of the arguments has a class, as in the
OP's second example below, then the internal cbind function will be
used. It returns a matrix, which strips off all but a few attributes
and forces a lowest common denominator mode. That is why a single
character argument turned every column into character. Only if all
of the arguments were logical would cbind return a matrix of TRUEs
and FALSEs.
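The default (matrix) path can be checked directly in a fresh session. This is a minimal sketch assuming the corrected vectors a, b and c from this thread; `flags` and the `m_*` names are hypothetical, added only for illustration. With only unclassed vectors, cbind() returns a matrix whose storage mode is the highest of the argument types in the order logical < integer < double < complex < character:

```r
# Corrected inputs from this thread (five values each)
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
c <- c("y1", "y2", "y3", "y4", "y5")

# Only unclassed vectors: the default (internal) code builds a matrix
m_num <- cbind(a, b)      # double + double   -> double matrix
m_chr <- cbind(a, b, c)   # one character arg -> character matrix

# Hypothetical logical vector, to show where logicals fall in the order
flags <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
m_mix <- cbind(a, flags)        # logical + double -> double (TRUE = 1, FALSE = 0)
m_lgl <- cbind(flags, !flags)   # all logical      -> logical matrix

typeof(m_num)   # "double"
typeof(m_chr)   # "character"
typeof(m_mix)   # "double"
typeof(m_lgl)   # "logical"
m_chr[, "b"]    # the numbers are now strings: "0.3" "0.4" "0.5" "0.6" "0.7"
```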

(This all assumes that the typos in the OP's original example, which
left 'c' as an incomplete expression and gave 'a' and 'b' unequal
lengths, were fixed.)

 > a <- c(1,2,3,4,5);
 > b <- c(0.3,0.4,0.5,0,6,0.7);
 > test <- data.frame(cbind(a,b))
Warning message:
In cbind(a, b) :
   number of rows of result is not a multiple of vector length (arg 1)
 > c <- c("y1","y2","y3","y4","y5")
 > cbind(c, test)
Error in data.frame(..., check.names = FALSE) :
   arguments imply differing number of rows: 5, 6
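With the two typos corrected, the call that failed above goes through. It also shows that dispatch does not look at the first argument alone: 'c' is a plain character vector, yet because 'test' is a data frame, cbind.data.frame is used and the numeric columns stay numeric. A minimal sketch; the name `res` is hypothetical. Whether 'c' comes out as character or factor depends on the R version, since the stringsAsFactors default changed to FALSE in R 4.0.0.

```r
# Corrected inputs: b now has five values (0.6, not 0,6) and the quotes in c match
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
c <- c("y1", "y2", "y3", "y4", "y5")

test <- data.frame(a, b)   # two numeric columns, no cbind() needed

# 'c' has no class attribute, but 'test' is a data frame, so cbind()
# dispatches to cbind.data.frame; the result is a data frame, not a matrix
res <- cbind(c, test)

str(res)   # a and b stay numeric; c is character (R >= 4.0.0) or factor (older R)
```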
-- 
David.


>
> Rainer
>
>
> On Tuesday 10 April 2012 11:38:51 R. Michael Weylandt wrote:
>> Don't use cbind() -- it forces everything into a single type, here
>> string, which in turn becomes factor.
>>
>> Simply,
>>
>> data.frame(a, b, c)
>>
>> Like David mentioned a few days ago, I have no idea who is promoting
>> this data.frame(cbind(...)) idiom, but it's a terrible idea (albeit
>> one that seems to be very frequent over the last few weeks)
>>
>> Michael
>>
>> On Tue, Apr 10, 2012 at 10:33 AM, Anser Chen <anser.chen at gmail.com> wrote:
>>> Complete newbie to R -- struggling with something which should be
>>> pretty basic. Trying to create a simple data set (which I gather R
>>> refers to as a data.frame). So
>>>
>>>> a <- c(1,2,3,4,5);
>>>> b <- c(0.3,0.4,0.5,0,6,0.7);
>>>
>>> Stick the two together into a data frame (call it test) using cbind
>>>
>>>> test <- data.frame(cbind(a,b))
>>>
>>> Seems to do the trick:
>>>
>>>> test
>>> a   b
>>> 1 1 0.3
>>> 2 2 0.4
>>> 3 3 0.5
>>> 4 4 0.6
>>> 5 5 0.7
>>>>
>>>
>>> Confirm that each variable is numeric:
>>>
>>>> is.numeric(test$a)
>>> [1] TRUE
>>>> is.numeric(test$b)
>>> [1] TRUE
>>>
>>>
>>> OK, so far so good. But, now I want to merge in a vector of characters:
>>>
>>>> c <- c('y1","y2","y3","y4","y5")
>>>
>>> Confirm that this is a string:
>>>
>>>> is.numeric(c);
>>> [1] FALSE
>>>
>>> cbind c into the data frame:
>>>
>>>> test <- data.frame(cbind(a,b,c))
>>>
>>> Looks like everything is in place:
>>>
>>>> test
>>> a   b  c
>>> 1 1 0.3 y1
>>> 2 2 0.4 y2
>>> 3 3 0.5 y3
>>> 4 4 0.6 y4
>>> 5 5 0.7 y5
>>>
>>> Except that it seems as if the moment I cbind in a character vector,
>>> it changes numeric data to string:
>>>
>>>> is.numeric(test$a)
>>> [1] FALSE
>>>> is.numeric(test$b)
>>> [1] FALSE
>>>
>>> which would explain why the operations I'm trying to perform on
>>> elements of a and b columns are failing. If I look at the structure
>>> of the data.frame, I see that in fact *all* the variables are being
>>> entered as "factors".
>>>
>>>> str(test)
>>> 'data.frame':   5 obs. of  3 variables:
>>> $ a: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
>>> $ b: Factor w/ 5 levels "0.3","0.4","0.5",..: 1 2 3 4 5
>>> $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>>>
>>> But, if I try
>>>
>>>> test <- data.frame(cbind(a,b))
>>>> str(test)
>>> 'data.frame':   5 obs. of  2 variables:
>>> $ a: num  1 2 3 4 5
>>> $ b: num  0.3 0.4 0.5 0.6 0.7
>>>
>>> a and b are coming back as numeric. So, why does cbind'ing a column
>>> of character variables change everything else? And, more to the
>>> point, what do I need to do to 'correct' the problem (i.e., stop this
>>> from happening).
>>>
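For the OP's closing question (how to stop this from happening), here is a minimal sketch assuming the corrected inputs: build the data frame with data.frame() directly, as Michael suggests, and, if numeric columns have already been turned into factors, convert them back through their labels. That second step is the idiom from R FAQ 7.10, not something proposed in this thread; `good` and `bad` are hypothetical names, and stringsAsFactors = TRUE is passed explicitly only to reproduce the pre-4.0.0 default that produced the OP's factors.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
c <- c("y1", "y2", "y3", "y4", "y5")

# Build the data frame directly; each vector keeps its own type.
# stringsAsFactors = FALSE keeps c as character (the default from R 4.0.0 on)
good <- data.frame(a, b, c, stringsAsFactors = FALSE)
str(good)

# Reproduce the problem: cbind() first makes a character matrix, then
# data.frame() turns every column into a factor
bad <- data.frame(cbind(a, b, c), stringsAsFactors = TRUE)

# Repair the numeric columns through their labels; as.numeric(bad$b) alone
# would return the internal level codes 1..5, not the original values
bad$a <- as.numeric(as.character(bad$a))
bad$b <- as.numeric(levels(bad$b))[bad$b]
str(bad)
```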


David Winsemius, MD
West Hartford, CT


