[R] cbind, data.frame | numeric to string?

David Winsemius dwinsemius at comcast.net
Tue Apr 10 18:29:30 CEST 2012


On Apr 10, 2012, at 12:19 PM, David Winsemius wrote:

>
> On Apr 10, 2012, at 11:58 AM, Rainer Schuermann wrote:
>
>> cbind() works as well, but only if c is attached to the existing  
>> test variable:
>>
>>> tst <- cbind( test, c )
>>> tst
>>  a    b   c
>> 1  1  0.3  y1
>> 2  2  0.4  y2
>> 3  3  0.5  y3
>> 4  4  0.6  y4
>> 5  5  0.7  y5
>>> str( tst )
>> 'data.frame':   5 obs. of  3 variables:
>> $ a: num  1 2 3 4 5
>> $ b: num  0.3 0.4 0.5 0.6 0.7
>> $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>>
>> Not saying it is a good idea, though...
>
> To be somewhat more expansive ... 'cbind' is not just one function,  
> but rather a set of functions, since it is "generic". The one that  
> is chosen by the interpreter will depend on whether the first  
> argument has a class.

That was just my erroneous impression. If _any_ of the objects in the  
argument list is a data.frame then cbind.data.frame appears to get  
used. There is a Dispatch section on the help page for cbind that  
appears to cover this adequately.
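A minimal sketch of that dispatch rule. It assumes a current R session and the OP's data with the typos fixed. The character vector is called 'y' here instead of 'c'; this is a hypothetical rename so the base function c() is not shadowed. Both argument orders should return a data.frame in which 'a' and 'b' stay numeric. Whether 'y' comes back as character or factor depends on the R version, because the stringsAsFactors default changed to FALSE in R 4.0.0.

```r
# OP's data with the typos fixed; 'y' is a hypothetical rename of 'c'.
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
y <- c("y1", "y2", "y3", "y4", "y5")
test <- data.frame(a, b)

# 'test' is a data.frame, so cbind() dispatches to cbind.data.frame()
# whether the data frame comes first or last in the argument list.
right <- cbind(test, y)
left  <- cbind(y, test)

str(right)
str(left)
```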

> If it does have a class as in the example above having a class of  
> "data.frame", then the cbind.data.frame function will be dispatched  
> to process the list of arguments. If the first argument doesn't have  
> a class as in the OP's second example below, then the internal cbind  
> function will be used; it returns a matrix, which strips off all but
> a few attributes and forces a lowest-common-denominator mode. If
> all of the arguments were logical, then cbind would return a
> matrix of TRUEs and FALSEs.
>
> (This all assumes that the typos in the OP's original example were
> fixed: the mismatched quote in 'y1" left the assignment to 'c' as an
> incomplete expression, and "0,6" instead of "0.6" gave 'b' six
> elements, so 'a' and 'b' had unequal lengths.)
>
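This sketch shows the matrix path. It assumes a current R session, the typos fixed, and 'y' as a hypothetical rename of 'c'. When no data.frame is among the arguments, the default cbind returns a matrix whose type is the most general of its inputs. A single character column therefore turns 'a' and 'b' into strings, and wrapping the result in data.frame() cannot recover the numbers.

```r
# OP's data with the typos fixed; 'y' is a hypothetical rename of 'c'.
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
y <- c("y1", "y2", "y3", "y4", "y5")

# No data.frame among the arguments, so the default cbind builds a
# matrix, and a matrix holds a single type.
m_num <- cbind(a, b)                             # double matrix
m_chr <- cbind(a, b, y)                          # everything becomes character
m_lgl <- cbind(c(TRUE, FALSE), c(FALSE, FALSE))  # all logical: stays logical
m_mix <- cbind(c(TRUE, FALSE), c(1.5, 2.5))      # logical + double: TRUE -> 1

typeof(m_num)  # "double"
typeof(m_chr)  # "character"
m_chr[, "a"]   # "1" "2" "3" "4" "5"

# data.frame() cannot undo the coercion: 'a' comes back as character
# (R >= 4.0.0) or factor (older R), never numeric.
df <- data.frame(m_chr)
is.numeric(df$a)  # FALSE
```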
> > a <- c(1,2,3,4,5);
> > b <- c(0.3,0.4,0.5,0,6,0.7);
> > test <- data.frame(cbind(a,b))
> Warning message:
> In cbind(a, b) :
>  number of rows of result is not a multiple of vector length (arg 1)
> > c <- c("y1","y2","y3","y4","y5")
> > cbind(c, test)
> Error in data.frame(..., check.names = FALSE) :
>  arguments imply differing number of rows: 5, 6
> -- 
> David.
>
>
>>
>> Rainer
>>
>>
>> On Tuesday 10 April 2012 11:38:51 R. Michael Weylandt wrote:
>>> Don't use cbind() -- it forces everything into a single type, here
>>> string, which in turn becomes factor.
>>>
>>> Simply,
>>>
>>> data.frame(a, b, c)
>>>
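A sketch of that suggestion. As above, it assumes the typos are fixed and uses 'y' as a hypothetical stand-in for 'c'. Passing stringsAsFactors = FALSE explicitly gives the same result both before and after R 4.0.0, the release in which that default changed.

```r
# OP's data with the typos fixed; 'y' is a hypothetical rename of 'c'.
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
y <- c("y1", "y2", "y3", "y4", "y5")

# data.frame() keeps each vector as its own column with its own type.
test <- data.frame(a, b, y, stringsAsFactors = FALSE)
str(test)

# Adding a column to an existing data frame also preserves types.
test2 <- data.frame(a, b)
test2$y <- y
```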
>>> As David mentioned a few days ago, I have no idea who is promoting
>>> this data.frame(cbind(...)) idiom, but it's a terrible idea (albeit
>>> one that seems to have been very frequent over the last few weeks).
>>>
>>> Michael
>>>
>>> On Tue, Apr 10, 2012 at 10:33 AM, Anser Chen  
>>> <anser.chen at gmail.com> wrote:
>>>> Complete newbie to R -- struggling with something which should be  
>>>> pretty
>>>> basic. Trying to create a simple data set (which I gather R  
>>>> refers to as a
>>>> data.frame). So
>>>>
>>>>> a <- c(1,2,3,4,5);
>>>>> b <- c(0.3,0.4,0.5,0,6,0.7);
>>>>
>>>> Stick the two together into a data frame (call test) using cbind
>>>>
>>>>> test <- data.frame(cbind(a,b))
>>>>
>>>> Seems to do the trick:
>>>>
>>>>> test
>>>> a   b
>>>> 1 1 0.3
>>>> 2 2 0.4
>>>> 3 3 0.5
>>>> 4 4 0.6
>>>> 5 5 0.7
>>>>>
>>>>
>>>> Confirm that each variable is numeric:
>>>>
>>>>> is.numeric(test$a)
>>>> [1] TRUE
>>>>> is.numeric(test$b)
>>>> [1] TRUE
>>>>
>>>>
>>>> OK, so far so good. But, now I want to merge in a vector of  
>>>> characters:
>>>>
>>>>> c <- c('y1","y2","y3","y4","y5")
>>>>
>>>> Confirm that this is string:
>>>>
>>>>> is.numeric(c);
>>>> [1] FALSE
>>>>
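An is.numeric() result of FALSE does not by itself show that the vector holds strings, because logical vectors are not numeric either. Here is a small sketch of a more direct check, with 'y' as a hypothetical stand-in for 'c':

```r
y <- c("y1", "y2", "y3", "y4", "y5")
is.numeric(y)               # FALSE
is.numeric(c(TRUE, FALSE))  # also FALSE, so this test alone is not proof
is.character(y)             # TRUE: a direct test for strings
typeof(y)                   # "character"
```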
>>>> cbind c into the data frame:
>>>>
>>>>> test <- data.frame(cbind(a,b,c))
>>>>
>>>> Looks like everything is in place:
>>>>
>>>>> test
>>>> a   b  c
>>>> 1 1 0.3 y1
>>>> 2 2 0.4 y2
>>>> 3 3 0.5 y3
>>>> 4 4 0.6 y4
>>>> 5 5 0.7 y5
>>>>
>>>> Except that it seems as if the moment I cbind in a character  
>>>> vector, it
>>>> changes numeric data to string:
>>>>
>>>>> is.numeric(test$a)
>>>> [1] FALSE
>>>>> is.numeric(test$b)
>>>> [1] FALSE
>>>>
>>>> which would explain why the operations I'm trying to perform on  
>>>> elements of
>>>> a and b columns are failing. If I look at the structure of the  
>>>> data.frame,
>>>> I see that in fact *all* the variables are being entered as  
>>>> "factors".
>>>>
>>>>> str(test)
>>>> 'data.frame':   5 obs. of  3 variables:
>>>> $ a: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
>>>> $ b: Factor w/ 5 levels "0.3","0.4","0.5",..: 1 2 3 4 5
>>>> $ c: Factor w/ 5 levels "y1","y2","y3",..: 1 2 3 4 5
>>>>
>>>> But, if I try
>>>>
>>>>> test <- data.frame(cbind(a,b))
>>>>> str(test)
>>>> 'data.frame':   5 obs. of  2 variables:
>>>> $ a: num  1 2 3 4 5
>>>> $ b: num  0.3 0.4 0.5 0.6 0.7
>>>>
>>>> a and b are coming back as numeric. So, why does cbind'ing a  
>>>> column of
>>>> character variables change everything else? And, more to the  
>>>> point, what do
>>>> I need to do to 'correct' the problem (i.e., stop this from  
>>>> happening).
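If a data frame has already been built this way, the numeric columns can still be recovered. This is a minimal sketch assuming a current R session. It recreates the OP's frame with stringsAsFactors = TRUE to mimic the pre-4.0.0 default, and 'to_num' is a hypothetical helper name. The key point, also covered in R FAQ 7.10, is that as.numeric() on a factor returns the internal level codes, so the conversion has to go through as.character() first. Avoiding data.frame(cbind(...)) in the first place is still the better fix.

```r
# Recreate the OP's frame as older R built it; 'y' is a hypothetical
# rename of 'c'.
a <- c(1, 2, 3, 4, 5)
b <- c(0.3, 0.4, 0.5, 0.6, 0.7)
y <- c("y1", "y2", "y3", "y4", "y5")
broken <- data.frame(cbind(a, b, y), stringsAsFactors = TRUE)

# Pitfall: as.numeric() on a factor returns the level codes, not the labels.
as.numeric(broken$b)  # 1 2 3 4 5, not 0.3 ... 0.7

# Convert via the labels instead; this also works on character columns.
to_num <- function(f) as.numeric(as.character(f))  # hypothetical helper
fixed <- broken
fixed$a <- to_num(fixed$a)
fixed$b <- to_num(fixed$b)
str(fixed)
```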
>>>>
>
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


