[R] coalesce columns within a data frame

Duncan Murdoch murdoch at stats.uwo.ca
Wed Oct 22 19:01:43 CEST 2008


On 10/22/2008 12:09 PM, Ivan Alves wrote:
> Dear all,
> Thanks for all the replies.
> I get something with Duncan's code (slightly more compact than the  
> other two), but of class "integer", whereas the two inputs are class  
> "factor".  Clearly the name information is lost.  I did not see  
> anything on this in the help page for ifelse.

It is there, in this warning:

      The mode of the result may depend on the value of 'test', and the
      class attribute of the result is taken from 'test' and may be
      inappropriate for the values selected from 'yes' and 'no'.

You'd want the result to be a factor, but the class and levels
attributes are lost, so only the underlying integer codes remain.  I
think this is a result of two design flaws:  ifelse() shouldn't base
the class on the test; it should base it on the values.  And factors
in S and R have all sorts of problems.
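
A minimal sketch of the effect, assuming two small factors built to
match the example in the original question (the data are made up for
illustration):

```r
# Hypothetical inputs modelled on the question; each factor has its own levels.
Name.x <- factor(c("nx1", "nx2", NA, NA))
Name.y <- factor(c("ny1", NA, "ny3", NA))

# ifelse() starts from the logical 'test' and fills in values,
# so the factor class and levels of 'yes' and 'no' are dropped.
Name <- ifelse(is.na(Name.x), Name.y, Name.x)

class(Name)  # "integer"
Name         # 1 2 2 NA: "nx2" and "ny3" both collapse to code 2
```

The codes come from two different sets of levels, so they cannot be
mapped back to names reliably.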

You can work around this by converting to character vectors:

Name <- ifelse(is.na(Name.x), as.character(Name.y), as.character(Name.x))
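
As a rough illustration of the same workaround applied to columns of a
data frame, here is a sketch that assumes a data frame called df with
factor columns Name.x and Name.y (the contents are invented to mirror
the question):

```r
# Hypothetical data frame; factor() is called explicitly so the columns
# are factors regardless of the stringsAsFactors default.
df <- data.frame(
  Name.x = factor(c("nx1", "nx2", NA, NA)),
  Name.y = factor(c("ny1", NA, "ny3", NA))
)

# Convert both columns to character before ifelse() so the labels,
# not the integer codes, are selected.
df$Name <- with(df, ifelse(is.na(Name.x),
                           as.character(Name.y),
                           as.character(Name.x)))

df$Name  # "nx1" "nx2" "ny3" NA
```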

If you really want factors, you can convert back at the end, but why 
would you want to?
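
If a factor is needed after all, one possible way to convert back is
sketched below, again on made-up vectors matching the question.  The
levels here are whatever values survive the coalescing; passing an
explicit levels argument to factor() would be an alternative.

```r
# Hypothetical inputs modelled on the question.
Name.x <- factor(c("nx1", "nx2", NA, NA))
Name.y <- factor(c("ny1", NA, "ny3", NA))

# Coalesce as character vectors, then rebuild a single factor at the end.
Name <- factor(ifelse(is.na(Name.x),
                      as.character(Name.y),
                      as.character(Name.x)))

levels(Name)        # "nx1" "nx2" "ny3"
as.character(Name)  # "nx1" "nx2" "ny3" NA
```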

Duncan Murdoch

> 
> On this experience I also tried
> df$Name <- df$NAME.x
> df[is.na(df$NAME.x),"Name"] <- df[is.na(df$NAME.x),"NAME.y"]
> 
> but then again the "factor" issue was a problem (clearly the levels  
> are not the same and then there is a conflict)
> 
> Any further guidance?
> Kind regards,
> Ivan
> 
> On 22 Oct 2008, at 17:26, Duncan Murdoch wrote:
> 
>> On 10/22/2008 11:21 AM, Ivan Alves wrote:
>>> Dear all,
>>> I searched the mail archives and the R site and found no guidance   
>>> (tried "merge", "cbind" and terms like "coalesce" with no  
>>> success).   There surely is a way to coalesce (like in SQL) columns  
>>> in a  dataframe, right?  For example, I would like to go from a  
>>> dataframe  with two columns to one with only one as follows:
>>> From
>>> Name.x Name.y
>>> nx1 ny1
>>> nx2 NA
>>> NA ny3
>>> NA NA
>>> ...
>>> To
>>> Name
>>> nx1
>>> nx2
>>> ny3
>>> NA
>>> ...
>>> where column Name.x is taken if there is a value, and if not then   
>>> column Name.y
>>> Any help would be appreciated
>>
>> I don't know of any special function to do that, but ifelse() can  
>> handle it easily:
>>
>> Name <- ifelse(is.na(Name.x), Name.y, Name.x)
>>
>> If those are columns of a dataframe named df, you'd prefix each  
>> column name by df$, or assign the result of within() back to df:
>>
>> df <- within(df, Name <- ifelse(is.na(Name.x), Name.y, Name.x))
>>
>> Duncan Murdoch


