[R] pairing data using combn with criteria

David Winsemius dwinsemius at comcast.net
Sun Nov 18 02:32:48 CET 2012


On Nov 17, 2012, at 4:05 PM, arun wrote:

> HI,
>
>
> If the order of individuals is changed or if some individuals are
> missing, this method may need modification.
>
> ind <- c('1','3','4','8')
>   fam <- c('1','2','1','2')
>   dat1 <- data.frame(ind,fam)
>  combn( row.names(dat1), 2, FUN = function(b){
>                  if (dat1[b[1], "fam" ] != dat1[b[2], "fam"] ) { b }  
> else { c(NA,NA) } } )
> #     [,1] [,2] [,3] [,4] [,5] [,6]
> #[1,] "1"  NA   "1"  "2"  NA   "3"
> #[2,] "2"  NA   "4"  "3"  NA   "4"
>
>  row.names(dat1)<-dat1$ind
>   combn( row.names(dat1), 2, FUN = function(b){
>                   if (dat1[b[1], "fam" ] != dat1[b[2], "fam"] )  
> { b } else { c(NA,NA) } } )
> #     [,1] [,2] [,3] [,4] [,5] [,6]
> #[1,] "1"  NA   "1"  "3"  NA   "4"
> #[2,] "3"  NA   "8"  "4"  NA   "8"

If you wanted to account for a situation where the row names do not  
match the 'ind' names, then you could instead use something like this:

combn( row.names(dat1), 2, FUN = function(b){
                   if (dat1[b[1], "fam" ] != dat1[b[2], "fam"] ) {
                        # return the 'ind' values, not the row names;
                        # as.character() guards against 'ind' being a factor
                        as.character( dat1[ b, "ind" ] )
                   } else { c(NA,NA) } } )

This should also allow pairs of persons (or chicks or piglets) with
the same name (or id) but from different families to be paired, as
long as the row names themselves stay unique. Assigning duplicate
names to row.names would not work in that instance: data frame row
names must be unique, so the assignment fails with an error.
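
To illustrate the duplicate-id case, here is a minimal sketch. The names
dat2 and pairs and the data are made up for illustration (they are not
from this thread), and it combines row positions rather than row names,
which is a slight variation on the code above:

```r
# Hypothetical data: two individuals share the id '1' but come from
# different families (not data from the original question).
dat2 <- data.frame(ind = c('1', '1', '3', '4'),
                   fam = c('1', '2', '1', '2'),
                   stringsAsFactors = FALSE)

# Combine row positions, which are always unique, and return the ids.
pairs <- combn(seq_len(nrow(dat2)), 2, FUN = function(b) {
    if (dat2$fam[b[1]] != dat2$fam[b[2]]) {
        dat2$ind[b]
    } else {
        c(NA_character_, NA_character_)
    }
})

# Drop the all-NA columns left behind by same-family combinations.
pairs <- pairs[, !is.na(pairs[1, ]), drop = FALSE]
pairs
#      [,1] [,2] [,3] [,4]
# [1,] "1"  "1"  "1"  "3"
# [2,] "1"  "4"  "3"  "4"
```

The first column is the pair of two different individuals who happen to
share the id '1'; whether such a pair is wanted depends on the data.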

-- 
David.

>
>
> A.K.
> ----- Original Message -----
> From: David Winsemius <dwinsemius at comcast.net>
> To: benjamin_jarrett <bjmjarrett at gmail.com>
> Cc: r-help at r-project.org
> Sent: Saturday, November 17, 2012 6:36 PM
> Subject: Re: [R] pairing data using combn with criteria
>
>
> On Nov 17, 2012, at 10:07 AM, benjamin_jarrett wrote:
>
>> Hi David,
>>
>> Thanks for replying. Unfortunately I can't get it to work. Here is
>> some (very simplified) data to help illustrate my problem.
>>
>> ind <- c('1','2','3','4')
>> fam <- c('1','2','1','2')
>> data <- data.frame(ind,fam)
>>
>> ind is the unique ID for each individual, and fam is the family the
>> individual came from. Using combn(ind, 2) pairs up all of the
>> individuals. Is there any way I could get combn to pair only
>> individuals with different family numbers, so that with the above
>> data individual 1 would be paired with individual 2 or 4?
>
> Please include context (and _do_ read the Posting Guide.) This is  
> the suggestion I made before:
>
>> See if this helps:
>>
>> combn( 1:5, 2, FUN = function(b){
>>                  if (max (b) < 4 ) { b } else { c(NA,NA) } } )
>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>> [1,]    1    1   NA   NA    2   NA   NA   NA   NA    NA
>> [2,]    2    3   NA   NA    3   NA   NA   NA   NA    NA
>
>
> And this is how to apply it to the example:
>
> combn( row.names(data), 2, FUN = function(b){
>                  if (data[b[1], "fam" ] != data[b[2], "fam"] ) { b }  
> else { c(NA,NA) } } )
>
>      [,1] [,2] [,3] [,4] [,5] [,6]
> [1,] "1"  NA   "1"  "2"  NA   "3"
> [2,] "2"  NA   "4"  "3"  NA   "4"
>
>
> -- 
>

David Winsemius, MD
Alameda, CA, USA



