# [R] create group variable -- family data -- for siblings

Gabor Grothendieck ggrothendieck at gmail.com
Sat Oct 25 19:28:19 CEST 2008

```
Here is one other solution. For each row it finds the
earliest row that has the same momid or popid:

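# Assumes columns 1 and 2 of famdat hold the two parent ids (momid, popid).
f <- function(i) {
  if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) {
    # a missing parent id: the row becomes its own group
    i
  } else {
    # earliest row, up to and including i, with the same momid / popid
    i1 <- match(famdat[i, 1], famdat[1:i, 1])
    i2 <- match(famdat[i, 2], famdat[1:i, 2])
    min(i1, i2)
  }
}
# renumber the row indices found above as consecutive labels 1, 2, 3, ...
as.numeric(factor(sapply(seq_len(nrow(famdat)), f)))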
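
# A minimal sketch of the approach above on the 12-row example from this
# thread. Assumption: famdat holds only the two parent-id columns
# (momid, popid), which is how the function indexes it; 'ind' is left out.
famdat <- data.frame(
  momid = c(18, 18, 18, 21, 21, 23, 23, 29, 31, 40, NA, 50),
  popid = c(19, 19, 19, 22, 22, 25, 27, 30, 30, 41, NA, 51)
)
# earliest_row restates the logic of f() so this sketch runs on its own
earliest_row <- function(i) {
  if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) return(i)
  min(match(famdat[i, 1], famdat[1:i, 1]),
      match(famdat[i, 2], famdat[1:i, 2]))
}
grp <- as.numeric(factor(sapply(seq_len(nrow(famdat)), earliest_row)))
grp
# 1 1 1 2 2 3 3 4 4 5 6 7, the labels requested in the original question

# Caveat: each row is linked only to the earliest row sharing one of its
# parents, so chains of half-siblings are not merged. For momid 1, 3, 3 and
# popid 2, 2, 4 the approach above would give 1 1 2. If such chains should
# form one family (an assumption; the question does not say), one option is
# to pass the smallest label along each shared parent until nothing changes.
# sib_groups is a hypothetical helper, not part of the original post. Unlike
# f(), it still links a row through one known parent when the other is NA.
sib_groups <- function(mom, pop) {
  lab <- seq_along(mom)
  repeat {
    old <- lab
    for (id in list(mom, pop)) {
      ok <- !is.na(id)                            # ignore missing parents
      lab[ok] <- ave(lab[ok], id[ok], FUN = min)  # smallest label per parent
    }
    if (all(lab == old)) break
  }
  as.numeric(factor(lab))
}
sib_groups(famdat$momid, famdat$popid)   # same labels as grp for this data
sib_groups(c(1, 3, 3), c(2, 2, 4))       # one family: 1 1 1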

On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Create a distance metric that is 0 if two rows share a mother or
> father and 1 otherwise, then use it to cluster the rows:
>
> dd[is.na(dd)] <- 1
> hc <- hclust(as.dist(dd))
> cutree(hc, h = 0.1)
>
> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>> For the following data:
>>
>> 1   18    19
>> 2   18    19
>> 3   18    19
>> 4   21    22
>> 5   21    22
>> 6   23    25
>> 7   23    27
>> 8   29    30
>> 9   31    30
>> 10  40    41
>> 11  NA    NA
>> 12  50    51
>> closeAllConnections();
>>
>> I would like to create a label (1,2,3..) for siblings. Siblings will
>> be defined by those who have both the same momid and dadid, but also
>> those who just have the same momid or the same dadid. In addition,
>> there will be those without siblings and those whose parents are
>> missing, and they will get unique ids. For the data above, the result
>> would be:
>>
>>  1    1    18    19    1
>>  2    2    18    19    1
>>  3    3    18    19    1
>>  4    4    21    22    2
>>  5    5    21    22    2
>>  6    6    23    25    3
>>  7    7    23    27    3
>>  8    8    29    30    4
>>  9    9    31    30    4
>> 10   10    40    41    5
>> 11   11    NA    NA    6
>> 12   12    50    51    7
>>
>> Thanks!
>>
>> Juliet
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help