[R] create group variable -- family data -- for siblings

Gabor Grothendieck ggrothendieck at gmail.com
Sat Oct 25 19:56:03 CEST 2008


Correction and shortening:

f <- function(i) {
  i1 <- if (is.na(famdat[i, 2])) i else match(famdat[i, 2], famdat[1:i, 2])
  i2 <- if (is.na(famdat[i, 3])) i else match(famdat[i, 3], famdat[1:i, 3])
  min(i1, i2)
}
as.numeric(factor(sapply(1:nrow(famdat), f)))


On Sat, Oct 25, 2008 at 1:28 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Here is one other solution. For each row it finds the
> earliest row that has the same momid or popid:
>
>
> f <- function(i) {
>  if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) {
>          i
>  } else {
>          i1 <- match(famdat[i, 1], famdat[1:i, 1])
>          i2 <- match(famdat[i, 2], famdat[1:i, 2])
>          min(i1, i2)
>  }
> }
> as.numeric(factor(sapply(1:nrow(famdat), f)))
>
>
> On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> Create a distance metric which is 0 if there are common mothers or
>> fathers and 1 otherwise using that to cluster your points:
>>
>> dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!="))
>> dd[is.na(dd)] <- 1
>> hc <- hclust(as.dist(dd))
>> cutree(hc, h = 0.1)
>>
>> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:
>>> For the following data:
>>>
>>> famdat <- read.table(textConnection("ind momid dadid
>>> 1   18    19
>>> 2   18    19
>>> 3   18    19
>>> 4   21    22
>>> 5   21    22
>>> 6   23    25
>>> 7   23    27
>>> 8   29    30
>>> 9   31    30
>>> 10  40    41
>>> 11  NA    NA
>>> 12  50    51"),header=TRUE)
>>> closeAllConnections();
>>>
>>> I would like to create a label (1,2,3..) for siblings. Siblings will
>>> be defined by those who have both the same momid and dadid, but also
>>> those who
>>> just have the same momid or the same dadid. In addition, there will be
>>> those without siblings and those whose parents are missing, and they
>>> will
>>> get unique ids. For the data above, the result would be:
>>>
>>>   ind momid dadid sibid
>>> 1    1    18    19      1
>>> 2    2    18    19      1
>>> 3    3    18    19      1
>>> 4    4    21    22      2
>>> 5    5    21    22      2
>>> 6    6    23    25      3
>>> 7    7    23    27      3
>>> 8    8    29    30      4
>>> 9    9    31    30      4
>>> 10  10    40    41     5
>>> 11  11    NA    NA   6
>>> 12  12    50    51     7
>>>
>>> Thanks!
>>>
>>> Juliet
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>



More information about the R-help mailing list