[R] Transforming relational data

Matthew Dowle mdowle at mdowle.plus.com
Tue Feb 22 13:44:23 CET 2011


With the new example, what is the full output, and
what do you need instead? Was it correct for the
previous example?

Matthew

"mathijsdevaan" <mathijsdevaan at gmail.com> wrote in message 
news:1298372018181-3318939.post at n4.nabble.com...
>
> Hi Matthew, thanks for your help. Some things are still going wrong.
> Consider this (slightly extended) example:
>
> library(data.table)
> DT = data.table(read.table(textConnection("    A  B  C
> 1 1  a  1999
> 2 1  b  1999
> 3 1  c  1999
> 4 1  d  1999
> 5 2  c  2001
> 6 2  d  2001
> 7 3  a  2004
> 8 3  b  2004
> 9 3  d  2004
> 10 4  c  2001
> 11 4  d  2001"),head=TRUE,stringsAsFactors=FALSE))
> firststep = DT[,cbind(A,expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
> firststep
>      C A Var1 Var2         v
> 1  1999 1    b    a 0.2500000
> 2  1999 1    c    a 0.2500000
> 3  1999 1    d    a 0.2500000
> 4  1999 1    a    b 0.2500000
> 5  1999 1    c    b 0.2500000
> 6  1999 1    d    b 0.2500000
> 7  1999 1    a    c 0.2500000
> 8  1999 1    b    c 0.2500000
> 9  1999 1    d    c 0.2500000
> 10 1999 1    a    d 0.2500000
> 11 1999 1    b    d 0.2500000
> 12 1999 1    c    d 0.2500000
> 13 2001 2    b    a 0.2500000
> 14 2001 4    b    a 0.2500000
> 15 2001 2    a    b 0.2500000
> 16 2001 4    a    b 0.2500000
> 17 2001 2    b    a 0.2500000
> 18 2001 4    b    a 0.2500000
> 19 2001 2    a    b 0.2500000
> 20 2001 4    a    b 0.2500000
> 21 2004 3    b    a 0.3333333
> 22 2004 3    c    a 0.3333333
> 23 2004 3    a    b 0.3333333
> 24 2004 3    c    b 0.3333333
> 25 2004 3    a    c 0.3333333
> 26 2004 3    b    c 0.3333333
>
> According to "firststep", projects 2 and 4 involved individuals a and b,
> whereas actually c and d were involved. Something seems to be going wrong
> in transforming the data.
>
> As for the final result, a list of years and sums of v is generated,
> rather than a list of projects and sums of v. I probably haven't been
> clear enough: I want to produce a list of all projects, each with the
> familiarity of its members right before the start of the project.
>
> Example
> project_id  familiarity
> 4  0.25
>
> Members c and d were jointly involved in 3 projects: 1, 2 and 4. Project 4
> took place in 2001, so only project 1 (1999) took place before that
> (project 2 took place in the same year and is therefore not included). The
> average familiarity between the members in project 1 was 1/4, so:
>
> project_id  familiarity
> 4  0.25
>
> Thanks!
>
>
> Matthew Dowle wrote:
>>
>>
>> Thanks for the attempt and required output. How about this?
>>
>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>> setkey(firststep,Var1,Var2,C)
>> firststep = firststep[,transform(.SD,cv=cumsum(v)),by=list(Var1,Var2)]
>> setkey(firststep,Var1,Var2,C)
>> DT[, {x=data.table(expand.grid(B,B),C[1]-1L)
>>       firststep[x,roll=TRUE,nomatch=0][,sum(cv)]   # prior familiarity
>>      },by=C]
>>         C  V1
>> [1,] 1999 0.0
>> [2,] 2001 0.5
>> [3,] 2004 2.5
>>
>> I think you may have said you have large data. If so, this
>> method should be fast. Please let us know how you get on.
>>
>> HTH
>> Matthew
>>
>>
>>
>> On Thu, 17 Feb 2011 23:07:19 -0800, mathijsdevaan wrote:
>>
>>> OK, for the last step I have tried this (among other things):
>>> library(data.table)
>>> DT = data.table(read.table(textConnection("    A  B  C
>>> 1 1  a  1999
>>> 2 1  b  1999
>>> 3 1  c  1999
>>> 4 1  d  1999
>>> 5 2  c  2001
>>> 6 2  d  2001
>>> 7 3  a  2004
>>> 8 3  b  2004
>>> 9 3  d  2004"),head=TRUE,stringsAsFactors=FALSE))
>>>
>>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>>> setkey(firststep,Var1,Var2)
>>> list1<-firststep[J(expand.grid(DT$B,DT$B),v=1/length(DT$B)),nomatch=0][,sum(v)]
>>> list1
>>> #27
>>>
>>> What I would like to get:
>>> list
>>> 1  0
>>> 2  0.5
>>> 3  2.5
>>>
>>> Thanks!
>>
>
>
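
One way to pin down the requested output is the base R sketch below. It is
not code from the thread and not the data.table approach quoted above. It
assumes that the familiarity of a pair is the sum of 1/(team size) over
strictly earlier projects that both members joined, and that a project's
familiarity is the mean over its unordered member pairs. Both readings are
guesses based on the project 4 example. The names members, year, weight and
prior_familiarity are made up for illustration.

```r
# Example data from the quoted message: A = project id, B = member, C = year
DT <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004, 2001, 2001),
  stringsAsFactors = FALSE
)

members <- split(DT$B, DT$A)                          # members of each project
year    <- vapply(split(DT$C, DT$A), min, numeric(1)) # year of each project
weight  <- 1 / lengths(members)                       # v = 1 / team size

# Hypothetical helper: mean prior familiarity over the member pairs of
# project p (assumes every project has at least two members)
prior_familiarity <- function(p) {
  pairs <- combn(members[[p]], 2)            # one column per unordered pair
  prior <- names(year)[year < year[[p]]]     # strictly earlier projects only
  per_pair <- apply(pairs, 2, function(pr) {
    # which earlier projects contained both members of this pair?
    joint <- vapply(prior, function(q) all(pr %in% members[[q]]), logical(1))
    sum(weight[prior][joint])
  })
  mean(per_pair)
}

result <- data.frame(
  project_id  = names(members),
  familiarity = vapply(names(members), prior_familiarity, numeric(1))
)
result
```

Under these assumptions project 4 gets a familiarity of 0.25, matching the
example in the quoted message. A project with a single member would need
special handling, since combn() needs at least two elements to form a pair.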


