Bogdan Tanasa
Wed Jul 25 19:58:28 CEST 2018
Dear Jeff, thank you for the valuable help and the excellent suggestion. I
will work through the R code that you sent. Thanks a lot!
On Wed, Jul 25, 2018 at 10:43 AM, Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:
> The code below reeks of a misconception that lists are efficient to add
> items to, which is a confusion with the computer science term "linked
> list". In R, a list is NOT a linked list... it is a vector, which means
> the memory used by the list is allocated at the time it is created, and
> REALLOCATED when a new item is added. The only reason you should use a list
> is because you expect to put values of different types or shapes into it,
> which does not appear to apply in this use case.
>
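The point about reallocation can be illustrated with a minimal sketch. The variable names (`n`, `grown`, `prealloc`, `built`) are made up for this illustration and do not come from the thread; the squared values are just placeholder content.

```r
n <- 1000L

# Growing: every assignment past the current end extends the list,
# which may force R to allocate a new, longer vector and copy into it.
grown <- list()
for (i in seq_len(n)) {
  grown[[i]] <- i^2
}

# Preallocating: the list has its final length before the loop starts.
prealloc <- vector("list", n)
for (i in seq_len(n)) {
  prealloc[[i]] <- i^2
}

# Often simplest: let lapply() create the whole list in one call.
built <- lapply(seq_len(n), function(i) i^2)

# All three approaches produce the same list.
identical(grown, prealloc)
identical(prealloc, built)
```

For a list this small the difference in run time is negligible; the preallocated and `lapply()` forms mainly matter when the loop runs many thousands of times.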
> In R, you should make a valiant effort to create things right the first
> time, and if that doesn't work then preallocate the space you will need in
> the vectors you are working with. Since you have a need to store a variable
> number of elements in each intersectX element, the column needs to be a
> list but the elements of that list can perfectly well be character vectors.
>
> x <- data.frame( TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA")
>                , CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2")
>                , POSA=c(10, 15, 120, 340, 100, 220)
>                , CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1")
>                , POSB=c(30, 100, 300, 20, 200, 320)
>                , stringsAsFactors = FALSE
>                )
> compareRng <- function( chr1, pos1, chr2, pos2, delta ) {
>   (   chr1 == chr2
>   & ( pos2 - delta ) < pos1
>   & pos1 < ( pos2 + delta )
>   )
> }
> makeIntersectX <- function( n, chrlabel, poslabel, delta ) {
>   lgclidx <- rep( TRUE, nrow( x ) )
>   lgclidx[ n ] <- FALSE
>   x[[ chrlabel ]][ compareRng( x[[ chrlabel ]][ n ]
>                              , x[[ poslabel ]][ n ]
>                              , x[[ chrlabel ]]
>                              , x[[ poslabel ]]
>                              , delta
>                              )
>                  & lgclidx
>                  ]
> }
>
> x$intersectA <- lapply( seq.int( nrow( x ) )
>                       , makeIntersectX
>                       , chrlabel = "CHRA"
>                       , poslabel = "POSA"
>                       , delta = 10L
>                       )
> x$intersectB <- lapply( seq.int( nrow( x ) )
>                       , makeIntersectX
>                       , chrlabel = "CHRB"
>                       , poslabel = "POSB"
>                       , delta = 21L
>                       )
>
> > x
>   TYPE CHRA POSA CHRB POSB intersectA intersectB
> 1  DEL chr1   10 chr1   30       chr1
> 2  DEL chr1   15 chr1  100       chr1
> 3  DUP chr1  120 chr1  300                  chr1
> 4  TRA chr1  340 chr2   20
> 5  INV chr2  100 chr2  200
> 6  TRA chr2  220 chr1  320                  chr1
>
> Note that depending on what you plan to do beyond this point, it might
> actually be more performant to use a data frame with repeated rows instead
> of list columns... but I cannot tell from what you have provided.
>
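As a rough sketch of the "repeated rows" alternative mentioned above, the matches can be stored as one row per matching pair instead of in a list column. This is only one possible reading of that suggestion: the `id` column and the names `pairs`, `hitsA` and `delta` are assumptions made for the illustration, and only the A side is shown.

```r
x <- data.frame(
  TYPE = c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
  CHRA = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  POSA = c(10, 15, 120, 340, 100, 220),
  stringsAsFactors = FALSE
)
x$id <- seq_len(nrow(x))   # hypothetical row identifier

# Self-join: every pair of rows that share a chromosome.
pairs <- merge(x[c("id", "CHRA", "POSA")],
               x[c("id", "CHRA", "POSA")],
               by = "CHRA", suffixes = c(".self", ".other"))

# Keep pairs of distinct rows whose positions are closer than delta.
delta <- 10
hitsA <- pairs[pairs$id.self != pairs$id.other &
               abs(pairs$POSA.self - pairs$POSA.other) < delta, ]
hitsA <- hitsA[order(hitsA$id.self, hitsA$id.other), ]
hitsA
```

Each row of `hitsA` is one match, so a row of `x` with several matches simply appears several times. Whether this is preferable to a list column depends on what is done with the result afterwards.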
>
> On Wed, 25 Jul 2018, Bogdan Tanasa wrote:
>
>> Dear Thierry and Juan, thank you for your help. Thank you all.
>>
>> Now, if I would like to add an element to the empty list, how should I do it?
>> For example, let i = 2 and j = 1 in a slightly more complex piece of R code:
>>
>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>>                 CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>>                 POSA=c(10, 15, 120, 340, 100, 220),
>>                 CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>>                 POSB=c(30, 100, 300, 20, 200, 320))
>>
>> x$labA <- paste(x$CHRA, x$POSA, sep="_")
>> x$labB <- paste(x$CHRB, x$POSB, sep="_")
>>
>> x$POSA_left <- x$POSA - 10
>> x$POSA_right <- x$POSA + 10
>>
>> x$POSB_left <- x$POSB - 10
>> x$POSB_right <- x$POSB + 10
>>
>> x$intersectA <- rep(list(list()), nrow(x))
>> x$intersectB <- rep(list(list()), nrow(x))
>>
>> And we know that for i = 2, and j = 1, the condition is TRUE :
>>
>> i <- 2
>>
>> j <- 1
>>
>> if ( (x$CHRA[i] == x$CHRA[j] ) &&
>>      (x$POSA[i] > x$POSA_left[j] ) &&
>>      (x$POSA[i] < x$POSA_right[j] ) ){
>>   x$intersectA[i] <- c(x$intersectA[i], x$labA[j])}
>>
>> The R code does not work. Thank you for your kind help!
>>
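A minimal sketch of one way to make the append step above work. With single brackets, `x$intersectA[i]` returns a one-element list rather than the element itself, so `c()` builds a two-element list that cannot fit into a single slot. Using `[[i]]` on both sides avoids this. The sketch assumes empty character vectors as the initial slot values instead of `list()`, uses a reduced three-row data frame, and writes the range test as an absolute difference with a hypothetical `delta`.

```r
x <- data.frame(
  CHRA = c("chr1", "chr1", "chr1"),
  POSA = c(10, 15, 120),
  stringsAsFactors = FALSE
)
x$labA <- paste(x$CHRA, x$POSA, sep = "_")

# One empty character vector per row (assumption: labels are stored as character).
x$intersectA <- rep(list(character(0)), nrow(x))

i <- 2
j <- 1
delta <- 10

# Same chromosome and position of row i within delta of row j.
if (x$CHRA[i] == x$CHRA[j] && abs(x$POSA[i] - x$POSA[j]) < delta) {
  # [[i]] reads and writes the element itself, not a one-element list.
  x$intersectA[[i]] <- c(x$intersectA[[i]], x$labA[j])
}

x$intersectA[[2]]
```

After this runs, the second slot of `intersectA` holds the label of row 1, and the other slots are still empty.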
>>>
>>>
>>> 2018-07-25 8:55 GMT+02:00 Bogdan Tanasa <tanasa using gmail.com>:
>>>
>>>> Dear all,
>>>>
>>>> assuming that I do have a dataframe like :
>>>>
>>>> x <- data.frame(TYPE=c("DEL", "DEL", "DUP", "TRA", "INV", "TRA"),
>>>>                 CHRA=c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
>>>>                 POSA=c(10, 15, 120, 340, 100, 220),
>>>>                 CHRB=c("chr1", "chr1", "chr1", "chr2", "chr2", "chr1"),
>>>>                 POSB=c(30, 100, 300, 20, 200, 320)) ,
>>>>
>>>> how could I initialize two more columns in x, where each element of
>>>> these two columns is a list (the list could be updated later)? Thank you!
>>>>
>>>> Shall I do,
>>>>
>>>> for (i in 1:dim(x)[1]) { x$intersectA[i] <- list()}
>>>>
>>>> for (i in 1:dim(x)[1]) { x$intersectB[i] <- list()}
>>>>
>>>> Nothing happens. Thank you very much!
>>>>
>>>>
>>>
>>>
>>
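For the original question, a minimal sketch of creating the two list columns in one step rather than in a loop. One likely problem with the loop is that `list()` has length zero, so there is no value to place into slot `i`. The sketch uses a reduced three-row data frame and assumes that the slots will later hold character labels; `rep(list(list()), nrow(x))` would work the same way if nested lists are really needed.

```r
x <- data.frame(
  TYPE = c("DEL", "DEL", "DUP"),
  CHRA = c("chr1", "chr1", "chr1"),
  POSA = c(10, 15, 120),
  stringsAsFactors = FALSE
)

# One empty character vector per row; each slot can later hold
# any number of labels.
x$intersectA <- rep(list(character(0)), nrow(x))
x$intersectB <- rep(list(character(0)), nrow(x))

is.list(x$intersectA)   # the new column is a list
lengths(x$intersectA)   # number of items currently stored per row
```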
